YOLO algorithm
Last updated on: 12 days ago
YOLO (You Only Look Once) is a fast object detection algorithm that is widely used in autonomous driving.
YOLO
YOLO is an image classification and localization algorithm applied over a grid placed on the image. For each object, YOLO takes the midpoint of the object and assigns the object to the grid cell that contains that midpoint.
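As a minimal sketch of the midpoint rule, the function below maps a bounding box to the grid cell that contains its midpoint. The function name, the `(x1, y1, x2, y2)` pixel box format, and the default $3 \times 3$ grid are assumptions made for illustration; they are not taken from the YOLO paper.

```python
def assign_to_grid_cell(box, img_w, img_h, grid_size=3):
    """Return the (row, col) of the grid cell that contains the box midpoint.

    box is (x1, y1, x2, y2) in pixels (an assumed format).
    """
    x1, y1, x2, y2 = box
    mid_x = (x1 + x2) / 2
    mid_y = (y1 + y2) / 2
    # Scale the midpoint to grid units; clamp so a midpoint on the right or
    # bottom image edge still falls in the last cell.
    col = min(int(mid_x * grid_size / img_w), grid_size - 1)
    row = min(int(mid_y * grid_size / img_h), grid_size - 1)
    return row, col


# A box whose midpoint is (200, 220) in a 300x300 image, 3x3 grid.
print(assign_to_grid_cell((120, 150, 280, 290), 300, 300))
```

Even if the box spans several cells, only the cell containing the midpoint is returned.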
For each grid cell:
$$L(\hat{y}, y) = \begin{cases} (\hat{y}_1 - y_1)^2 + (\hat{y}_2 - y_2)^2 + \dots + (\hat{y}_8 - y_8)^2, & \text{if } y_1 = 1 \\ (\hat{y}_1 - y_1)^2, & \text{if } y_1 = 0 \end{cases}$$
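Each object is assigned to only one grid cell, even if it spans several. In practice a finer grid may be used, for example $19 \times 19$, which with 8 values per cell gives a $19 \times 19 \times 8$ output.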
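The per-cell loss can be sketched in a few lines of Python. This assumes each label is a list of 8 numbers whose first entry is $y_1$, with $y_1 = 1$ when the cell contains an object; the function name `cell_loss` is hypothetical.

```python
def cell_loss(y_hat, y):
    """Squared-error loss for one grid cell.

    y_hat and y are sequences of 8 values; index 0 holds y_1 (assumed layout).
    """
    if y[0] == 1:
        # An object is present: every component contributes to the loss.
        return sum((p - t) ** 2 for p, t in zip(y_hat, y))
    # No object: only the first component matters, the rest are "don't care".
    return (y_hat[0] - y[0]) ** 2


y_true = [1, 0.5, 0.5, 0.2, 0.2, 0, 1, 0]
y_pred = [0.9, 0.5, 0.5, 0.2, 0.2, 0, 1, 0]
print(cell_loss(y_pred, y_true))
```

When $y_1 = 0$ the remaining seven predictions are ignored, so the network is not penalized for whatever box it outputs in an empty cell.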
Architectures
Advantages
 outputs precise bounding boxes
 outputs much more precise coordinates, which are not dictated by the stride size of a sliding-windows classifier
 it is a convolutional implementation: the network runs once over the whole image instead of once per grid cell
 runs very fast, and works even for real-time object detection
Specify the bounding boxes
Intersection over union (IoU)
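IoU is used to evaluate object localization. It measures the overlap between two bounding boxes, such as a predicted box and the ground-truth box: the area of their intersection divided by the area of their union. IoU is 1 when the two boxes coincide and 0 when they do not overlap.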
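A minimal sketch of the IoU computation for axis-aligned boxes follows. The `(x1, y1, x2, y2)` corner format is an assumption for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at 0 so disjoint boxes have no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```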
Non-max suppression
Discard all boxes with $p_c \le 0.6$.
While there are any remaining boxes:

Pick the box with the largest $p_c$ and output it as a prediction. (This describes a single class, for example car detection only.)

Discard any remaining box with IoU $\ge 0.5$ with the box output in the previous step.
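The steps above can be sketched for a single class as follows. The data layout (a list of `(p_c, box)` pairs with boxes as `(x1, y1, x2, y2)`) and the function names are assumptions for illustration; the default thresholds are the 0.6 and 0.5 used in the text.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, score_threshold=0.6, iou_threshold=0.5):
    """boxes is a list of (p_c, box) pairs; returns the pairs that survive."""
    # Step 1: discard all boxes with p_c <= score_threshold.
    remaining = sorted(
        (b for b in boxes if b[0] > score_threshold),
        key=lambda b: b[0],
        reverse=True,
    )
    kept = []
    while remaining:
        # Step 2: output the box with the largest p_c.
        best = remaining.pop(0)
        kept.append(best)
        # Step 3: discard boxes that overlap the chosen box too much.
        remaining = [b for b in remaining if iou(best[1], b[1]) < iou_threshold]
    return kept


candidates = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # overlaps the 0.9 box heavily
    (0.7, (20, 20, 30, 30)),  # separate object
    (0.5, (40, 40, 50, 50)),  # below the score threshold
]
print(non_max_suppression(candidates))
```

In this example the 0.8 box is suppressed by the 0.9 box, and the 0.5 box never enters the loop.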
Outputting the non-max suppressed outputs
 for each grid cell, get 2 predicted bounding boxes
 get rid of low-probability predictions
 for each class (pedestrian, car, motorcycle), use non-max suppression independently to generate the final predictions
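The per-class procedure can be sketched as below: low-probability predictions are dropped first, then non-max suppression runs separately within each class, so a car box never suppresses a pedestrian box. The `(label, p_c, box)` tuple layout, the function names, and the default thresholds are assumptions for illustration.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(detections, score_threshold=0.6, iou_threshold=0.5):
    """detections is a list of (label, p_c, box); returns the final predictions."""
    # Get rid of low-probability predictions and group the rest by class.
    by_class = defaultdict(list)
    for label, p_c, box in detections:
        if p_c > score_threshold:
            by_class[label].append((p_c, box))

    final = []
    for label, candidates in by_class.items():
        # Run non-max suppression within this class only.
        candidates.sort(key=lambda c: c[0], reverse=True)
        while candidates:
            p_c, box = candidates.pop(0)
            final.append((label, p_c, box))
            candidates = [c for c in candidates if iou(box, c[1]) < iou_threshold]
    return final


detections = [
    ("car", 0.9, (0, 0, 10, 10)),
    ("car", 0.8, (1, 1, 11, 11)),         # suppressed by the 0.9 car
    ("pedestrian", 0.85, (1, 1, 11, 11)),  # different class, kept
    ("motorcycle", 0.3, (50, 50, 60, 60)), # low probability, dropped
]
print(nms_per_class(detections))
```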
Q&A
6. Suppose you run non-max suppression on the predicted boxes above. The parameters you use for non-max suppression are that boxes with probability $\le 0.4$ are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?
Answer: 5
The bounding boxes of the tree with probability 0.74 and the tree with probability 0.46 do not overlap, so neither suppresses the other.
Anchor boxes
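Previously:
Each object in the training image is assigned to the grid cell that contains that object’s midpoint.
With anchor boxes:
Each object in the training image is assigned to the grid cell that contains that object’s midpoint and to the anchor box for that grid cell with the highest IoU.
With a $3 \times 3$ grid, 2 anchor boxes, and 8 values per box, the output shape becomes:
$$3 \times 3 \times 16 = 3 \times 3 \times 2 \times 8$$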
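The sketch below illustrates both ideas: choosing the anchor box with the highest IoU for an object, and the resulting output shape. It assumes that IoU is computed on shape alone (width and height, with the boxes treated as sharing a centre), which is a common simplification rather than something stated above; the function names and the example anchors are hypothetical.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes compared by (width, height) only, centres aligned."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(object_wh, anchors):
    """Index of the anchor box whose shape has the highest IoU with the object."""
    return max(range(len(anchors)), key=lambda i: shape_iou(object_wh, anchors[i]))


def output_shape(grid_size, num_anchors, values_per_box=8):
    """Shape of the output volume: grid x grid x (anchors * values per box)."""
    return (grid_size, grid_size, num_anchors * values_per_box)


anchors = [(1.0, 2.0), (2.0, 1.0)]  # a tall anchor and a wide anchor (made up)
print(best_anchor((0.9, 2.2), anchors))  # a tall object, e.g. a pedestrian
print(best_anchor((2.1, 0.8), anchors))  # a wide object, e.g. a car
print(output_shape(3, 2))
```

With two anchors, a tall object and a wide object whose midpoints fall in the same grid cell can each be encoded in their own slot of that cell's output.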
Region proposals (optional): R-CNN
R-CNN: regions with convolutional neural networks
A segmentation algorithm is used to propose regions that may contain objects
Roughly 2000 proposed regions are classified, which is quite slow
R-CNN: propose regions. Classify the proposed regions one at a time. Output a label and a bounding box
Fast R-CNN: propose regions. Use a convolutional implementation of sliding windows to classify all the proposed regions
Faster R-CNN: use a convolutional network to propose regions (still slower than the YOLO algorithm)
Reference
[1] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[2] Deeplearning.ai, Convolutional Neural Networks
Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!