CNN - Week 3

Localisation, detection and YOLO algorithm.

Localisation

Where in the image is the object? The network is trained to also output a bounding box with four values.

Landmark Detection

Detect key points/positions in an image, such as eye location. By having a labeled training set with this data, you can train your network to automatically find it. Used for image distortion and other effects in AR.

Object Detection

Basic sliding window detection is accurate but computationally expensive. Luckily, you can implement convolutional sliding windows by changing fully connected layers to instead be convolution layers, so that instead of having 400 output units in the fully connected layer, you have 400 filters of dimension matching the input sliding window size so that the output is a 1x1x400 convolution layer.

The benefit of this is that if the input image is larger than the convolution dimension, then it will just automatically create the additional sliding windows to cover the full image, i.e. all computations only have to be made once for shared filters between the windows.

Bounding Box Predictions

YOLO algorithm - You Only Look Once. Lays a grid over the image and runs a prediction for each cell in the grid.

Intersection Over Union - allows you to evaluate bounding box fit during training. A common convention is that IoU >= 0.5 is considered correct.

Non-max Suppression - Since an image might stretch over multiple cells, it’s possible that the same object gets detected multiple times for the various grids. Non-max suppression looks at the probabilities and takes the highest probability, then discards other detected objects with a high IoU. Next step is to move on to remaining detected object with highest likelihood and repeat.

In the end you have a number of objects that does not have a significant overlap.

Anchor Boxes - Supports detecting multiple objects in an image. Each cell has x anchor boxes of different sizes. For each detected object, assign it to the anchor box most similar to the predicted bounding box.

For cells with more than x objects, or where the objects are similar to the same anchor box, the algorithm doesn’t by default provide a tie breaker, but this should be edge cases.

Region Proposals - Oftentimes, parts of the image are clearly not of interest and thus it would be great if we quickly can detect that with a segmentation algorithm and then avoid spending computing time on those windows. R-CNN is a proposed solution to this problem, but it’s still quite slow so unsure how much of a win this is. Fast R-CNN and Faster R-CNN versions of the algorithm exists as well, but are still slower than the YOLO algorithm.

Comments