Week one notes from Convolutional Neural Networks.
Common Types of Layers
- Convolution (CONV)
- Pooling (POOL)
- Fully connected (FC)
Convolution
The convolution operation means applying a filter to a part of the input data to detect a low level feature. This filter is then reapplied across the input data to detect the same pattern wherever it occurs in the input data.
Multichannel images, i.e. colour images, means one convolution channel per channel. These filters can be the same or differ, and the output is the sum of the matrix outputs.
Basic Notation
If layer is a convolution layer:
Dimensions of the input features that should be included in the filter.
In order to avoid the output to shrink due to the filter, padding can be added to the input to ensure the output keeps the same dimensions. No padding added is called valid convolution, and when it’s added to retain the input dimensions it is called same convolution.
Adding a larger stride than one means we will subsample the total 2D-related input features.
Multiple convolutional layers are usually added in parallel so that more than a single feature can be learned. From this follows that the output will be a multichannel matrix with size given by the number of filters used.
Pooling
Max Pooling: Select max value in pooling area.
Parameters and . One benefit is that there are no weights, so no computation time is spent here.
For multi-channel, the pooling layer executes independently on each channel and generates an output channel.
Average Pooling: Much more seldom used, but sometimes in order to collapse each channel into a single value.
Fully Connected
Regular feed forward layer, like in previous courses…
General Suggestions
Since there are a lot of different hyper parameters, it’s usually good to look in literature for existing best practices rather than try to find them from scratch for every new problem.
Convolution layers are good early on in the net as it means parameter sharing for low level features (a trained convolutional edge detector will work just as well reused in all parts of the image), which means the number of weights the network needs to learn stays relatively small.
Each output unit is connected to only a subset of all the input units, which means the number of connections is kept down.
Another bonus is that the reuse of the feature detectors helps give translation invariance, i.e. an object can be moved within the image without it affecting the ability for the network to detect it, which helps reduce overfitting on training data.
Comments