Activation Functions

Just some notes for myself going through the Deep Learning specialisation on Coursera.

Tanh: Good for hidden layers, as its zero-centred output makes learning easier for the next layer.

Sigmoid: Good for the output layer in binary classification, since its output between 0 and 1 maps naturally onto a certainty of 0 - 100%.
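A minimal NumPy sketch of these two functions (my own illustration, not course code), showing that tanh output is roughly zero-centred while sigmoid output sits between 0 and 1:

```python
import numpy as np

def tanh(z):
    # Output in (-1, 1); roughly zero mean for inputs centred on 0
    return np.tanh(z)

def sigmoid(z):
    # Output in (0, 1); can be read directly as a probability
    return 1 / (1 + np.exp(-z))

z = np.linspace(-4, 4, 9)
print(tanh(z).mean())     # ~0 for inputs symmetric around 0
print(sigmoid(z).mean())  # ~0.5, i.e. not zero-centred
```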

ReLU: Both previous functions can slow down gradient descent because their derivatives (slopes) get close to 0 when |z| is large. To get around this, the rectified linear unit (ReLU) is popular; it is the most common activation function.
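A quick sketch of the derivatives (again my own, not from the course) to see the saturation problem and how ReLU avoids it for positive z:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Derivatives of each activation with respect to z
def tanh_grad(z):
    return 1 - np.tanh(z) ** 2

def sigmoid_grad(z):
    s = 1 / (1 + np.exp(-z))
    return s * (1 - s)

def relu_grad(z):
    return (z > 0).astype(float)

z = np.array([-10.0, -1.0, 1.0, 10.0])
print(tanh_grad(z))     # ~0 at |z| = 10, so learning slows down
print(sigmoid_grad(z))  # ~0 at |z| = 10 as well
print(relu_grad(z))     # stays at 1 for all positive z
```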

Leaky ReLU: A version of ReLU that avoids the slope being zero for negative values of z. It has the same advantages as regular ReLU but is not used as much in practice.
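A minimal sketch of leaky ReLU; the 0.01 slope for negative z is a common default I'm assuming here, not a value fixed by the course:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for z < 0 instead of a flat zero
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(z))       # negative inputs are scaled down, not zeroed
print(leaky_relu_grad(z))  # gradient is alpha (not 0) for z < 0
```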
