28 Feb 2018
The first week of Structuring Machine Learning Projects focuses on how to diagnose issues and how to iterate efficiently by making changes that are precise enough to affect only one thing at a time.
Topics include orthogonalization, selecting KPIs and Bayes optimal error.
More …
25 Feb 2018
Taking a lot of inspiration from this Kaggle kernel by Pedro Marcelino, I will go through roughly the same steps using the classic California Housing price dataset in order to practice using Seaborn and doing data exploration in Python.
Secondly, this notebook will serve as a proof of concept for generating a markdown version with `jupyter nbconvert --to markdown notebook.ipynb` so that it can be posted to my Jekyll blog.
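As a small taste of the exploration steps, here is a minimal sketch; it assumes the data is loaded through scikit-learn's fetch_california_housing helper (with its column names, e.g. MedHouseVal) rather than the exact files used in the notebook:

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the California Housing data as a pandas DataFrame
housing = fetch_california_housing(as_frame=True)
df = housing.frame  # features plus the MedHouseVal target column

# First look: summary statistics and the target distribution
print(df.describe())
sns.histplot(df["MedHouseVal"], kde=True)
plt.show()

# Pairwise correlations to spot promising predictors
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```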
More …
11 Feb 2018
Learning Optimisation
Long post covering Exponentially Weighted Averages, Bias Correction, Gradient Descent with Momentum, RMSprop, the Adam optimisation technique and Learning Rate Decay.
It covers part of the second week's material of the Improving Deep Neural Networks Coursera course, and like the other course notes posts it is mainly my notes from the lectures, rephrased in language that makes sense to me and trying to answer the questions the lectures left me with.
One nice thing is that I get to write LaTeX equations far too seldom, and this post is full of them.
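As a taste, the exponentially weighted average and its bias correction (the building block behind Momentum, RMSprop and Adam) can be written as follows, where v_t is the running average, θ_t the raw value at step t, and β the decay rate:

```latex
v_t = \beta \, v_{t-1} + (1 - \beta)\, \theta_t,
\qquad
v_t^{\text{corrected}} = \frac{v_t}{1 - \beta^{t}}
```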
More …
29 Jan 2018
Terminology
Bias
The model is built with strong constraints/assumptions about the form it can take, which means it ends up underfitting the actual data. An obvious example would be a linear model trying to fit clearly non-linear data.
Solutions
- Test adding more hidden layers or units
- Increase number of learning iterations / learning rate
- Experiment with network architecture
Variance
The model has too much freedom to adapt exactly to the training data, which means it ends up overfitting, since some of the outlier data points probably won't be good predictors for another data set (see the sketch after the solutions list below).
Solutions
- Train on more data
- Regularisation
- Experiment with network architecture
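To make the two failure modes concrete, here is a minimal sketch of my own (not from the course) that fits the same noisy data with an underfitting and an overfitting polynomial model, assuming NumPy and scikit-learn:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Noisy samples from a sine curve plus a clean test grid
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 30)
X_test = np.linspace(0, 1, 100).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

for degree in (1, 15):
    # degree 1: high bias (underfits the sine shape)
    # degree 15: high variance (chases the noise in the training points)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(f"degree {degree}: "
          f"train MSE {mean_squared_error(y, model.predict(X)):.3f}, "
          f"test MSE {mean_squared_error(y_test, model.predict(X_test)):.3f}")
```

The degree-1 fit shows high error on both sets (bias), while the degree-15 fit typically shows a much lower training error but a worse test error (variance).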
More …
28 Jan 2018
Just some notes for myself going through the Deep Learning specialisation on Coursera.
Tanh: Good for hidden layers, as its zero-centred output makes learning easier for the next layer.
Sigmoid: Good for the output layer in binary classification, since its 0-1 range maps onto a certainty of 0-100%.
ReLU: Both of the previous functions can slow down gradient descent when their derivatives approach 0. To get around this, the rectified linear unit (ReLU) is popular; it is the most common activation function.
Leaky ReLU: A variant that avoids a zero slope for negative values of z. It has the same advantages as the regular ReLU but is not used as much in practice.
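For reference, a minimal NumPy sketch of the four activation functions (my own summary, not code from the course):

```python
import numpy as np

def tanh(z):
    # Zero-centred output in (-1, 1), good for hidden layers
    return np.tanh(z)

def sigmoid(z):
    # Output in (0, 1), good for binary classification output layers
    return 1 / (1 + np.exp(-z))

def relu(z):
    # Zero for negative z, identity for positive z
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Small non-zero slope alpha for negative z
    return np.where(z > 0, z, alpha * z)
```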