California Housing - Data Exploration

Taking a lot of inspiration from this Kaggle kernel by Pedro Marcelino, I will go through roughly the same steps using the classic California Housing price dataset in order to practice using Seaborn and doing data exploration in Python.

Secondly, this notebook serves as a proof of concept for generating a Markdown version with jupyter nbconvert --to markdown notebook.ipynb so that it can be posted to my Jekyll blog.
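
As a rough illustration of the kind of steps involved (a minimal sketch, not necessarily the exact code from the notebook), here is how the dataset can be loaded via scikit-learn's built-in California Housing loader and explored with Seaborn:

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Load the dataset as a pandas DataFrame (features plus the MedHouseVal target).
housing = fetch_california_housing(as_frame=True)
df = housing.frame

# Distribution of the target variable.
sns.histplot(df["MedHouseVal"], kde=True)
plt.show()

# Correlation heatmap of all numeric variables.
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```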

More …

Learning Optimisation

A long post covering Exponentially Weighted Averages, Bias Correction, Gradient Descent with Momentum, RMSprop, the Adam optimisation technique, and Learning Rate Decay.

It covers part of the second week's material of the Improving Deep Neural Networks Coursera course, and like my other course-notes posts it is mainly my notes from the lectures, rephrased in language that makes sense to me and trying to answer the questions the lectures left me with.

One nice thing is that I far too seldom get to write LaTeX equations, and this post is full of them.
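
To give a taste of one of the topics, here is a minimal sketch (my own function name, not code from the post) of an exponentially weighted average with bias correction:

```python
import numpy as np

def ewa_with_bias_correction(values, beta=0.9):
    """Exponentially weighted average v_t = beta * v_{t-1} + (1 - beta) * theta_t,
    with bias correction v_t / (1 - beta**t) to fix the low estimates early on."""
    v = 0.0
    corrected = []
    for t, theta in enumerate(values, start=1):
        v = beta * v + (1 - beta) * theta
        corrected.append(v / (1 - beta ** t))
    return np.array(corrected)
```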

More …

Bias and Variance

Terminology

Bias

The model has strong constraints/assumptions about what it should look like, which means it ends up underfitting the actual data. An obvious example is a linear model trying to fit clearly non-linear data; the sketch after the solutions list below illustrates this.

Solutions

  • Try adding more hidden layers or units
  • Increase the number of training iterations or tune the learning rate
  • Experiment with the network architecture
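
A minimal sketch of the linear-on-non-linear example (assuming scikit-learn; the data and degrees are made up for illustration), showing how adding capacity removes the bias:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Clearly non-linear data: y = x^2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# A plain linear model underfits (high bias) ...
linear = LinearRegression().fit(X, y)
print("linear R^2:", linear.score(X, y))

# ... while relaxing the constraint (here, polynomial features) fits much better.
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("quadratic R^2:", quadratic.score(X, y))
```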

Variance

The model is too free to adapt itself exactly to the training data, which means it ends up overfitting: some of the outlier data points probably won't be good predictors for another data set.

Solutions

  • Train on more data
  • Regularisation (see the sketch after this list)
  • Experiment with the network architecture
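
As a hedged illustration (assuming Keras; the layer sizes and rates here are arbitrary), two common ways to add regularisation and reduce variance are an L2 weight penalty and dropout:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # L2 penalty discourages large weights in the hidden layer.
    tf.keras.layers.Dense(
        64, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    # Dropout randomly switches off units during training.
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```
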
More …

Activation Functions

Just some notes for myself going through the Deep Learning specialisation on Coursera.

Tanh: Good for hidden layers, as an output with mean 0 makes learning easier for the next layer.

Sigmoid: Good for the output layer in binary classification, since its output between 0 and 1 maps onto a certainty between 0% and 100%.

ReLU: Both of the previous functions can slow down gradient descent when the slope of the derivative nears 0. To get around this, the rectified linear unit (ReLU) is popular; it is the most common activation function.

Leaky ReLU: A version that avoids the slope being zero for negative values of z. It has the same advantages as regular ReLU, but is not used as much in practice.
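
For reference, a minimal NumPy sketch of the four activation functions above:

```python
import numpy as np

def tanh(z):
    # Output in (-1, 1) with mean around 0.
    return np.tanh(z)

def sigmoid(z):
    # Output in (0, 1), usable as a probability.
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Zero for negative z, identity for positive z.
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for z < 0 keeps the gradient from being exactly zero.
    return np.where(z > 0, z, alpha * z)
```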