Random Forest

Second workhorse of modern ML. Just quick notes, and then implementation. Basically, what is really needed is to understand the decision tree algorithm; the principles of both GBM and Random Forest then become quite straightforward.

Pseudo algorithm

  • Bootstrap M sets of data (M = number of trees)
  • For each set, fit a decision tree. At each node in the tree, features considered for splitting are randomly selected
  • Predictions are then made by averaging the outputs of the trees (regression) or by taking the mode, i.e. a majority vote over the trees (classification)

In my experiments with random forests, bagging is used in tandem with random feature selection. Each new training set is drawn, with replacement, from the original training set. Then a tree is grown on the new training set using random feature selection.

Breiman 2001
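
As a rough illustration of the pseudo algorithm above, here is a minimal sketch in plain Python for the regression case. It is not the implementation from the full post: the function names (build_tree, fit_forest, predict_forest), the depth limit, the toy data and the choice of one candidate feature per node are all assumptions made for the example. For classification, the final mean would be swapped for the mode of the trees' votes.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, n_features, rng, depth=0, max_depth=3):
    """Grow a regression tree; each node only considers a random subset of features."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the mean target
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        # Using the sorted unique values (minus the smallest) as thresholds
        # guarantees that both sides of the split are non-empty.
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:  # every sampled feature was constant in this node
        return mean(targets)
    _, f, threshold = best
    go_left = [r[f] < threshold for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [t for t, g in zip(targets, go_left) if g],
                      n_features, rng, depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [t for t, g in zip(targets, go_left) if not g],
                       n_features, rng, depth + 1, max_depth)
    return (f, threshold, left, right)

def predict_tree(node, row):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    """Fit one tree per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx],
                                 n_features, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees (regression)."""
    return mean(predict_tree(tree, row) for tree in forest)

# Toy data: the target depends on feature 0 only; feature 1 is unrelated.
rows = [[i, (i * 7) % 5] for i in range(20)]
targets = [0.0 if i < 10 else 10.0 for i in range(20)]

forest = fit_forest(rows, targets)
low = predict_forest(forest, [2, 1])
high = predict_forest(forest, [17, 1])
print(low, high)
```

Because each node sees only one randomly chosen feature here, some trees split on the unrelated feature; averaging over the forest is what pulls the predictions back toward the true signal.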

More …

Gradient Boosting

The workhorse of modern machine learning. After falling down the rabbit hole of understanding the construction of regression trees (and creating my own implementation), I am now ready to plug it into a gradient boosting machine.

Below, my notes on the algorithm, but first some links to material that does a much better job of explaining it here and here.

More …

Decision Trees (CART)

With some time on my hands, I decided I wanted to dig deeper into some of the workhorse algorithms of modern machine learning, namely gradient boosting machines and random forest.

Starting with GBM (and AdaBoost), the basic intuition behind the algorithm turned out to be almost embarrassingly obvious (will return to that in another post), but the devil is really in the details.

What I had a hard time understanding was rather how a decision tree could be used to predict a continuous target variable, as well as how the breakpoints would be set for continuous predictor variables.

Thus began some digging and experimenting.
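
To make those two questions concrete, here is a small sketch of one common approach, not necessarily the one used in the full post: candidate breakpoints are placed midway between consecutive sorted values of the predictor, the breakpoint with the lowest total squared error is kept, and each side predicts the mean of its targets. The function names and the toy data are made up for the example.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split on one feature.

    The threshold is None if no split improves on the unsplit data.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no breakpoint fits between identical values
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Toy data with an obvious jump between x = 3 and x = 10.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]

threshold, error = best_split(x, y)
left_pred = mean(t for v, t in zip(x, y) if v < threshold)
right_pred = mean(t for v, t in zip(x, y) if v >= threshold)
print(threshold, left_pred, right_pred)
```

A full tree repeats this search recursively on each side of the split, over all features, until a stopping rule such as a maximum depth is hit.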

More …

Test-Driven Development with Python

My notes and reflections from reading Test-Driven Development with Python by Harry J.W. Percival and coding along (code found here).

TLDR: I highly recommend the book as an introduction to TDD in Python, more specifically using the Django framework. Pushing the reader to actually set up a staging server together with deployment scripts and a Jenkins server for continuous testing ensures that you get a basic, hands-on, end-to-end understanding of the process.

More …

Scrum and XP from the Trenches

Reflections from rereading Henrik Kniberg's book Scrum and XP from the Trenches with five more years of work experience than I had the last time I read it, and now from a Product Owner perspective rather than a developer perspective.

A quick video summary of the Product Owner role (worth watching by itself) can also be found on YouTube.

TLDR: Definitely worthwhile. It reminded me of some things that have fallen by the wayside over the years of practicing (missing the forest for all the trees), and gave me some new ideas as well.

More …