There is a lot of literature, in books and on the web, about the details of machine learning algorithms: for example, how to calculate the centroids in k-means, the distance in k-NN, or the coefficients of a linear regression. However, there isn’t much material on why and how to pick the best algorithm for a particular use case.
I find this a bit ironic, as one of the goals of ML is to let you see the big picture, yet the procedures available today for selecting an ML technique focus on the trees rather than on the forest.
I tried to summarize some of these decisions in the following slides:
Further, here is a simple (and somewhat naive) flow chart describing the steps:
This is by no means complete, particularly the section on unsupervised models, and is really just an initial effort. All feedback is very welcome and will be considered.
Excellent!
Along the same lines – http://peekaboo-vision.blogspot.com/2013/01/machine-learning-cheat-sheet-for-scikit.html
Very good. I particularly liked the categorization of ‘looking for structure (clustering)’ in addition to ‘quantity (regression) and category (classification)’. Thanks for sharing.