I recently listened to a really interesting talk by Jonathan Whitmore in which he discussed the approach his company takes to working with data in the Jupyter Notebook. I'd recommend watching it, but I've made a brief summary below for my own future reference.
I found an excellent tutorial series on Machine Learning on the Google Developers YouTube channel this weekend. It uses Python, scikit-learn and TensorFlow, and covers decision trees and k-nearest neighbours (KNN).
I really liked the focus on understanding what was going on under the hood. I followed along, implemented KNN from scratch, and extended the basic class they described so that k can be set as a parameter. You can find my implementation in a Jupyter Notebook here.
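For anyone who wants the gist without opening the notebook, here is a minimal sketch of the idea. It is not the code from the notebook or the tutorial: the `KNNClassifier` class, the `euclidean` helper and the toy data are all hypothetical, and the `fit`/`predict` method names are assumed to loosely mirror scikit-learn's interface. It uses Euclidean distance and a majority vote among the k closest training points, with k passed to the constructor.

```python
import math
from collections import Counter
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNNClassifier:
    """Hypothetical minimal k-nearest-neighbours classifier with a configurable k."""

    def __init__(self, k: int = 1):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k

    def fit(self, X_train, y_train):
        # KNN is a "lazy" learner: training just stores the examples.
        self.X_train = list(X_train)
        self.y_train = list(y_train)
        return self

    def predict(self, X_test):
        return [self._predict_one(row) for row in X_test]

    def _predict_one(self, row):
        # Sort training points by distance only (labels are never compared)
        # and keep the labels of the k closest.
        by_distance = sorted(
            zip(self.X_train, self.y_train),
            key=lambda pair: euclidean(row, pair[0]),
        )
        nearest = [label for _, label in by_distance[: self.k]]
        # Majority vote. Counter.most_common orders equal counts by first
        # occurrence, so ties go to the label seen closest to the query.
        return Counter(nearest).most_common(1)[0][0]


# Toy data: two well-separated clusters labelled "a" and "b".
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["a", "a", "b", "b", "b"]

print(KNNClassifier(k=3).fit(X, y).predict([[1.1, 1.0], [5.0, 4.8]]))  # ['a', 'b']
print(KNNClassifier(k=5).fit(X, y).predict([[1.1, 1.0]]))  # ['b']
```

The second call shows why making k a variable matters: with k=5 every training point votes, so the larger "b" cluster outvotes the two "a" points sitting right next to the query.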
Sidenote: If you want to visualise the decision tree, you'll need to install the following libraries. I used Homebrew to install Graphviz, but you could also use a package manager on Linux:
brew install graphviz
pip3 install pydotplus
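Once those are installed, a decision tree can be exported and rendered roughly as follows. This is a minimal sketch that assumes scikit-learn is installed and uses the iris dataset purely as an illustrative example; the `write_tree_pdf` helper name is hypothetical. `export_graphviz` with `out_file=None` returns the tree as a Graphviz DOT string, which pydotplus can then turn into a PDF.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Fit a small decision tree on the iris dataset (illustrative choice).
iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_data = export_graphviz(
    clf,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
    rounded=True,
)


def write_tree_pdf(dot_source: str, path: str) -> None:
    """Hypothetical helper: render DOT source to a PDF file."""
    # Imported here so the DOT export above still works without pydotplus.
    import pydotplus

    graph = pydotplus.graph_from_dot_data(dot_source)
    # Rendering shells out to Graphviz, so its `dot` binary must be on PATH.
    graph.write_pdf(path)
```

Calling `write_tree_pdf(dot_data, "iris.pdf")` should then write the diagram to disk, provided the Graphviz install above succeeded.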
I gave a talk today about how I've been using the IPython Notebook to prototype different image processing techniques and tune their parameters. The presentation covered some background on using the notebook and Markdown, before I demonstrated how the notebook can be used to rapidly test out the best way to process the images I've been working on. In this case the images were taken from infrared footage of bees, and I wanted to separate the tags from the background. Simple thresholding worked well at first, while the tags were very reflective, but once they became duller it stopped working effectively.
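To illustrate why a fixed threshold breaks down, here is a minimal sketch assuming 8-bit greyscale frames held as NumPy arrays. The `threshold` function, the synthetic frames and the pixel values are made up for illustration; they are not taken from the bee footage or the talk's notebook. It sweeps a few threshold levels, the kind of parameter exploration the notebook makes quick, and then shows a dimmer tag disappearing at the same level.

```python
import numpy as np


def threshold(image: np.ndarray, level: int) -> np.ndarray:
    """Return a boolean mask that is True where pixels are brighter than `level`."""
    return image > level


# Hypothetical 8-bit frame: dark background with one bright, reflective tag.
frame = np.full((6, 6), 40, dtype=np.uint8)
frame[2:4, 2:4] = 220

# Sweep a few levels and count how many pixels each one keeps.
for level in (50, 128, 200, 230):
    mask = threshold(frame, level)
    print(level, int(mask.sum()))

# The same tag after it has lost much of its reflectivity.
dull = frame.copy()
dull[2:4, 2:4] = 110
print(int(threshold(dull, 128).sum()))  # 0: the fixed level no longer finds the tag
```

A level that cleanly separated the bright tag (128 here) finds nothing once the tag's brightness drops below it, which is the failure mode described above.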
You can find the notebook I used for my talk here; GitHub renders notebooks directly in the browser. I have cleared all the output images the notebook generated because the file was getting too large to view online.
One of the best things about the IPython Notebook is the number of easy-to-follow tutorials it has inspired. I thought I'd share a few that I've found on machine learning and statistics.
- Python for Developers – great resource for those wanting to learn and/or deepen their understanding of Python.
- Machine Learning with scikit-learn – provides a good introduction and background to machine learning.
- Machine learning with Python – covers regression, neural networks, decision trees.
- Machine Learning with Python – covers PCA, k-means clustering, k-nearest neighbours.
- Learn Data Science with Python – covers regression, random forests, k-means clustering.
- Probabilistic Programming & Bayesian Methods for Hackers – covers Bayesian methods including Markov Chain Monte Carlo.
- Bayesian data analysis – covers how probabilistic programming works.
- Supervised Learning SVM – covers Support Vector Machines (SVM).
- Face Recognition – covers PCA and SVM.
- Particle Filter – covers the identification and tracking of objects in a video.
I'll keep updating the list as I come across new notebooks I find handy.