Bees, lasers, and machine learning

I originally wrote this article for the Data Skeptic blog, describing how I’ve used machine learning as part of my research. If you’re interested in machine learning, I can’t recommend their podcast enough. A couple of years ago I started my PhD at the Australian National University working to quantify honeybee behaviour. We wanted to …

Read More
Notes from ‘A Few Useful Things to Know about Machine Learning’

I was reading a paper by Pedro Domingos this evening which had some tips and advice for people using machine learning. I’ve written down some bullet points for my own reference and I hope someone else finds them useful. I know I’ve made some of the mistakes he warns against.

Read More
Deep Learning PyData Talk

Deep learning is a type of machine learning based on neural networks, which were inspired by neurons in the brain. The difference between a deep neural network and a normal neural network is the number of ‘hidden layers’ between the input and output layers. I recently watched an excellent presentation on Deep Learning by Roelof Pieters titled …

Read More
Essential Libraries for Data Science on a Mac

I recently ran a fresh install on my Mac and thought I’d take the opportunity to document the libraries and programs I find incredibly useful. The Python libraries I’ll frequently pip3 install include:

Read More
Improving Model Accuracy

I wrote a few quick bullet points down from the article “8 Proven Ways for improving the ‘Accuracy’ of a Machine Learning Model” for future reference.

Improving Accuracy

- Add more data
- Fix missing values
  - Continuous: impute with median/mean/mode
  - Categorical: treat as separate class
  - Predict missing classes with k-nearest neighbours
- Outliers
  - Delete
  - Bin
  - Impute
  - Treat as separate …
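The two simplest missing-value fixes in these notes can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the helper names `impute_continuous` and `impute_categorical`, the `"missing"` label, and the sample data are all hypothetical and do not come from the original article.

```python
from statistics import median


def impute_continuous(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_categorical(values, missing_label="missing"):
    """Treat missing categorical entries as their own separate class."""
    return [missing_label if v is None else v for v in values]


# Hypothetical sample data with gaps marked as None.
ages = [22, None, 35, 41, None, 29]
colours = ["red", None, "blue", "red"]

print(impute_continuous(ages))
print(impute_categorical(colours))
```

The median is used here because it is less sensitive to outliers than the mean; swapping in `statistics.mean` would give the mean-imputation variant.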

Read More
Working with Imbalanced Classes

I wrote a few quick bullet points down from the article “8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset” for future reference.

Tactics

Imbalanced datasets occur when one class occurs much less frequently than the others. If a model ignores that class, it can still achieve a high classification accuracy, …
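To see why accuracy can mislead on imbalanced data, here is a small sketch with made-up numbers: 95 examples of a common class and 5 of a rare one, plus a hypothetical "model" that always predicts the common class. The class split and the function names are my own assumptions for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive):
    """Fraction of true `positive` examples that the model found."""
    hits = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    total = sum(a == positive for a in actual)
    return hits / total


# Hypothetical imbalanced dataset: class 1 is the rare class.
labels = [0] * 95 + [1] * 5

# A "model" that ignores the rare class entirely.
predictions = [0] * len(labels)

print("accuracy:", accuracy(labels, predictions))
print("recall on rare class:", recall(labels, predictions, positive=1))
```

The accuracy looks respectable even though the rare class is never detected, which is why a per-class measure such as recall is worth checking alongside it.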

Read More
Assessing machine learning algorithm performance

I wrote a few quick bullet points down from the article “How To Implement Machine Learning Algorithm Performance Metrics From Scratch With Python”.

Metrics

- Classification accuracy
  - Test how well predictions of a model do overall
  - accuracy = correct predictions / total predictions
- Confusion matrix
  - Use to identify how well your predictions did with different classes
  - Very …
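As a rough sketch of the first two metrics, the snippet below computes classification accuracy using the formula from the notes and tallies a confusion matrix as counts of (actual, predicted) pairs. It is not the code from the referenced article; the function names and the bee/wasp labels are invented for illustration.

```python
from collections import Counter


def accuracy(actual, predicted):
    """accuracy = correct predictions / total predictions"""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical labels and predictions.
actual = ["bee", "bee", "wasp", "wasp", "bee"]
predicted = ["bee", "wasp", "wasp", "wasp", "bee"]

print("accuracy:", accuracy(actual, predicted))
for (true_class, predicted_class), count in sorted(confusion_matrix(actual, predicted).items()):
    print(f"actual={true_class} predicted={predicted_class}: {count}")
```

Storing the matrix as a `Counter` keyed by class pairs keeps the sketch short; a nested list or table layout would be the more usual way to display it.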

Read More
Machine Learning Recipes

I found an excellent tutorial series on Machine Learning on the Google Developers YouTube channel this weekend. It uses Python, scikit-learn and TensorFlow and covers decision trees and k-nearest neighbours (KNN). I really liked the focus on understanding what was going on under the hood. I followed along and implemented KNN from scratch and expanded …
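For readers who have not seen KNN before, here is a minimal from-scratch sketch of the idea: find the k training points closest to a query and take a majority vote over their labels. This is not the implementation from the tutorial series or from the full post; the function name `knn_predict`, the use of Euclidean distance, and the toy data are my own assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours."""
    # Sort the training examples by distance to the query point.
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Keep the labels of the k closest examples and return the most common one.
    nearest_labels = [label for _, label in neighbours[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, query=(2, 2)))
print(knn_predict(points, labels, query=(8.5, 8.5)))
```

Sorting every training point is fine for small datasets; for larger ones a spatial index such as a k-d tree would normally replace the full sort.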

Read More
Visual Diagnostics for More Informed Machine Learning

I recently watched Rebecca Bilbro’s presentation at PyCon 2016 and thought I’d share a few short notes from it.

Model Selection Triple

When selecting a model, rather than going with your default favourite method, take 3 things into account:

- Feature analysis: intelligent feature selection and engineering
- Model selection: model that makes …

Read More
Python hmmlearn installation issues

I’ve recently started learning how to apply a Hidden Markov Model (HMM) to some states of honeybee behaviour in my data and have been trying to install Python’s hmmlearn library. Unfortunately, the install kept failing with a frustrating error because it was unable to locate the NumPy headers. After a bit of searching I found the solution in a …

Read More