Honeybees and missing data part 2: Where do bees like to live?

This article is about where bees prefer to build their hives. Back in the 1970s, there was a bunch of research conducted at Cornell surveying where bees build hives in the wild, and all the evidence seemed to indicate that honeybees preferred to build their hives relatively close to the ground. This finding seemed rather …

Bees, lasers, and machine learning

I originally wrote this article for the Data Skeptic blog, about how I’ve used machine learning as part of my research. If you’re interested in machine learning, I can’t recommend their podcast enough. A couple of years ago I started my PhD at the Australian National University working to quantify honeybee behaviour. We wanted to …

Notes from ‘A Few Useful Things to Know about Machine Learning’

I was reading a paper by Pedro Domingos this evening which had some tips and advice for people using machine learning. I’ve written down some bullet points for my own reference and I hope someone else finds them useful. I know I’ve made some of the mistakes he warns against.

Deep Learning PyData Talk

Deep learning is a type of machine learning based on neural networks, which were inspired by neurons in the brain. The difference between a deep neural network and a normal neural network is the number of ‘hidden layers’ between the input and output layers. I recently watched an excellent presentation on Deep Learning by Roelof Pieters titled …

Essential Libraries for Data Science on a Mac

I recently ran a fresh install on my Mac and thought I’d take the opportunity to document the libraries and programs I find incredibly useful. The Python libraries I’ll frequently pip3 install include:

Best practices for data science with the Jupyter Notebook

I recently listened to a really interesting talk by Jonathan Whitmore where he discussed the approach his company has to working with data using the Jupyter Notebook. I’d recommend watching it, but I’ve made a brief summary below for my own future reference.

Improving Model Accuracy

I wrote down a few quick bullet points from the article “8 Proven Ways for improving the ‘Accuracy’ of a Machine Learning Model” for future reference. Improving Accuracy: Add more data. Fix missing values (continuous: impute with median/mean/mode; categorical: treat as separate class; predict missing classes with k-nearest neighbours). Outliers: delete, bin, impute, treat as separate …
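
As a rough illustration of the first two missing-value tactics, here is a minimal sketch in plain Python. It is my own example rather than code from the article, and the function names, the `"missing"` label and the sample data are all hypothetical.

```python
from statistics import median

def impute_continuous(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_categorical(values, missing_label="missing"):
    """Treat missing categorical entries as their own separate class."""
    return [missing_label if v is None else v for v in values]

# Hypothetical sample data with gaps marked as None.
ages = [23, None, 31, 40, None]
colours = ["red", None, "blue"]

imputed_ages = impute_continuous(ages)
imputed_colours = impute_categorical(colours)
```

The median is used here because it is less sensitive to outliers than the mean; swapping in the mean or mode would be a one-line change.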

Working with Imbalanced Classes

I wrote down a few quick bullet points from the article “8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset” for future reference. Tactics: Imbalanced datasets occur when one class occurs much less frequently than the others. If a model ignores that class, it can still achieve a high classification accuracy, …
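
To see why accuracy can mislead on imbalanced data, here is a small sketch using made-up labels (95 of one class, 5 of the other) and a hypothetical "model" that always predicts the majority class. None of these numbers come from the article.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
labels = [0] * 95 + [1] * 5

# A "model" that ignores the minority class entirely.
predictions = [0] * len(labels)

# Overall accuracy: fraction of predictions that match the labels.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: fraction of true positives that were found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
minority_recall = true_positives / labels.count(1)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

The accuracy looks strong even though the model never finds a single minority example, which is why per-class metrics are worth checking alongside it.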

Assessing machine learning algorithm performance

I wrote down a few quick bullet points from the article “How To Implement Machine Learning Algorithm Performance Metrics From Scratch With Python”. Metrics: Classification accuracy: tests how well a model’s predictions do overall; accuracy = correct predictions / total predictions. Confusion matrix: use to identify how well your predictions did with different classes. Very …
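
The two metrics above can be sketched from scratch in a few lines of Python. This is my own minimal version, not the article's code; the function names and the bee/wasp labels are invented for illustration.

```python
from collections import Counter

def accuracy(actual, predicted):
    """accuracy = correct predictions / total predictions"""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical labels and predictions.
actual = ["bee", "bee", "wasp", "wasp", "bee"]
predicted = ["bee", "wasp", "wasp", "wasp", "bee"]

acc = accuracy(actual, predicted)
cm = confusion_matrix(actual, predicted)

for (a, p), count in sorted(cm.items()):
    print(f"actual={a:5} predicted={p:5} count={count}")
```

Storing the confusion matrix as a `Counter` keyed by (actual, predicted) pairs keeps the sketch short; a nested list indexed by class would work just as well.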

Machine Learning Recipes

I found an excellent tutorial series on Machine Learning on the Google Developers YouTube channel this weekend. It uses Python, scikit-learn and TensorFlow and covers decision trees and k-nearest neighbours (KNN). I really liked the focus on understanding what was going on under the hood. I followed along and implemented KNN from scratch and expanded …
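
For reference, a from-scratch KNN classifier can be sketched as below. This is a minimal illustration under my own assumptions (Euclidean distance, majority vote, Python 3.8+ for `math.dist`), not the code from the tutorial series, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Sort the training examples by distance to the query point.
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    # Count the labels of the k closest examples and return the most common.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_cluster = knn_predict(points, labels, (5.5, 5.5))
```

Sorting the whole training set is fine for a toy example; for larger datasets a heap or a spatial index would avoid the full sort.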
