Google have released a Python to Go transcompiler

Google have released an open-source project on GitHub called Grumpy that translates Python to Go, which is then compiled down to native code. It’s an interesting development, but they won’t be supporting C extension modules, which basically rules out all the scientific and machine learning libraries I use, so I probably won’t end up using this …

Deep Learning PyData Talk

Deep learning is a type of machine learning based on neural networks, which were inspired by neurons in the brain. The difference between a deep neural network and a normal neural network is the number of ‘hidden layers’ between the input and output layers. I recently watched an excellent presentation on Deep Learning by Roelof Pieters titled …

Essential Libraries for Data Science on a Mac

I recently ran a fresh install on my Mac and thought I’d take the opportunity to document the libraries and programs I find incredibly useful. The Python libraries I’ll frequently pip3 install include:

Excel confusing CSV file with SYLK file

I recently had an interesting experience whilst using pandas to write some data to a CSV file and then opening the file up with Excel to inspect its contents. To my surprise, I received a message from Excel informing me that I was attempting to open something called a ‘SYLK file’.
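A commonly documented cause of this message is a text file whose first two characters are the uppercase letters ‘ID’, which Excel takes as the start of a SYLK file. The excerpt above doesn’t say that this is what happened here, so treat the following as a minimal sketch under that assumption. The column names, the data and the helper functions are all hypothetical.

```python
import io

import pandas as pd


def to_csv_text(df):
    """Write a DataFrame to CSV in memory and return the text."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    return buffer.getvalue()


def looks_like_sylk(csv_text):
    """Return True if the text starts with uppercase 'ID' (assumed trigger)."""
    return csv_text.startswith("ID")


# Hypothetical data whose first column header is 'ID'
df = pd.DataFrame({"ID": [1, 2], "value": [0.5, 0.7]})

risky = to_csv_text(df)
# One possible workaround: rename the first column so the file no longer starts with 'ID'
safe = to_csv_text(df.rename(columns={"ID": "id"}))

print(looks_like_sylk(risky), looks_like_sylk(safe))
```

Renaming the column is only one option; reordering the columns so that a different header comes first would also change the first two characters of the file.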

Removing webpage newline characters in Python

Whilst parsing HTML text fetched with the Python requests module, I recently found that I couldn’t remove the newline characters ‘\n’ with strip().
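A minimal sketch of why strip() can appear not to work, assuming one of two situations: the newlines sit in the middle of the text (strip() only trims the ends), or the text contains a literal backslash followed by ‘n’ rather than a real newline. The sample strings are made up for illustration and don’t come from a real page.

```python
# Case 1: real newline characters inside the text.
html_text = "  <p>first line\nsecond line</p>\n"
stripped = html_text.strip()           # trims the ends only; the inner newline stays
cleaned = " ".join(html_text.split())  # collapses every run of whitespace to one space

# Case 2: a literal backslash + 'n' (two characters), not a real newline.
escaped_text = "first line\\nsecond line"
still_escaped = escaped_text.strip()            # unchanged: there is no whitespace to trim
unescaped = escaped_text.replace("\\n", " ")    # replace the two-character sequence

print(repr(stripped))
print(repr(cleaned))
print(repr(unescaped))
```

Printing with repr() helps to tell the two cases apart, since a real newline shows up as \n and a literal backslash shows up as \\n.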

Best practices for data science with the Jupyter Notebook

I recently listened to a really interesting talk by Jonathan Whitmore where he discussed the approach his company has to working with data using the Jupyter Notebook. I’d recommend watching it, but I’ve made a brief summary below for my own future reference.

Machine Learning Recipes

I found an excellent tutorial series on Machine Learning on the Google Developers YouTube channel this weekend. It uses Python, scikit-learn and TensorFlow and covers decision trees and k-nearest neighbours (KNN). I really liked the focus on understanding what was going on under the hood. I followed along and implemented KNN from scratch and expanded …
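For reference, KNN from scratch can be sketched in a few lines of plain Python. This isn’t the code from the tutorial or from the full post; it’s a minimal sketch that assumes Euclidean distance and a simple majority vote, with hypothetical function names and made-up data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for query by majority vote among the k nearest training points."""
    distances = sorted(
        ((euclidean(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training data: two well-separated clusters
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.0, 5.1)))
```

An odd value of k avoids tied votes when there are only two classes.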

Multiprocessing in Python

I frequently find myself working with large lists where I need to apply the same time-consuming function to each element, without caring about the order in which the calculations are made. I’ve written a small class using Python’s multiprocessing module to help speed things up. It will accept a list, break it up …
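The class itself isn’t shown in this excerpt, so the following is not that class. It’s a minimal sketch of the same idea using multiprocessing.Pool, where imap_unordered hands out the list in chunks and returns results in whatever order they finish. The function name parallel_map and the chunk size heuristic are my own assumptions.

```python
import multiprocessing


def parallel_map(func, items, processes=2):
    """Apply func to every item using worker processes; result order is not guaranteed."""
    # Assumed heuristic: aim for roughly four chunks per worker process
    chunksize = max(1, len(items) // (processes * 4))
    with multiprocessing.Pool(processes=processes) as pool:
        return list(pool.imap_unordered(func, items, chunksize=chunksize))


if __name__ == "__main__":
    # func must be picklable, e.g. a built-in or a function defined at module level
    results = parallel_map(abs, [-3, 1, -2])
    print(sorted(results))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module, such as macOS and Windows.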

Visual Diagnostics for More Informed Machine Learning

I recently watched Rebecca Bilbro’s presentation at PyCon 2016 and thought I’d share a few of my short notes from her interesting presentation. Model Selection Triple: when selecting a model, rather than going with your default favourite method, take 3 things into account. Feature analysis: intelligent feature selection and engineering. Model selection: model that makes …

Finding rows in dataframe with a 0 value using Pandas

Recently I needed to identify which of the rows in a CSV file contained 0 values. This was interesting because normally I tend to look at this problem within columns rather than rows. Pandas provides a neat solution to this which I’ll demonstrate below using this data as an example: This data frame should look …
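The example data isn’t included in this excerpt, so here is a minimal sketch with a small made-up data frame. It assumes the approach is a boolean mask: compare every cell with 0, then use any(axis=1) to flag the rows where at least one column matches. The full post may do it differently.

```python
import pandas as pd

# Made-up data standing in for the CSV file
df = pd.DataFrame({
    "a": [1, 0, 3, 4],
    "b": [5, 6, 0, 8],
    "c": [9, 10, 11, 12],
})

# True for each row that contains at least one 0 in any column
has_zero = (df == 0).any(axis=1)

# Keep only the flagged rows
rows_with_zero = df[has_zero]
print(rows_with_zero)
```

Using axis=0 instead would answer the more usual column-wise question of which columns contain a 0.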
