Materials for Learning Machine Learning

Lately, a friend and I have become rather interested in learning more about machine learning. I’ve been putting together a collection of learning materials that I either found useful or mean to go through at some point, and thought I’d write a post about it. I’m hoping this can help people who are just starting out, and that those who know more than I do will comment and point me towards additional material I can link to and look into myself.

**Update** Had a great response on Reddit, thanks so much for all the suggestions that have flooded in! I’ve added quite a few new links that looked good, and am currently reviewing some other materials I’ll link to soon. This list will continue to evolve with links to new content as I find them (or am alerted by others). Follow me on Twitter for updates and new posts on machine learning, research and statistics.

Teaching Materials


iPython Notebooks

R Markdown


Interesting Links

OpenCV Machine Learning



Amazon Web Services (AWS)

Read More
Interesting articles about teaching data science

I’ve been following some of the great data school tutorials and I recently read a couple of interesting posts about teaching programming and data science on their website:

The second article is a little longer, but well worth the read. When I first started trying to learn about machine learning, I thought R would be the natural choice. However, I’ve become a real fan of Python’s scikit-learn (Six reasons why I recommend scikit-learn) lately and have continued my learning in Python. I can see why they made that decision with their teaching, and using the Anaconda Python installation to get students up and running quickly makes a lot of sense.

Getting back to the original purpose of this post, I liked the teaching idea discussed in the article of starting the class off with a motivating example. In their case they explored the iris dataset and used the K-nearest neighbours algorithm. I also thought it was a great idea to use data retrieval as a lesson opportunity rather than merely downloading a bunch of CSV files from a website.
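To give a feel for what such a motivating example involves, here is a minimal sketch of K-nearest neighbours in plain Python. It is not taken from the course materials: the function name `knn_predict` is hypothetical, and the six rows are a small hand-typed sample in the iris format rather than the full dataset. In practice you would more likely reach for scikit-learn.

```python
from collections import Counter
import math

# A few hand-typed rows in the iris format (illustrative sample only):
# (sepal length, sepal width, petal length, petal width) in cm, plus a species label.
SAMPLES = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def knn_predict(samples, query, k=3):
    """Return the majority label among the k samples closest to query."""
    # Sort the labelled samples by Euclidean distance to the query point.
    by_distance = sorted(samples, key=lambda sample: math.dist(sample[0], query))
    # Let the k nearest samples vote on the label.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A flower with short petals should land among the setosa rows.
    print(knn_predict(SAMPLES, (5.0, 3.4, 1.5, 0.2)))
```

The whole algorithm is "measure distances, take a vote", which is probably why it works well as a first example for a class.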

Finally, one thing I think would have been handy would be for them to give a tour of Unix/Linux and Bash as part of the course materials, but then I guess you have to draw the content line somewhere when time is limited.

Read More
iPython Notebook Presentation

I gave a talk today about how I’ve been using the iPython Notebook to prototype different image processing techniques and their parameters. The presentation covered a bit of background on how to use the notebook and markdown, before I demonstrated how you can use the notebook to rapidly test out the best way to process the images I’ve been working on. In this case the images were taken from infrared footage of bees, and I wanted to separate the tags from the background. I found simple thresholding worked well at first while the tags were very reflective, but once they became duller it stopped working effectively.
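As a rough illustration of why simple thresholding breaks down, here is a minimal sketch in plain Python. The pixel values, the cutoff of 128 and the function name `threshold` are all made up for the example; they are not taken from the bee footage or from the notebook.

```python
def threshold(image, cutoff):
    """Return a binary mask: 1 where a pixel is brighter than cutoff, else 0."""
    return [[1 if pixel > cutoff else 0 for pixel in row] for row in image]


# Hypothetical 8-bit grayscale patches (0 = black, 255 = white).
# The "tag" occupies the same three pixels in both patches.
bright_tag = [
    [20, 25, 22],
    [24, 240, 235],
    [21, 238, 23],
]
dull_tag = [
    [20, 25, 22],
    [24, 90, 85],
    [21, 88, 23],
]

CUTOFF = 128  # assumed value, chosen only for this illustration

mask_bright = threshold(bright_tag, CUTOFF)  # tag pixels are picked out
mask_dull = threshold(dull_tag, CUTOFF)      # tag falls below the cutoff and vanishes

if __name__ == "__main__":
    print(mask_bright)
    print(mask_dull)
```

With a fixed cutoff the reflective tag is separated cleanly, while the duller tag disappears into the background. On real frames the same operation is available as OpenCV's `cv2.threshold`.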

You can find the notebook I used for my talk here; GitHub renders notebooks in the browser. I have cleared all the output images the notebook generated because the file was getting a bit large to view online.

Read More
Learning the Flask Python Web Framework

Flask is a minimalist web framework written in Python that I’ve become rather intrigued by, and have been reading up on in my spare time. Part of the appeal of this framework is that all you need to get a simple server going is this:

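from flask import Flask

app = Flask(__name__)

# Serve the greeting at the site root.
@app.route('/')
def home():
    return "Hello, World!"

if __name__ == '__main__':
    # Start Flask's built-in development server.
    app.run()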

Read More
How to use GitHub for version control

Lately I’ve become increasingly interested in using Git for version control of my programs. Git is a really good way to track changes made to your code, whether you’re coding by yourself or in a company with dozens of other developers. Until now I’ve only had a relatively superficial understanding of how Git works, and I tended to find a lot of the explanations online rather confusing. However, if you want to learn how to use Git, then I can thoroughly recommend this website, which will teach you how to use GitHub with an interactive terminal in the browser.

Once you get through those, then I would watch these two videos and then bookmark them for future reference.

**Update** I’ve found a few other resources for learning more about Git:

Read More
Review of OpenCV Essentials

Recently Packt sent me a free copy of a book on OpenCV because of my background writing about the topic. First things first, OpenCV Essentials is definitely not a book for an image processing/computer vision novice. If this sounds like you, then I’d recommend you refer to my post on resources for getting started with OpenCV here, and then come back to this book in a few months. However, if you’ve been using OpenCV and programming in C++ for at least a few months and want a resource you can quickly refer to when you’re delving into some of the deeper functionality of OpenCV, then I can thoroughly recommend this book to you.

Read More
Writing idiomatic Python videos

I’ve been subscribed to Jeff Knupp’s Python programming blog for a while now, and have really enjoyed a lot of the content. He recently released a 3-part video series on writing idiomatic Python, which I highly recommend watching.

Read More
Parallel operations in R

I thought I’d start a list of some code examples I’ve found online which enable you to perform parallel operations in R and take advantage of multi-core processors.

I’ll try to add to this list from time to time when I come across new examples.

Read More
Renaming all files in a directory with Bash

In the past when I’ve needed to automate a task, such as renaming thousands of files, I’ve tended to use Python’s os module. However, lately I’ve started to write Bash scripts to achieve these tasks, because of how quickly and easily Bash lets me work within the Unix filesystem. Last week, one of my programs produced thousands of images with random names. I decided that I needed to write a script to rename all the jpg files in a directory, numbering them upwards from 0:
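For comparison, the Python os-module route could look roughly like the sketch below. It is an illustration rather than the script I actually used: the function name `rename_jpgs` and the temporary-name prefix are assumptions made for the example, and it numbers files in sorted-name order.

```python
import os


def rename_jpgs(directory):
    """Rename every .jpg file in directory to 0.jpg, 1.jpg, ... in sorted-name order.

    Returns the number of files renamed.
    """
    names = sorted(
        name for name in os.listdir(directory) if name.lower().endswith(".jpg")
    )

    # First pass: move everything to temporary names, so that a file that is
    # already called e.g. "1.jpg" cannot be overwritten part-way through.
    temp_names = []
    for index, name in enumerate(names):
        temp = f".rename_tmp_{index}.jpg"  # hypothetical prefix, assumed unused
        os.rename(os.path.join(directory, name), os.path.join(directory, temp))
        temp_names.append(temp)

    # Second pass: give each file its final numbered name.
    for index, temp in enumerate(temp_names):
        os.rename(
            os.path.join(directory, temp),
            os.path.join(directory, f"{index}.jpg"),
        )
    return len(names)
```

Calling `rename_jpgs("/path/to/images")` (a placeholder path) would rename the jpg files in that directory and leave other files alone. The two-pass approach costs an extra rename per file, but it avoids clobbering images whose names already look like the target numbers.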

Read More