Data scientist at Port Jackson Partners in Sydney, Australia. My PhD was in computational biology. In my spare time I write about medical research at BioSky.co.
I’ve been following some of the excellent Data School tutorials, and I recently read a couple of interesting posts on their website about teaching programming and data science:
- Should you teach Python or R for data science?
- Lessons learned from teaching an 11-week data science course
The second article is a little longer, but well worth the read. When I first started learning about machine learning, I thought R would be the natural choice. Lately, however, I’ve become a real fan of Python’s scikit-learn (Six reasons why I recommend scikit-learn) and have continued my learning in Python. I can see why they chose Python for their teaching, and being able to get up and running with the Anaconda Python distribution makes a lot of sense.
Getting back to the original purpose of this post, I liked the teaching idea from the second article of starting the class with a motivating example: in their case, they explored the iris dataset using the K-nearest neighbours algorithm. I also thought it was a great idea to treat data retrieval as a lesson in its own right rather than merely downloading a bunch of CSV files from a website.
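For anyone curious what that kind of motivating example looks like, here is a minimal sketch in scikit-learn. It is not the course's actual material; it assumes scikit-learn is installed (it ships with Anaconda), and the 75/25 train/test split, the fixed random seed and the choice of k = 5 neighbours are my own illustrative picks.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the classic iris dataset bundled with scikit-learn (150 flowers, 4 features, 3 species)
iris = load_iris()

# Hold out 25% of the flowers for testing; stratify so each species is evenly represented
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42, stratify=iris.target
)

# Classify each flower by a majority vote of its 5 nearest neighbours in feature space
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Fraction of held-out flowers whose species was predicted correctly
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Part of the appeal for a first class is that the whole fit-and-evaluate loop is only a few lines, leaving time to talk about what "nearest" means and why you hold out test data.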
Finally, one thing I think would have been handy is a tour of Unix/Linux and Bash as part of the course materials, but I guess you have to draw the line on content somewhere when time is limited.