Notes from ‘A Few Useful Things to Know about Machine Learning’

by Jack Simpson January 10, 2017

I was reading a paper by Pedro Domingos this evening which had some tips and advice for people using machine learning. I’ve written down some bullet points for my own reference and I hope someone else finds them useful. I know I’ve made some of the mistakes he warns against.

Overfitting
- Never forget that your ultimate goal is to generalise beyond the training data
- Beginners frequently make the mistake of testing on the training data and concluding that their model is a success
- Set some data aside at the start, and use it only to test your selected and tuned classifier
- It is easy to contaminate your test set by running frequent tests against it as you tune the hyperparameters of your model
- With cross-validation you can instead compare differently tuned classifiers on held-out subsets of the training data
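
The points above can be sketched in plain Python. This is only an illustration under my own assumptions: the toy data, the hand-rolled k-nearest-neighbour learner, the candidate values of k and all function names are made up for the example and are not from the paper.

```python
import random

def knn_predict(train, point, k):
    """Predict the majority label among the k training points nearest to `point`."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point))
    )[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def accuracy(train, test, k):
    """Fraction of `test` rows that a k-NN model built on `train` labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def cross_val_accuracy(data, k, folds=5):
    """Average accuracy over `folds` splits; each row is held out exactly once."""
    scores = []
    for i in range(folds):
        held_out = data[i::folds]
        rest = [row for j, row in enumerate(data) if j % folds != i]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds

# Toy data (made up for illustration): two noisy 2-D clusters labelled 0 and 1.
rng = random.Random(0)
data = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label)
        for label, c in ((0, 0.0), (1, 3.0)) for _ in range(50)]
rng.shuffle(data)

# Set the test data aside first and do not look at it while tuning.
test_set, dev_set = data[:20], data[20:]

# Tune the hyperparameter k with cross-validation on the development data only.
cv_scores = {k: cross_val_accuracy(dev_set, k) for k in (1, 3, 5, 7)}
best_k = max(cv_scores, key=cv_scores.get)

# Touch the test set once, at the end, with the chosen setting.
test_accuracy = accuracy(dev_set, test_set, best_k)
print(best_k, cv_scores, test_accuracy)
```

The test set is used exactly once here, so the final number is not influenced by the tuning loop.
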
Features
- The most important factor is the features you train your classifier on
- Typically you need to do some processing, as raw data is frequently in a format that is not immediately useful
- Most of your time will probably be spent cleaning data and engineering features
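
As a small illustration of what that processing can look like, here is a sketch that turns raw text fields into numbers a learner could use. The records, the field names and the chosen features are all hypothetical. Which features are actually useful depends entirely on your problem.

```python
from datetime import datetime

# Hypothetical raw records: a timestamp string and a free-text amount.
raw_rows = [
    {"timestamp": "2017-01-10 18:30", "amount": "$1,250.00"},
    {"timestamp": "2017-01-14 09:05", "amount": "$40.50"},
]

def to_features(row):
    """Turn one raw record into a list of numbers."""
    when = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
    # Cleaning step: strip the currency symbol and thousands separator.
    amount = float(row["amount"].replace("$", "").replace(",", ""))
    return [
        when.hour,                         # hour of day, 0 to 23
        when.weekday(),                    # 0 = Monday, 6 = Sunday
        1 if when.weekday() >= 5 else 0,   # weekend flag
        amount,                            # cleaned numeric amount
    ]

features = [to_features(row) for row in raw_rows]
print(features)  # [[18, 1, 0, 1250.0], [9, 5, 1, 40.5]]
```
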
More Data > Smarter Algorithms
- If the accuracy of your model is not adequate, you can either change the model or train it on more data
- More data is usually better, but it can be time-consuming to gather and clean
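
One way to judge whether more data is likely to help is to train on growing subsets and watch the accuracy on held-out data. The sketch below does this with a hand-rolled nearest-centroid learner on made-up data. The learner, the data generator and the subset sizes are my own assumptions, chosen only to keep the example short.

```python
import random

def centroid(points):
    """Mean point of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def fit(train):
    """Nearest-centroid learner: store the mean point of each class."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

def sample(rng, n):
    """Made-up data: two overlapping 2-D clusters labelled 0 and 1."""
    rows = []
    for _ in range(n):
        label = rng.randrange(2)
        rows.append(((rng.gauss(label * 1.5, 1.0), rng.gauss(label * 1.5, 1.0)), label))
    return rows

rng = random.Random(1)
test_set = sample(rng, 500)   # held out, never trained on
pool = sample(rng, 400)       # training data we draw growing subsets from

learning_curve = []
for n in (4, 16, 64, 256):
    model = fit(pool[:n])
    acc = sum(predict(model, x) == y for x, y in test_set) / len(test_set)
    learning_curve.append((n, acc))
print(learning_curve)
```

If the held-out accuracy is still climbing at the largest size, gathering more data is probably worth the effort. If it has flattened out, changing the model or the features may be the better use of time.
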
Understand multiple models
- Try to understand a variety of models
- When tackling a problem, try the simpler learners first:
  - naive Bayes -> logistic regression -> k-nearest neighbour -> SVM
  - Fancy learners may be interesting, but they add complexity and can function as black boxes
- The best learner for a problem often depends on the goals of the project and the data you have access to
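
To make the "simplest first" habit concrete, here is a sketch that compares a trivial majority-class baseline with a hand-written Gaussian naive Bayes on the same train/test split. The toy data and the function names are hypothetical, and the naive Bayes here is a minimal version written for illustration rather than a reference implementation.

```python
import math
import random
from collections import Counter

def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def fit_gaussian_nb(train):
    """Gaussian naive Bayes: a class prior plus an independent normal per feature."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    stats = {}
    for y, xs in by_label.items():
        means = [sum(col) / len(xs) for col in zip(*xs)]
        # The small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(xs) + 1e-9
                     for col, m in zip(zip(*xs), means)]
        stats[y] = (math.log(len(xs) / len(train)), means, variances)

    def log_score(x, y):
        """Log prior plus the summed per-feature log likelihoods for class y."""
        log_prior, means, variances = stats[y]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))

    return lambda x: max(stats, key=lambda y: log_score(x, y))

def accuracy(predict, rows):
    """Fraction of rows whose label the predictor gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

# Made-up data: two 2-D clusters labelled 0 and 1.
rng = random.Random(2)
data = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), label)
        for label, c in ((0, 0.0), (1, 2.5)) for _ in range(100)]
rng.shuffle(data)
train, test = data[:150], data[150:]

results = {name: accuracy(fit(train), test)
           for name, fit in (("majority baseline", fit_majority),
                             ("gaussian naive bayes", fit_gaussian_nb))}
print(results)
```

Having the baseline number next to each learner makes it easier to tell whether a more complex model is earning its extra complexity.
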
