Notes from ‘A Few Useful Things to Know about Machine Learning’

This evening I was reading a paper by Pedro Domingos that offers tips and advice for people using machine learning. I’ve written down some bullet points for my own reference, and I hope someone else finds them useful too. I know I’ve made some of the mistakes he advises avoiding.

  • Overfitting
    • Never forget that your ultimate goal is to generalise beyond the training data
    • Beginners frequently make the mistake of testing on their training data and concluding that their model is a success
    • Set some data aside from the very start, and use it only to test your final selected and tuned classifier
    • It is easy to contaminate your test data by evaluating against it repeatedly while you tune your model’s hyperparameters
    • Cross-validation lets you compare differently tuned classifiers on held-out subsets of the training data instead, leaving the test data untouched
  • Features
    • The most important factor is the set of features you train your classifier on
    • Raw data is frequently not in a form that is immediately useful, so you typically need to process it first
    • Most of your time will probably be spent cleaning data and engineering features
  • More Data > Smarter Algorithms
    • If the accuracy of your model is not adequate, you can either change the model or train it on more data
    • More data is usually the better option, but it can be time-consuming to gather and clean
  • Understand multiple models
    • Try to understand a variety of models
    • When tackling a problem, try the simpler learners first:
      • Naive Bayes -> logistic regression -> k-nearest neighbour -> SVM
      • Fancy learners may be interesting, but they can add complexity and behave like black boxes
    • The best learner for a problem often depends on the goals of the project and the data you have access to
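
To make the overfitting advice concrete, here is a minimal sketch in plain Python (standard library only, rather than a library such as scikit-learn). It assumes a made-up two-cluster dataset and uses k-nearest neighbours purely as an example learner; `make_toy_data`, `knn_predict`, `accuracy` and `cross_val_score` are hypothetical helpers written for this sketch, not anything from the paper. The test set is split off first and only used once at the end, while the number of neighbours k is tuned with 5-fold cross-validation on the remaining data.

```python
import random
from collections import Counter

def make_toy_data(n, seed=0):
    """Hypothetical synthetic dataset: two noisy 2-D clusters labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 1.5
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def knn_predict(train, point, k):
    """Majority label among the k training points closest to `point`."""
    def sq_dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    nearest = sorted(train, key=sq_dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, evaluate_on, k):
    """Fraction of `evaluate_on` that k-NN (using `train`) labels correctly."""
    hits = sum(knn_predict(train, point, k) == label for point, label in evaluate_on)
    return hits / len(evaluate_on)

def cross_val_score(data, k, folds=5):
    """Mean validation accuracy over `folds` folds; each point is validated exactly once."""
    scores = []
    for i in range(folds):
        val = data[i::folds]
        train = [item for j, item in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, val, k))
    return sum(scores) / folds

data = make_toy_data(300)
random.Random(1).shuffle(data)
# Set the test data aside at the start; it is never used for tuning.
test_set, dev_set = data[:60], data[60:]

# Tune the hyperparameter k with cross-validation on the development data only.
candidate_ks = [1, 3, 5, 9, 15]
best_k = max(candidate_ks, key=lambda k: cross_val_score(dev_set, k))

# Touch the held-out test set exactly once, for the final estimate.
final = accuracy(dev_set, test_set, best_k)
print(f"chosen k = {best_k}, held-out accuracy = {final:.3f}")
```

Only odd values of k are tried so that a two-class vote can never tie.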
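
For the point about raw data rarely being in a usable format, here is a small sketch of turning a raw text record into a numeric feature vector. The record format ("timestamp, colour, weight in grams") and the colour vocabulary are hypothetical, chosen only to show three common steps: parsing a timestamp into hour-of-day and day-of-week, one-hot encoding a categorical field, and stripping units from a numeric field.

```python
from datetime import datetime

# Hypothetical fixed vocabulary for one-hot encoding the colour field.
COLOURS = ["red", "green", "blue"]

def featurise(line):
    """Turn one raw 'timestamp, colour, weight' record into a list of floats."""
    stamp, colour, weight = (field.strip() for field in line.split(","))
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    # One-hot encode the colour; case is normalised first.
    one_hot = [1.0 if colour.lower() == c else 0.0 for c in COLOURS]
    # Strip the unit suffix and convert grams to kilograms.
    kilograms = float(weight.rstrip("g")) / 1000.0
    # weekday(): Monday is 0, Sunday is 6.
    return [float(when.hour), float(when.weekday())] + one_hot + [kilograms]

print(featurise("2014-03-02 14:05, Red, 1250g"))  # → [14.0, 6.0, 1.0, 0.0, 0.0, 1.25]
```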
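
One way to decide between "change the model" and "get more data" is a learning curve: train the same learner on growing slices of the training data and watch how held-out accuracy changes. If the curve is still climbing, more data is likely to help; if it has flattened, more data alone probably won't. This sketch assumes synthetic two-class data and uses a nearest-centroid classifier only because it is short; `nearest_centroid_fit`, `nearest_centroid_predict` and `learning_curve` are hypothetical names.

```python
import random

def nearest_centroid_fit(train):
    """Fit a nearest-centroid classifier: store the mean point of each class."""
    sums, counts = {}, {}
    for (px, py), label in train:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + px, sy + py)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def nearest_centroid_predict(centroids, point):
    """Label of the centroid closest to `point` (squared Euclidean distance)."""
    def sq_dist(label):
        cx, cy = centroids[label]
        return (cx - point[0]) ** 2 + (cy - point[1]) ** 2
    return min(centroids, key=sq_dist)

def learning_curve(fit, predict, train, test, sizes):
    """Held-out accuracy after training on the first n examples, for each n in sizes."""
    curve = []
    for n in sizes:
        model = fit(train[:n])
        correct = sum(predict(model, point) == label for point, label in test)
        curve.append(correct / len(test))
    return curve

rng = random.Random(42)

def sample(label):
    """Hypothetical synthetic example: a 2-D Gaussian point around the class centre."""
    centre = 0.0 if label == 0 else 1.0
    return ((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label)

# Labels alternate, so even the smallest slice contains both classes.
train = [sample(i % 2) for i in range(1000)]
test = [sample(i % 2) for i in range(500)]
sizes = [2, 10, 50, 200, 1000]
curve = learning_curve(nearest_centroid_fit, nearest_centroid_predict, train, test, sizes)
for n, acc in zip(sizes, curve):
    print(f"{n:5d} training examples -> held-out accuracy {acc:.3f}")
```

The same `learning_curve` function works for any learner that can be expressed as a `fit`/`predict` pair.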
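
As a starting point for the "simpler learners first" advice, here is a minimal Gaussian naive Bayes classifier in plain Python. The Gaussian variant (each continuous feature modelled as a normal distribution, independent of the others given the class) is an assumption about the data, not something the notes specify, and the variance floor of 1e-9 is an arbitrary guard against division by zero. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate each class's log prior plus per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        columns = list(zip(*members))
        n = len(members)
        means = [sum(col) / n for col in columns]
        # Assumed small floor keeps a constant feature from giving zero variance.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(rows)), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

# Hypothetical toy data: two well-separated classes with two continuous features.
rows = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # → a
print(predict_gaussian_nb(model, [5.1, 5.9]))  # → b
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause.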
Computational biology PhD candidate at the Australian National University. I love writing (both articles and software), learning more about the world around us, and beekeeping. I also write for BioSky.co

Latest posts by Jack Simpson (see all)
