
Deep Learning PyData Talk

by Jack Simpson

Deep learning is a type of machine learning based on neural networks, which were inspired by neurons in the brain. The difference between a deep neural network and a normal neural network is the number of ‘hidden layers’ between the input and output layers.

I recently watched an excellent presentation on Deep Learning by Roelof Pieters titled ‘Python for image and text understanding; One model to rule them all!’ I can recommend watching it, and I’ve written this post to record a few bullet points from the talk for future reference.

Roelof outlined a five-step process for training a deep neural network:

  1. Preprocess the Data
  • Try to do as little as possible – the more transformations you apply, the less you allow the network to come up with its own representations of the data: the rawer, the better
  • Mean subtraction normalisation
  • Divide by standard deviation
  • If your data is noisy you may want to do some PCA and whitening to reduce the dimensions
  • Compute statistics on the training data but apply them to all (training and test) data (see the NumPy normalisation sketch after this list)
  2. Choose the architecture
  • Three choices:
    • Deep Belief Network (DBN): Series of restricted Boltzmann machines (RBMs). Useful for hierarchical data like medical or audio datasets.
    • Convolutional Net (CNN): Convolutional layers are small filters/crops of the image that you sum together. Useful for images.
    • Recurrent Net (RNN): Related to Hidden Markov Models. Useful for natural language processing.
  3. Train
  • Assign layer definitions and layer parameters, learning rate, etc. (see the small CNN sketch after this list)
  4. Optimise/Regularise
  • Move between the Optimise/Regularise step and the training step whilst improving
  • Visualise the loss curve – Lasagne comes with functions to achieve this
  • Visualise accuracy
  • Can visualise weights: with images you want to see edges in the first layer
  • Can optimise hyperparameters
    • Grid search (won’t work for millions of parameters)
    • Random search (takes a long time)
    • Bayesian optimisation (seems to work the best; Spearmint and hypergrad libraries available)
  • Data augmentation: with images you can scale, rotate, change contrast and flip (see the augmentation sketch after this list)
  • Dropout: randomly switch off nodes during training so the network cannot rely on any single unit
  • Batch normalisation
  5. Tips/Tricks
  • Ensembles: Train multiple models and let them vote on the prediction, or average their outputs for continuous data (see the averaging sketch after this list). Ensure the classifiers are not correlated.
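
To make the preprocessing step concrete, here is a minimal NumPy sketch of the ‘compute on training data, apply to all data’ idea; the arrays are random placeholders standing in for a real dataset.

```python
import numpy as np

# placeholder data standing in for a real dataset
X_train = np.random.rand(100, 20).astype(np.float32)
X_test = np.random.rand(30, 20).astype(np.float32)

# compute the statistics on the training data only...
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-8  # small constant avoids division by zero

# ...then apply the same mean subtraction and scaling to both sets
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std
```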
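For the architecture and training steps, here is a sketch of a small convolutional net. The talk did not prescribe a library, so the choice of PyTorch, the layer sizes and the learning rate below are all illustrative assumptions rather than recommendations from the talk.

```python
import torch
import torch.nn as nn

# a small CNN for 3-channel 32x32 images; every size here is an arbitrary example
class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),   # batch normalisation (step 4)
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),      # dropout: randomly switch off nodes (step 4)
            nn.Linear(32 * 8 * 8, num_classes),  # 32x32 input becomes 8x8 after two poolings
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = SmallCNN()
# layer parameters and the learning rate are the main things you assign in the training step
optimiser = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()
```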
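For the data augmentation bullet, a minimal sketch of an image pipeline that scales, rotates, changes contrast and flips. It assumes the torchvision library, which was not mentioned in the talk, and the specific ranges are arbitrary.

```python
from torchvision import transforms

# scale (random resized crop), rotate, adjust contrast and flip,
# then convert to a tensor and normalise as in the preprocessing step
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),
    transforms.RandomRotation(15),
    transforms.ColorJitter(contrast=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
```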
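And for the ensembling tip, a small NumPy sketch of averaging: the class probabilities of several models are averaged and the most likely class taken. The fake probability arrays below simply stand in for the outputs of independently trained models.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_model_probs(n_samples=5, n_classes=4):
    """Stand-in for one model's class-probability output (rows sum to 1)."""
    logits = rng.normal(size=(n_samples, n_classes))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

# predictions from three independently trained models on the same samples
all_probs = np.stack([fake_model_probs() for _ in range(3)])  # (models, samples, classes)

# average the probabilities across models, then take the most likely class;
# for continuous outputs you would simply average the raw predictions
mean_probs = all_probs.mean(axis=0)
ensemble_labels = mean_probs.argmax(axis=1)
print(ensemble_labels)
```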
