Deep Learning PyData Talk

Deep learning is a type of machine learning based on neural networks, which were originally inspired by neurons in the brain. The difference between a deep neural network and a normal neural network is the number of ‘hidden layers’ between the input and output layers: a deep network stacks several of them, whereas a shallow one typically has only one.
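To make the ‘hidden layers’ distinction concrete, here is a minimal NumPy sketch of a feedforward pass. It is not from the talk: the layer sizes, the ReLU activation and the random initialisation are arbitrary assumptions chosen for illustration. The only difference between the shallow and the deep network is how many hidden layers sit between the input and the output.

```python
import numpy as np

def relu(x):
    # Rectified linear activation, applied element-wise
    return np.maximum(0.0, x)

def init_network(layer_sizes, seed=0):
    """Create random weights and zero biases for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x):
    """Pass x through every hidden layer (ReLU), then a linear output layer."""
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W_out, b_out = params[-1]
    return x @ W_out + b_out

# A "shallow" network: one hidden layer between input and output
shallow = init_network([4, 16, 3])
# A "deep" network: three hidden layers between input and output
deep = init_network([4, 16, 16, 16, 3])

x = np.ones((2, 4))  # a batch of two inputs with four features each
print(forward(shallow, x).shape)         # (2, 3)
print(forward(deep, x).shape)            # (2, 3)
print(len(shallow) - 1, len(deep) - 1)   # number of hidden layers: 1 3
```

Both networks map the same inputs to the same output shape; the deeper one simply has more intermediate layers in which to build its own representations.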

I recently watched an excellent presentation on Deep Learning by Roelof Pieters titled ‘Python for image and text understanding; One model to rule them all!’. I recommend watching it, and I’ve written this post to record a few of my own bullet points from the talk for future reference.

Roelof described a five-step process for training a deep neural network:

  1. Preprocess the Data
  • Try to do as little as possible: the more transformations you apply, the less you allow the network to come up with its own representations of the data. The more raw, the better
  • Mean subtraction: centre each feature on zero
  • Divide by the standard deviation so that each feature has unit variance
  • If your data is noisy, you may want to apply PCA and whitening to reduce the dimensions
  • Compute statistics on training data but apply on all (training and test) data
  2. Choose the architecture
  • Three choices:
    • Deep Belief Network (DBN): Series of restricted Boltzmann machines (RBM). Useful for hierarchical data like medical or audio datasets
    • Convolutional Net (CNN): Convolutional layers slide small learned filters across the image and sum the element-wise products at each position, so each output responds to a local patch. Useful for images.
    • Recurrent Net (RNN): Related in spirit to a Hidden Markov Model, in that it carries a hidden state from one step of a sequence to the next. Useful for natural language processing.
  3. Train
  • Assign layer definitions and layer parameters, learning rate, etc.
  4. Optimise/Regularise
  • Iterate between the optimise/regularise step and the training step while the model keeps improving
  • Visualise the loss curve: Lasagne comes with functions to achieve this
  • Visualise accuracy
  • You can visualise the weights: with images, you want to see edges in the first layer
  • You can optimise hyperparameters with:
    • Grid search (won’t work for millions of parameters)
    • Random search (takes a long time)
    • Bayesian optimisation (seems to work best; the spearmint and hypergrad libraries are available)
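  • Data augmentation: with images you can scale, rotate, flip, or change the contrast
  • Dropout: randomly switch off nodes during training, so the network cannot rely on any single node and overfits less
  • Batch normalisation: normalise each layer’s inputs using the statistics of the current mini-batch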
  5. Tips/Tricks
  • Ensembles: train multiple models and let them vote on the prediction, or average their predictions for continuous data. Make sure the models are not strongly correlated, otherwise the ensemble adds little.
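For step 1 above, here is a minimal NumPy sketch of the preprocessing bullets. It is my own illustration rather than code from the talk; the function names and the synthetic data are hypothetical. The key point it shows is that the mean, standard deviation and PCA projection are computed on the training data only, then applied unchanged to both the training and test sets.

```python
import numpy as np

def fit_standardiser(X_train):
    """Compute per-feature mean and standard deviation on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardise(X, mean, std):
    """Mean subtraction followed by division by the standard deviation."""
    return (X - mean) / std

def fit_pca_whitening(X_train, n_components, eps=1e-5):
    """Fit PCA + whitening on training data; return the mean and projection matrix."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest components
    # Divide each kept component by its standard deviation (whitening)
    W = eigvecs[:, order] / np.sqrt(eigvals[order] + eps)
    return mean, W

def pca_whiten(X, mean, W):
    """Project onto the kept components using statistics fitted on training data."""
    return (X - mean) @ W

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 3.0, size=(1000, 10))
X_test = rng.normal(5.0, 3.0, size=(200, 10))

# Statistics come from the training data but are applied to both sets
mean, std = fit_standardiser(X_train)
X_train_s = standardise(X_train, mean, std)
X_test_s = standardise(X_test, mean, std)

# Optional: reduce 10 noisy features to 4 decorrelated, unit-variance components
pca_mean, W = fit_pca_whitening(X_train, n_components=4)
Z_train = pca_whiten(X_train, pca_mean, W)
Z_test = pca_whiten(X_test, pca_mean, W)
print(Z_train.shape, Z_test.shape)  # (1000, 4) (200, 4)
```

Fitting on the training set alone avoids leaking information about the test set into the model.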
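For the CNN bullet in step 2, a small sketch helps show what ‘sliding a filter and summing’ means. This is a plain NumPy loop, not how Lasagne or any other library implements convolution, and the edge-detecting kernel is hand-written here, whereas a CNN would learn its filter weights during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small kernel over a 2-D image ('valid' padding, stride 1).

    At each position the kernel is multiplied element-wise with the image
    patch beneath it and the products are summed. As in most deep learning
    libraries, the kernel is not flipped (strictly, this is cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image: dark left half, bright right half
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# A hand-written vertical-edge filter; in a CNN these weights would be learned
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

print(conv2d_valid(image, kernel))
```

The output is non-zero only where the 3×3 window straddles the boundary between the dark and bright halves, which is the kind of edge response Roelof mentions looking for in the first layer’s weights.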
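For the hyperparameter search bullets in step 4, here is a minimal sketch of random search. `validation_loss` is a hypothetical stand-in for ‘train a network with these settings and return its validation loss’, and the search ranges are assumptions. The talk recommends Bayesian optimisation (e.g. spearmint) for real use; this sketch only shows the simpler baseline it improves on.

```python
import numpy as np

def validation_loss(learning_rate, dropout):
    """Hypothetical stand-in for training a model and measuring validation loss.

    This toy function has its minimum near learning_rate=1e-2, dropout=0.5,
    so the search has something to find.
    """
    return (np.log10(learning_rate) + 2.0) ** 2 + (dropout - 0.5) ** 2

def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best (loss, params) pair."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        params = {
            # Learning rates span orders of magnitude, so sample on a log scale
            "learning_rate": 10 ** rng.uniform(-5, 0),
            "dropout": rng.uniform(0.0, 0.8),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

best_loss, best_params = random_search(n_trials=50)
print(best_loss, best_params)
```

Unlike grid search, the number of trials does not grow with the number of hyperparameters, but every trial is a full training run, which is why it can take a long time.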
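For the data augmentation bullet in step 4, here is a NumPy-only sketch that produces randomly flipped, rotated and contrast-adjusted copies of an image. It is an assumption-laden illustration, not code from the talk: pixel values are assumed to lie in [0, 1], rotations are limited to multiples of 90 degrees, and scaling is left out because it needs interpolation that plain NumPy does not provide.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped, rotated and contrast-adjusted copy of a 2-D image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)                    # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))   # rotate by 0, 90, 180 or 270 degrees
    contrast = rng.uniform(0.8, 1.2)            # random contrast factor
    mean = out.mean()
    # Stretch or squash values around the mean, then keep them in [0, 1]
    out = np.clip((out - mean) * contrast + mean, 0.0, 1.0)
    return out

rng = np.random.default_rng(42)
image = rng.random((32, 32))
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)  # (8, 32, 32)
```

Each call yields a slightly different version of the same image, effectively enlarging the training set without collecting new data.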
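The dropout and batch normalisation bullets in step 4 can each be sketched in a few lines of NumPy. These are simplified forward passes under my own assumptions (inverted dropout, and batch normalisation in training mode only, without the running averages a library layer keeps for test time); in practice you would use the layers your framework provides.

```python
import numpy as np

def dropout(x, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training.

    Surviving activations are scaled by 1 / (1 - p_drop) so nothing needs to
    change at test time, when dropout is switched off.
    """
    if not training or p_drop == 0.0:
        return x
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalisation over the batch axis (axis 0).

    gamma and beta are the learned scale and shift; the running statistics
    used at test time are omitted from this sketch.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
h = rng.normal(3.0, 2.0, size=(256, 10))  # a mini-batch of hidden-layer activations

h_bn = batch_norm(h, gamma=np.ones(10), beta=np.zeros(10))
h_drop = dropout(h_bn, p_drop=0.5, rng=rng)

print(h_bn.mean(axis=0).round(6))  # approximately zero for every unit
print((h_drop == 0).mean())        # roughly half of the values are zeroed
```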
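For the ensemble tip in step 5, here is a minimal sketch of the two ways of combining models: a majority vote over predicted class labels, and a simple average for continuous outputs. The prediction arrays are made-up example values standing in for the outputs of separately trained models.

```python
import numpy as np

def majority_vote(class_predictions):
    """Combine hard class labels from several classifiers.

    class_predictions has shape (n_models, n_samples) and holds non-negative
    integer labels. Ties go to the smallest label, because argmax returns the
    first maximum.
    """
    _, n_samples = class_predictions.shape
    n_classes = class_predictions.max() + 1
    votes = np.zeros((n_classes, n_samples), dtype=int)
    for preds in class_predictions:
        # Each model adds one vote for its predicted class in every column
        votes[preds, np.arange(n_samples)] += 1
    return votes.argmax(axis=0)

def average_predictions(predictions):
    """Average continuous outputs (regression values or class probabilities)."""
    return np.mean(predictions, axis=0)

# Three example classifiers predicting labels for five samples
labels = np.array([
    [0, 1, 2, 1, 0],
    [0, 1, 1, 1, 2],
    [1, 1, 2, 0, 0],
])
print(majority_vote(labels))  # [0 1 2 1 0]

# Three example regressors predicting two continuous values
outputs = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])
print(average_predictions(outputs))
```

Voting only helps when the models make different mistakes; if they are highly correlated, every model errs on the same samples and the vote changes nothing.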

About the author: Jack Simpson, computational biology PhD candidate at the Australian National University. I love writing (both articles and software), learning more about the world around us, and beekeeping. I also write for
