Every 5 minutes, AEMO dispatches generators across the National Electricity Market (NEM) to meet demand. To do this, AEMO needs to predict what demand will look like 5 minutes into the future.
Machine Learning
Excellent seminar on the applications of machine learning in the energy sector
If you’ve ever wanted to see the impact that machine learning is having in the energy sector, then I recommend watching this seminar released by the National Renewable Energy Laboratory (NREL).
Each talk describes an application of machine learning in the industry at different levels, from the big (weather and climate modelling) through to the small (optimising the aerodynamics of turbine blades).
Some of the topics discussed include:
- How researchers at NREL are using generative adversarial networks (GANs) to assist them with weather and climate modelling
- How you can represent a wind farm as a graph neural network (GNN) with directed edges (this is brilliant!)
- How hard it is to acquire enough data to train models for wind farms (this is why they mention having success with ensemble-based modelling approaches)
- How they’ve been creating simulations to augment their wind farm datasets
- A few key points which I agree with from personal experience
- Features matter more than models – having enough input data, processed in the right way, often matters more than the specific machine learning algorithm you’re using
- Training models is expensive and time consuming, but once that stage is done, you can run them cheaply and quickly in production
One of my favourite data science resources is the mini-episode series of the Data Skeptic podcast. These short episodes feature the host explaining a data science concept to a non-expert in plain English.
I wanted to share a few of these with some colleagues from work and thought I’d catalogue them here.
A couple of years ago I started my PhD at the Australian National University working to quantify honeybee behaviour. We wanted to build a system that could automatically track and compare different groups of bees within the hive.
I took the project as I had a background in biology, beekeeping and programming, and I wanted to work in a lab where I could learn from a supervisor who was incredibly knowledgeable about both biology and software development.
Notes from ‘A Few Useful Things to Know about Machine Learning’
I was reading a paper by Pedro Domingos this evening which had some tips and advice for people using machine learning. I’ve written down some bullet points for my own reference and I hope someone else finds it useful. I know I’ve made some of the mistakes he gives advice about avoiding.
Deep learning is a type of machine learning based on neural networks, which were inspired by neurons in the brain. The difference between a deep neural network and a normal neural network is the number of ‘hidden layers’ between the input and output layers.
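To make the idea of hidden layers concrete, here’s a minimal sketch of a forward pass through a network with two hidden layers, in plain Python. The weights are random placeholders for illustration only – a real network would learn them during training.

```python
import random

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: output[j] = activation(sum_i inputs[i] * weights[i][j] + biases[j])."""
    return [activation(sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j])
            for j in range(len(biases))]

random.seed(0)

def rand_layer(n_in, n_out):
    """Random weights and zero biases, purely for illustration."""
    weights = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases

# A "deep" network: two hidden layers between the input and output layers.
w1, b1 = rand_layer(3, 4)   # input (3 features) -> hidden layer 1 (4 units)
w2, b2 = rand_layer(4, 4)   # hidden layer 1 -> hidden layer 2
w3, b3 = rand_layer(4, 1)   # hidden layer 2 -> output

x = [0.5, -0.2, 0.8]
h1 = dense(x, w1, b1, relu)
h2 = dense(h1, w2, b2, relu)
y = dense(h2, w3, b3, identity)
print(y)  # a single (untrained) output value
```

A shallow network would have just one hidden layer; stacking more of these `dense` calls is what makes the network “deep”.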
I recently watched an excellent presentation on Deep Learning by Roelof Pieters titled ‘Python for image and text understanding; One model to rule them all!‘ I can recommend watching it, and I’ve written this post for me to put down a few of my own bullet points from the talk for future reference.
I wrote a few quick bullet points down from the article “8 Proven Ways for Improving the ‘Accuracy’ of a Machine Learning Model” for future reference.
Improving Accuracy
- Add more data
- Fix missing values
- Continuous: impute with median/mean/mode
- Categorical: treat as separate class
- Predict missing classes with k-nearest neighbours
- Outliers
- Delete
- Bin
- Impute
- Treat as separate to the others
- Feature engineering
- Transform and normalise: scale values to the range 0–1
- Eliminate skewness (e.g. with a log transform) for algorithms that assume a normal distribution
- Create features: Date of transactions might not be useful but day of the week may be
- Feature selection
- Best features to use: identify via visualisation or through domain knowledge
- Significance: Use p-values and other metrics to identify the right features. Can also use dimensionality reduction while preserving relationships in the data
- Test multiple machine learning algorithms and tune their parameters
- Ensemble methods: combine multiple weak predictors (bagging and boosting)
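The missing-value options above (median for continuous features, a separate class for categorical ones) can be sketched in plain Python. In practice you’d likely reach for pandas or scikit-learn imputers; the function names and toy data here are my own:

```python
def impute_median(values):
    """Fill missing continuous values (None) with the median of the observed ones."""
    observed = sorted(v for v in values if v is not None)
    mid = len(observed) // 2
    median = observed[mid] if len(observed) % 2 else (observed[mid - 1] + observed[mid]) / 2
    return [median if v is None else v for v in values]

def impute_missing_class(values, label="missing"):
    """Treat missing categorical values as their own separate class."""
    return [label if v is None else v for v in values]

ages = [23, None, 31, 40, None, 28]
print(impute_median(ages))  # → [23, 29.5, 31, 40, 29.5, 28]

colours = ["red", None, "blue"]
print(impute_missing_class(colours))  # → ['red', 'missing', 'blue']
```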
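The feature engineering points – deriving day of week from a date, scaling to 0–1, and log-transforming skewed values – can be sketched with just the standard library (the toy transaction data is made up for illustration):

```python
import math
from datetime import date

# Create a feature: the raw transaction date may not be useful, but the day of the week might be.
transactions = [date(2016, 5, 2), date(2016, 5, 7), date(2016, 5, 8)]
day_of_week = [d.strftime("%A") for d in transactions]
print(day_of_week)  # → ['Monday', 'Saturday', 'Sunday']

# Normalise: scale a feature to the range 0-1.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 15, 20]))  # → [0.0, 0.5, 1.0]

# Reduce right skew with a log transform (log1p handles zeros gracefully).
amounts = [1, 10, 100, 1000]
print([round(math.log1p(a), 2) for a in amounts])
```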
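To illustrate the ensemble point, here’s a small from-scratch sketch of bagging: each weak predictor (a one-feature threshold “stump”) is trained on a bootstrap sample, and predictions are made by majority vote. This is a toy illustration of the idea, not production code – in practice you’d use a library implementation.

```python
import random

def train_stump(points):
    """A weak learner: pick the threshold on x that best splits the labels (predict 1 when x >= t)."""
    best = None
    for t in sorted({x for x, _ in points}):
        correct = sum((x >= t) == bool(y) for x, y in points)
        if best is None or correct > best[1]:
            best = (t, correct)
    return best[0]

def bagged_stumps(points, n_models=25, seed=0):
    """Bagging: train each stump on a bootstrap sample, then predict by majority vote."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # sample with replacement
        thresholds.append(train_stump(sample))
    def predict(x):
        votes = sum(x >= t for t in thresholds)
        return int(votes > n_models / 2)
    return predict

# Toy 1-D dataset: label 1 when x is above ~5.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
predict = bagged_stumps(data)
print(predict(2), predict(8))
```

Boosting works differently: instead of training the weak learners independently on resampled data, each new learner focuses on the examples the previous ones got wrong.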
I wrote a few quick bullet points down from the article “8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset” for future reference.
Tactics
- Imbalanced datasets occur when you have a class that occurs much less frequently than the others.
- If a model ignores the class, it can still achieve a high classification accuracy, but that’s not the result we want
- Make sure you use a confusion matrix to ensure that you’re getting acceptable accuracy for all your classes
- One potential solution to this problem is to collect more data
- Resampling your dataset (can be random or non-random – stratified):
- Add copies of underrepresented classes (oversampling/sampling with replacement). Useful if you don’t have much data – 10s of thousands or less.
- Delete instances of classes that occur frequently (undersampling). Handy to use if you have a lot of data – 10s-100s of thousands of instances
- Try different algorithms: decision trees can perform well on imbalanced datasets
- Penalised models: Extra penalties for misclassifying minority class. Examples of these algorithms could include penalized-SVM and penalized-LDA.
- There are areas of research dedicated to imbalanced datasets: can look into anomaly detection and change detection.
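The two resampling tactics above can be sketched in plain Python – randomly duplicating minority-class rows (oversampling with replacement) or randomly discarding majority-class rows (undersampling). The toy dataset and function names here are my own, for illustration:

```python
import random
from collections import Counter

def oversample(dataset, label, target, seed=0):
    """Add random copies of the minority class (sampling with replacement) until it has `target` rows."""
    rng = random.Random(seed)
    minority = [row for row in dataset if row[-1] == label]
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return dataset + extra

def undersample(dataset, label, target, seed=0):
    """Randomly keep only `target` rows of the majority class."""
    rng = random.Random(seed)
    majority = [row for row in dataset if row[-1] == label]
    rest = [row for row in dataset if row[-1] != label]
    return rest + rng.sample(majority, target)

# Toy dataset of (feature, class) rows; class 1 is the rare one.
data = [(i, 0) for i in range(95)] + [(i, 1) for i in range(5)]

balanced_up = oversample(data, label=1, target=95)
balanced_down = undersample(data, label=0, target=5)
print(Counter(row[-1] for row in balanced_up))    # both classes now have 95 rows
print(Counter(row[-1] for row in balanced_down))  # both classes now have 5 rows
```

For the penalised-model tactic, many libraries expose something equivalent (e.g. a class-weighting option) so that misclassifying the minority class costs more during training.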
I wrote a few quick bullet points down from the article “How To Implement Machine Learning Algorithm Performance Metrics From Scratch With Python“.
Metrics
- Classification accuracy
- Test how well predictions of a model do overall
- accuracy = correct predictions / total predictions
- Confusion matrix
- Use to identify how well your predictions did with different classes
- Very useful if you have an imbalanced dataset
- I wrote an extremely hacked together confusion matrix for my tag identification software. I had 4 classes (U, C, R, Q) and the confusion matrix shows you what your model predicted against what the real category was.
|  | U | C | R | Q |
| --- | --- | --- | --- | --- |
| U | 175 | 17 | 67 | 1 |
| C | 11 | 335 | 14 | 0 |
| R | 26 | 8 | 298 | 0 |
| Q | 6 | 0 | 3 | 93 |
- Mean absolute error for regression
- Always positive – the average of how much your predicted values differ from the real values
- Root mean squared error for regression
- Square root of the mean of squared differences between the actual and predicted value
- Squaring the values gives you positive numbers, and taking the square root returns the error to the original units.
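Since the article is about implementing these metrics from scratch, here’s my own sketch of classification accuracy and a confusion matrix in plain Python (the toy labels reuse the U/C/R/Q classes from my tag identification example, but the counts are made up):

```python
def accuracy(actual, predicted):
    """accuracy = correct predictions / total predictions"""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_matrix(actual, predicted, classes):
    """matrix[a][p] counts rows whose real class is `a` and whose predicted class is `p`."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

actual    = ["U", "U", "C", "C", "R", "Q"]
predicted = ["U", "R", "C", "C", "R", "Q"]
print(accuracy(actual, predicted))  # 5 of 6 correct → 0.8333...
matrix = confusion_matrix(actual, predicted, classes=["U", "C", "R", "Q"])
print(matrix["U"])  # → {'U': 1, 'C': 0, 'R': 1, 'Q': 0}
```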
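The two regression metrics are just as short to write from scratch (toy values are my own):

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: square the errors, average them, then square-root back to original units."""
    mean_squared = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_squared)

actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.0, 4.0]
print(mae(actual, predicted))   # (0.5 + 0.0 + 2.0) / 3 = 0.8333...
print(rmse(actual, predicted))
```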