Programming and software development
Every 5 minutes, AEMO dispatches generators across the National Electricity Market (NEM) to meet demand. To do this, AEMO needs to predict what demand will look like 5 minutes into the future.
Sometimes when you work on large projects, you can end up with code that looks like this:
if num_values < 15:
    do_thing()
Now, what does the 15 mean? Well, I’m sure you would have known when you first wrote the code, but what happens in 3 months when you try to modify it or pass it on to someone else in your team? Also, what happens if you determine that the number should actually be 16? Now you have to go through all your files and swap numbers in all the right places (and if you miss any you can introduce subtle bugs).
So what’s the solution? Well, what you really need is a single source of truth for your clearly defined constants: one place where they can be looked up, imported and changed.
Python makes this really easy – all you have to do is create a “constants.py” file in your project directory (I suppose you could call it whatever you like, but that’s beside the point). From here, you can import your variables as if you were importing a library:
from constants import NOM_FREQ_HZ, VALUES_IN_MIN
A screenshot of this file from a recent project looks a little like this:
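As a rough sketch (using the two names from the import above; the values here are illustrative rather than the real ones from that project):

# constants.py: the single source of truth for the project's constants
NOM_FREQ_HZ = 50      # nominal grid frequency in Hz
VALUES_IN_MIN = 12    # number of readings per minute (illustrative value)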
This approach helped me write much more readable code when working with an extremely complicated dataset where numbers were used to represent different value categories.
Obviously for smaller scripts it might not be worth setting up a dedicated file, but for this project my constants file ended up being a couple of hundred lines long, and it is far easier to maintain than if those values were defined in multiple places across all the files in the project.
If you’ve ever wanted to see the impact that machine learning is having in the energy sector, then I recommend watching this seminar released by the National Renewable Energy Laboratory (NREL).
Each talk describes an application of machine learning in the industry at a different level, from the big (weather and climate modelling) through to the small (optimising the aerodynamics of turbine blades).
Most people know that I’m a huge fan of the Python programming language – while that isn’t going to change, a recent encounter with some researchers at CSIRO has convinced me that I should pick up Julia for some of the energy modelling and optimisation work that I do.
I’ve known for a while that Julia was a language with a lot of benefits (as fast as a lower-level language but with the productivity benefits of a higher-level language). However, if you understand how to write efficient vectorised code in Python (using NumPy and Pandas), then except for some use-cases, you don’t really get that much of a boost out of switching to Julia.
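To make that concrete, here is a small sketch (with a made-up array) of the kind of vectorised code I mean, where an explicit Python loop is replaced with a single NumPy operation:

import numpy as np

values = np.random.rand(1_000_000)

# slow: an explicit Python loop over every element
total = 0.0
for v in values:
    total += v ** 2

# fast: the same calculation vectorised with NumPy
total = np.sum(values ** 2)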
So what changed my mind? Well, for the past few years, the National Renewable Energy Laboratory (NREL) have been working on a number of amazing open-source energy modelling packages for the Julia programming language. I’ve now updated my electricity modelling resources page with the links to some material about these packages.
One of my favourite data science resources is the mini-episode series of the Data Skeptic podcast. These short episodes would feature the host explaining a data science concept to a non-expert in plain English.
I wanted to share a few of these with some colleagues from work and thought I’d catalogue them here.
I should mention up-front that the techniques described in this post are really only worthwhile once you have a dataset in the millions of rows or above. Once your data hits this size, it is worth paying the initial optimisation overhead as it will save you memory and be faster overall.
Pandas’ eval and query are built on the numexpr library, and provide an optimised way to run a calculation or filter on a Pandas dataframe. For example, the code below shows the traditional way of doing these things in Pandas:
import numpy as np

start = '2020-02-10 08:20:00'
end = '2020-02-10 08:30:00'
duids = ['LYA4', 'BW02']

# traditional vectorised calculation
map_gen_df['DIST'] = np.sqrt(map_gen_df['SEC_DIFF'].pow(2) + map_gen_df['VALUE_DIFF'].pow(2))

# traditional filter
event_duid_df = map_gen_df[
    (map_gen_df['MMSNAME'].isin(duids))
    & (map_gen_df['TIMESTAMP_MIN'] >= start)
    & (map_gen_df['TIMESTAMP_MIN'] <= end)
]
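For comparison, here is a sketch of the same two operations written with eval and query (assuming the same dataframe and column names as above; eval hands the arithmetic off to numexpr, and query lets you reference local variables with @):

# calculation with eval
map_gen_df.eval('DIST = sqrt(SEC_DIFF**2 + VALUE_DIFF**2)', inplace=True)

# filter with query, referencing the local variables via @
event_duid_df = map_gen_df.query(
    'MMSNAME in @duids and TIMESTAMP_MIN >= @start and TIMESTAMP_MIN <= @end'
)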
Pandas and NumPy are fantastic libraries that enable you to take advantage of vectorization to write extremely efficient Python code. However, what happens when the calculation you wish to run changes based on the value in another column of your dataset?
For example, take a look at the dataset in the table below (along with the code to generate it):
Group | Value
A     | 1
A     | 1
B     | 1
C     | 1
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'C'],
    'Value': [1, 1, 1, 1],
})
Imagine I wish to create a third column (‘Result’) based on the following logic:
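The original rules aren't reproduced here, but to illustrate the kind of column-dependent logic I mean, suppose (hypothetically) that group A rows should be multiplied by 10, group B rows by 100, and everything else left unchanged. np.select expresses that in a single vectorised step:

# hypothetical rules: conditions are checked in order and the matching
# choice is used; rows matching nothing fall back to the default
conditions = [
    df['Group'] == 'A',
    df['Group'] == 'B',
]
choices = [
    df['Value'] * 10,
    df['Value'] * 100,
]
df['Result'] = np.select(conditions, choices, default=df['Value'])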
A couple of years ago I started my PhD at the Australian National University working to quantify honeybee behaviour. We wanted to build a system that could automatically track and compare different groups of bees within the hive.
I took the project as I had a background in biology, beekeeping and programming, and I wanted to work in a lab where I could learn from a supervisor who was incredibly knowledgeable about both biology and software development.
Google have released an open source project on GitHub called Grumpy that converts Python to Go, and then compiles it down to native code.
It’s an interesting development, but since they won’t be supporting C extension modules (which basically rules out all the scientific and machine learning libraries I use), it means I probably won’t end up using this new tool too much.
I was reading a paper by Pedro Domingos this evening which had some tips and advice for people using machine learning. I’ve written down some bullet points for my own reference and I hope someone else finds them useful. I know I’ve made some of the mistakes he warns against.