Pandas and NumPy are fantastic libraries that enable you to take advantage of vectorization to write extremely efficient Python code. However, what happens when the calculation you wish to run changes based on the value in another column of your dataset?
For example, take a look at the dataset in the table below (along with the code to generate it):
| Group | Value |
|-------|-------|
| A     | 1     |
| A     | 1     |
| B     | 1     |
| C     | 1     |
import pandas as pd
import numpy as np

# Build the example dataset shown in the table above
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'C'],
    'Value': [1, 1, 1, 1]
})
Imagine I wish to create a third column (‘Result’) based on the following logic:
- Multiply Value by 2 if Group == 'A'
- Multiply Value by 3 if Group == 'B'
- Multiply Value by 4 if Group == 'C'
- Fill with a missing value (NaN) if none of the above is true
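As a minimal sketch of how these four rules might be expressed without looping over rows, the example below uses `np.select`. This is one possible approach and an assumption on my part, not necessarily the method this article settles on. The names `conditions` and `choices` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same example dataset as above
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'C'],
    'Value': [1, 1, 1, 1]
})

# One boolean mask per rule, checked in order
conditions = [
    df['Group'] == 'A',
    df['Group'] == 'B',
    df['Group'] == 'C',
]

# The calculation to apply where the matching mask is True
choices = [
    df['Value'] * 2,
    df['Value'] * 3,
    df['Value'] * 4,
]

# Rows that match none of the masks receive the default (NaN)
df['Result'] = np.select(conditions, choices, default=np.nan)

print(df['Result'].tolist())  # [2.0, 2.0, 3.0, 4.0]
```

Because NaN is a floating-point value, the Result column comes out as floats even though Value holds integers.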