Multiprocessing in Python

I frequently find myself working with large lists where I need to apply the same time-consuming function to each element, without concern for the order in which these calculations are made. I’ve written a small class using Python’s multiprocessing module to help speed things up.

It accepts a list, breaks it up into sublists whose length equals the number of processes you want to run in parallel, and then hands those sublists to a pool of worker processes. Finally, it returns a single flat list containing all the results, in the same order as the input.

import multiprocessing

class ProcessHelper:
    def __init__(self, num_processes=4):
        self.num_processes = num_processes
    
    def split_list(self, data_list):
        # Slice the data into consecutive sublists of length num_processes
        # (the last sublist may be shorter).
        return [data_list[i:i + self.num_processes]
                for i in range(0, len(data_list), self.num_processes)]
    
    def map_reduce(self, function, data_list):
        split_data = self.split_list(data_list)
        # The context manager shuts the worker processes down once map() has returned.
        with multiprocessing.Pool(processes=self.num_processes) as pool:
            # map() blocks until every sublist is processed and preserves input order.
            results_list_of_lists = pool.map(function, split_data)
        # Flatten the per-sublist results into a single list.
        return [item for sublist in results_list_of_lists for item in sublist]
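
For comparison, Pool.map can also do the chunking itself through its chunksize argument. The following is a minimal sketch of that alternative, not the class above: the names square and parallel_map are hypothetical, and the worker function here receives a single item rather than a sublist.

```python
import multiprocessing


def square(x):
    # Hypothetical per-item worker: it receives one element, not a sublist.
    return x * x


def parallel_map(function, data_list, num_processes=4, chunksize=4):
    # Pool.map splits the iterable into chunks of roughly `chunksize` items,
    # sends each chunk to a worker, and returns the results in input order.
    with multiprocessing.Pool(processes=num_processes) as pool:
        return pool.map(function, data_list, chunksize=chunksize)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this code on import.
    print(parallel_map(square, range(20)))
```

Whether this is preferable depends on the job: passing whole sublists, as the class does, lets the function do per-batch setup once, while chunksize keeps the worker function simpler.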

To demonstrate how this class works, I’ll create a list of 20 integers from 0 to 19. I’ve also created a function that squares every number in a list. When I run it, I pass the function (job) and the list (data). The class then breaks the data into a list of lists and runs the function on each sublist, spreading the sublists across the worker processes.

def job(num_list):
    return [i*i for i in num_list]

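if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module (for example Windows).
    data = list(range(20))

    p = ProcessHelper(4)
    result = p.map_reduce(job, data)
    print(result)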

So if my data originally was a list that looked like this:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

When I split it into sublists, I’ll end up with a list of 5 lists of 4 items each (the sublist length matches the 4 processes I’ve asked for):

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]
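
The splitting step can be tried on its own, without starting any processes. The sketch below reproduces the slicing used by split_list as a standalone function, and adds a hypothetical alternative (chunk_by_count, not part of the class) for the case where you would rather have exactly one sublist per process.

```python
def chunk_by_size(data_list, size):
    # Same slicing as split_list: consecutive sublists of at most `size` items.
    return [data_list[i:i + size] for i in range(0, len(data_list), size)]


def chunk_by_count(data_list, count):
    # Hypothetical alternative: at most `count` sublists of near-equal length.
    size, extra = divmod(len(data_list), count)
    chunks = []
    start = 0
    for k in range(count):
        # The first `extra` sublists take one additional item each.
        end = start + size + (1 if k < extra else 0)
        if end > start:  # skip empty sublists when count > len(data_list)
            chunks.append(data_list[start:end])
        start = end
    return chunks


data = list(range(20))
print(chunk_by_size(data, 4))   # 5 sublists of 4 items
print(chunk_by_count(data, 4))  # 4 sublists of 5 items
```

With 20 items and 4 processes, the first form gives five small jobs shared among four workers, while the second gives each worker exactly one job.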

Finally, the result will give me the list of squared values that looks like this:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
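
To see where that flat list comes from, here is a small sketch that traces the same data flow sequentially. A plain list comprehension stands in for Pool.map (an assumption made only so the example runs without worker processes), and itertools.chain.from_iterable is shown as an equivalent to the nested comprehension used for flattening.

```python
from itertools import chain


def job(num_list):
    return [i * i for i in num_list]


sublists = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11],
            [12, 13, 14, 15], [16, 17, 18, 19]]

# Sequential stand-in for Pool.map: one list of squares per sublist.
results_list_of_lists = [job(sublist) for sublist in sublists]

# Two equivalent ways to flatten the per-sublist results.
flat_nested = [item for sublist in results_list_of_lists for item in sublist]
flat_chain = list(chain.from_iterable(results_list_of_lists))

print(flat_chain)
```

Because the results come back in the same order as the sublists, flattening them restores the original element order.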

I’ll continue to build this class as I identify other handy helper methods that I could add.

Author: Jack Simpson. Computational biology PhD candidate at the Australian National University. I love writing (both articles and software), learning more about the world around us, and beekeeping. I also write for BioSky.co