Multi-core parallel processing in Python with multiple arguments

I recently needed to use parallel processing in Python. Parallel processing is very useful when:

  • you have a large set of data that you want to (or are able to) process as separate ‘chunks’.
  • you want to perform an identical process on each individual chunk (i.e. the basic code running on each chunk is the same). Of course, each chunk may have its own corresponding parameter requirements.
  • the order in which each chunk is processed is not important, i.e. the output result from one chunk does not affect the processing of a subsequent chunk.

Under these conditions, if you are working on a multi-core computer (which I think is true for virtually all of us), you can set up your code to run in parallel on several or all of your computer’s cores. Using multiple cores is essential if you want any improvement in computation time. If you attempt such parallel processing on a single core, the computer will simply switch between separate computational threads on that one core. For CPU-bound work, the total computation time will stay about the same, and it will more likely increase because of the constant switching between threads. (In standard CPython, even several threads inside one Python process cannot execute Python code on multiple cores at once, because of the global interpreter lock. This is why the method below uses separate processes rather than threads.)


Anyhow, there are several methods of achieving multi-core parallel processing in Python. In this post, I will describe what I think is the simplest method to implement. This is the method I chose, and I am quite happy with its results.

Additionally, most examples online that cover parallel processing never mention how to handle multiple input arguments besides the iteration parameter. There are several ways to handle that too, and I will also describe what I think is the simplest to implement and maintain.

Say, you have the following code setup:

val1, val2, val3 = 1, 2, 3  # placeholder values; substitute your own
arg1 = val1
arg2 = [val2, val3]
arg3 = ['val4', 'val5']
fileslist = ['list', 'of', 'files', 'that', 'are', 'to', 'be', 'processed']

for file in fileslist:
    print('Start: {}'.format(file))
    # perform a task with arg1
    # perform a task with arg2
    # print something with arg3
    # save some data to disk
    print('Status Update based on {}'.format(file))

Now, for parallel processing, the target is to convert the for loop into a parallel process controller, which will ‘assign’ file values from fileslist to available cores.

To achieve this, there are two steps. First, move the contents of your for loop into a separate function that can be called. When executor.map() is given a single iterable of inputs, as in the code below, this function receives exactly one argument per call. Set up your function accordingly, planning for this single argument to be a tuple of variables. One of these variables will be the iteration variable, in our case file, and the rest will be the other variables required.

def loopfunc(argstuple):
    # Unpack the single tuple argument into the individual variables
    file, arg1, arg2, arg3 = argstuple
    print('Start: {}'.format(file))
    # perform a task with arg1
    # perform a task with arg2
    # print something with arg3
    # save some data to disk
    return 'Status Update based on {}'.format(file)

Second, update the main code structure to enable multi-core processing. We will be using the module concurrent.futures. Let’s see the updated code first, before I explain what is happening.

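import concurrent.futures

# loopfunc must be defined at the top level of the module (as above),
# so that it can be pickled and sent to the worker processes.

if __name__ == '__main__':
    # The __main__ guard stops worker processes from re-running this block
    # when they import the module (required on Windows and macOS).
    val1, val2, val3 = 1, 2, 3  # placeholder values; substitute your own
    arg1 = val1
    arg2 = [val2, val3]
    arg3 = ['val4', 'val5']
    fileslist = ['list', 'of', 'files', 'that', 'are', 'to', 'be', 'processed']

    argslist = ((file, arg1, arg2, arg3) for file in fileslist)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(loopfunc, argslist)

        for rs in results:
            print(rs)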

OK, now let’s go over it. The with ... line creates a ProcessPoolExecutor, i.e. a pool of worker processes, under the name executor. When the with block ends, it waits for all pending work to finish and then shuts the pool down. In the next line, executor.map() is given two pieces of information: (a) which function is to be executed repeatedly, and (b) an iterable of arguments, one item per function call. Notice that when calling executor.map(), we pass loopfunc as an object, and do not attempt to execute the function ourselves via loopfunc().
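Because executor.map() mirrors Python’s built-in map(), one way to check loopfunc before going parallel is to feed the same arguments through the built-in map() in a single process, where errors and tracebacks are easier to read. A minimal sketch, assuming a stripped-down stand-in for loopfunc and made-up file names and argument values:

```python
def loopfunc(argstuple):
    # Stand-in for the real loopfunc: unpack the single tuple argument.
    file, arg1, arg2, arg3 = argstuple
    return 'Status Update based on {}'.format(file)

arg1, arg2, arg3 = 1, [2, 3], ['val4', 'val5']  # placeholder values
fileslist = ['a.txt', 'b.txt', 'c.txt']  # hypothetical file names

argslist = ((file, arg1, arg2, arg3) for file in fileslist)

# Serial counterpart of executor.map(loopfunc, argslist): same function,
# same arguments, same result order, but everything runs in one process.
for rs in map(loopfunc, argslist):
    print(rs)
```

Once this prints what you expect, swapping map for executor.map is the only change needed.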

Now, argslist is meant to provide the arguments for every call of loopfunc, one item per entry in fileslist. However, in our case, only the fileslist variable is iterated over, while the other arguments are passed ‘as-is’. The workaround is to use a comprehension to generate a new variable (in our case argslist) that contains all relevant arguments for each function call. (Strictly speaking, the parentheses make argslist a generator expression rather than a tuple: Python has no tuple comprehension.)
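Strictly, a parenthesized comprehension produces a generator, which has no len() and can be iterated only once. If you need a real tuple, for example to check its length or to reuse it, wrapping the comprehension in tuple() works. A small sketch with placeholder values:

```python
import types

arg1, arg2, arg3 = 1, [2, 3], ['val4', 'val5']  # placeholder values
fileslist = ['list', 'of', 'files']

# Parentheses alone give a generator, not a tuple.
argslist = ((file, arg1, arg2, arg3) for file in fileslist)
print(isinstance(argslist, types.GeneratorType))  # True

# tuple(...) materializes every argument tuple up front.
argstuple = tuple((file, arg1, arg2, arg3) for file in fileslist)
print(len(argstuple) == len(fileslist))  # True
print(argstuple[0])  # ('list', 1, [2, 3], ['val4', 'val5'])
```

Either form works with executor.map(), since it only needs something it can iterate over.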

In this way, the first call is loopfunc((fileslist[0], arg1, arg2, arg3)), the second is loopfunc((fileslist[1], arg1, arg2, arg3)), and so on. The executor hands these calls out to its pool of worker processes, and each worker runs many of them in turn. Of course, within loopfunc(), we have already unpacked the single input argument into the multiple arguments we need.
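Another way to pass the fixed arguments: executor.map() also accepts several iterables and calls the function with one item from each. The sketch below assumes loopfunc is rewritten to take four separate parameters (the name loopfunc4 is hypothetical) and pairs fileslist with itertools.repeat() for each fixed argument. Pairing stops when the shortest iterable, here fileslist, runs out, so the endless repeat() iterators are safe.

```python
import concurrent.futures
from itertools import repeat

def loopfunc4(file, arg1, arg2, arg3):
    # Hypothetical four-parameter variant of loopfunc: no tuple to unpack.
    return 'Status Update based on {} (arg1={})'.format(file, arg1)

def run_all(fileslist, arg1, arg2, arg3):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # One item from each iterable per call; repeat() supplies the same
        # fixed value every time, and iteration stops when fileslist ends.
        return list(executor.map(loopfunc4, fileslist,
                                 repeat(arg1), repeat(arg2), repeat(arg3)))

if __name__ == '__main__':
    for rs in run_all(['list', 'of', 'files'], 1, [2, 3], ['val4', 'val5']):
        print(rs)
```

The trade-off is a function signature that reads naturally, at the cost of one repeat() per fixed argument in the map() call.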

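Values returned from loopfunc() are collected in results, which is looped over to print out each value. The nice behavior here is that each rs is printed as soon as it becomes available, rather than only after every call has finished. Note, though, that executor.map() yields results in the same order as its inputs: if fileslist[1] finishes before fileslist[0], its result waits until the result for fileslist[0] is ready. The Start: lines are printed by the worker processes themselves, and the Status Update lines by the loop in the main process. For example, if you’re running on a 4-core machine, the output can look like the following, depending on how fast each call runs: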
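If you would rather handle each result in the order the calls actually finish, executor.submit() together with concurrent.futures.as_completed() does that. A minimal sketch, assuming a stand-in loopfunc whose argument tuple carries a made-up delay to simulate uneven amounts of work:

```python
import concurrent.futures
import time

def loopfunc(argstuple):
    # Stand-in for the real loopfunc; 'delay' simulates uneven work.
    file, delay = argstuple
    time.sleep(delay)
    return 'Status Update based on {}'.format(file)

def run_in_completion_order(argslist):
    results = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # submit() schedules each call and returns a Future immediately.
        futures = [executor.submit(loopfunc, args) for args in argslist]
        # as_completed() yields each Future as soon as its call finishes.
        for future in concurrent.futures.as_completed(futures):
            rs = future.result()
            print(rs)
            results.append(rs)
    return results

if __name__ == '__main__':
    run_in_completion_order([('slow', 0.5), ('fast', 0.1)])
```

On a machine with two or more cores, 'fast' will typically print first; with executor.map() it would always wait for 'slow'.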

Start: fileslist[0]
Start: fileslist[1]
Start: fileslist[2]
Start: fileslist[3]
Status Update based on fileslist[0]
Status Update based on fileslist[1] 
Start: fileslist[4]
Start: fileslist[5]
Status Update based on fileslist[2] 
Start: fileslist[6]
Status Update based on fileslist[3] 
Start: fileslist[7]
...

Without any arguments, ProcessPoolExecutor() starts as many worker processes as there are cores on your computer. This is great if you want to run your code and walk away for a few hours, letting your Python script take over all of your computer’s computational capacity. However, if you only want to allow a specific number of processes, you can use ProcessPoolExecutor(max_workers=nproc), where nproc is the maximum number of processes you want running at once.
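One common choice, offered as a convention rather than a rule, is to leave one core free for the rest of the system. The sketch assumes a hypothetical placeholder task, square. Since os.cpu_count() returns None when the count cannot be determined, it falls back to 1:

```python
import concurrent.futures
import os

def square(x):
    # Hypothetical placeholder task.
    return x * x

def pick_nproc():
    # os.cpu_count() may return None; fall back to a single core.
    ncores = os.cpu_count() or 1
    # Leave one core free for other work, but always use at least one.
    return max(1, ncores - 1)

if __name__ == '__main__':
    nproc = pick_nproc()
    with concurrent.futures.ProcessPoolExecutor(max_workers=nproc) as executor:
        print(list(executor.map(square, range(5))))  # [0, 1, 4, 9, 16]
```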

To-do

In my current implementation I have used the above method to work on ‘chunks’ of data and then saved the resultant output with appropriate markers to disk. However, another way to implement parallel processing would be to take the output from each iteration, and save it as an element in an array, at the correct array index.

This should not be hard to do, all I should need is to return both the output data and the correct marker for the array index. I just haven’t done it (nor needed to do it) yet. I actually prefer saving the output from each chunk to disk separately, if possible, so that even if something crashes (or the power goes out, or whatever) and the process is interrupted, I won’t lose all progress made until then.
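The array idea from this to-do could look something like the sketch below. It assumes each chunk’s output is a single value. The worker receives the index along with the file and returns it alongside its output, and the main process writes each output into a preallocated list at that index. The names process_chunk and run are hypothetical, and len(file) * arg1 stands in for the real per-chunk work.

```python
import concurrent.futures

def process_chunk(argstuple):
    # Hypothetical worker: the index travels in and back out with the result.
    index, file, arg1 = argstuple
    output = len(file) * arg1  # placeholder for the real per-chunk result
    return index, output

def run(fileslist, arg1):
    outputs = [None] * len(fileslist)  # one slot per chunk
    argslist = ((i, file, arg1) for i, file in enumerate(fileslist))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for index, output in executor.map(process_chunk, argslist):
            outputs[index] = output  # place each result at its own index
    return outputs

if __name__ == '__main__':
    print(run(['list', 'of', 'files'], 2))  # [8, 4, 10]
```

Because executor.map() already returns results in input order, the explicit index matters most if you switch to as_completed(), but it also keeps the pairing between chunk and result explicit.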


The state of dysfunction in the Indian Congress Party

A series of news items appeared recently in relation to the Congress party of India. While the news reporting went largely without comment (or with usual snark from their political opponents), to me, they brought to sharp focus the extent of dysfunction and rot within the party.

First, in the lead-up to a Congress Working Committee (CWC) meeting, some senior party members wrote to the “interim” Congress President, Sonia Gandhi. (Remember, she became interim President after her son, Rahul Gandhi, resigned from the post. Before Rahul, the very same Sonia was President.) Here is the gist of the letter’s demands:

It calls for a “full time and effective leadership” which is both “visible” and “active” in the field; elections to the CWC; and the urgent establishment of an “institutional leadership mechanism” to “collectively” guide the party’s revival.

OK, so this is in effect a serious criticism, from senior members of the party, that some changes are required going forward. So, what happened next? Rahul Gandhi’s response was to criticize the timing of the letter, since this is a time of weakness for Congress and his mother was in hospital:

Early in the Congress Working Committee meeting that went on for seven hours, Rahul Gandhi questioned why the 23 top leaders had written a letter attacking the Congress when it was at its weakest, when it was battling crises in Madhya Pradesh and Rajasthan and when the Congress president (his mother Sonia Gandhi) was in hospital.

If you’re wondering about the seriousness of this critique, it was serious enough for the conversation to completely pivot:

The veteran leader [senior Congress member Ghulam Nabi Azad], a Rajya Sabha member, said he had called and checked with Sonia Gandhi’s private secretary twice before sending the letter. “I was told that she is in hospital for a routine check-up. Still, we waited till she was back home before sending the letter,” Mr Azad told NDTV.

Sonia Gandhi, who was admitted to hospital late last month, was discharged in the beginning of August.

He [Azad] said the Congress chief called a few days later and said she could not respond to the letter because of her poor health.

“I told Soniaji, your health is paramount, all else can wait,” said Mr Azad. He claimed that Rahul Gandhi heard him out and was “satisfied” with the response.

Two things. First, is Rahul suggesting that Sonia is too ill to discharge her duties as President? Then why is she still holding the post?! This is a professional organization, where office-holders have duties and responsibilities… such as dealing with grievances of senior members of the organization! Second, if Sonia’s illness was a temporary matter, why is there not a chain of command in place?! It is perfectly natural for any single individual to occasionally be “off-duty”, so to say, due to either illness, or personal commitments, or vacations, or myriad other reasons. Any coherent organization should have a command structure where such absences are planned for! If Sonia is ill and unavailable, that should NOT mean that normal operations cease; it should only mean that someone else accepts the letter and follows an established protocol.

Next, at the CWC meeting itself, this was attributed to Sonia Gandhi regarding the ‘dissenters’:

Sonia Gandhi reportedly said in her closing remarks that she held “no ill-will” towards anyone in the party, a remark intended at the dissent-letter writers. “I am hurt but they are my colleagues, bygones are bygones, let us work together,” she said, ending the Congress Working Committee (CWC) meeting on a note of conciliation.

Does this seem to come from an organization of equals? Or does this seem to originate from a king/queen ruling over his/her subjects? How does it matter if Sonia Gandhi holds ill-will for the letter? Why does it matter? Again, this is a professional organization, where senior members are suggesting changes going forward for what they think is the benefit of the party. Why is Sonia Gandhi “hurt”? Because she was criticized? Does she consider herself above criticism? “Let us work together? Bygones are bygones?” YOU, Sonia Gandhi, and your son, are the ones throwing a tantrum! Your senior members were the reasonable adults coming to you with proposed changes going forward that might benefit the party!

You know what I think the problem was? Maybe Sonia and Rahul were not entirely convinced that ‘benefit of the party’ and ‘benefit of the power dynamics of the Nehru-Gandhi dynasty’ were well aligned. At the CWC, it was decided that elections for the next “full time” president would be held within six months. Remember that the last president, Rahul Gandhi, resigned after the last election where their political opponents basically humiliated them. Already, quotes like this:

The Congress’ Assam unit on Monday said that it wants senior leader Rahul Gandhi as the party’s national president as soon as the interim chief Sonia Gandhi demits the office.

and this:

[I]t is imperative that the party should be led by Gandhi family. I humbly request you to continue as the President of All India Congress Committee, and if you feel that your health may not permit for full-fledged dedication, I urge you to convince Shri Rahul Gandhi to take up the position.

have started to appear. Would you take a bet on Rahul Gandhi not being the next Congress President, again? I wouldn’t.

What I wrote in my post on India’s Independence Day, in criticism of the current government of India, applies equally well to the main opposition party. If, instead of performing its duty of providing strong, thoughtful rebuttals of the government’s policies, the main opposition is preoccupied with controlling its internal power dynamics, and especially with keeping power within a dynastic family, then that bodes ill for the country as a whole.

Where are the Congress’ ideas for India? For all that we criticize the Indian government, if an election were held today, who is providing an alternative narrative that citizens can latch on to and organize around? What does Congress think India should do in the next 10, or 20, or 50 years? Does it have any opinion as an organization? The current Indian government came to power on the heels of 10 years of Congress-led government marked by massive corruption and malfeasance, but also because the BJP fanned the flames of criticism and, equally importantly, provided an alternative vision and path forward. (This was, of course, in 2014. The 2019 campaign was a different matter.)

It seems to me like Congress today is missing vision, missing organization— and perhaps even missing a pulse. It seems to me like the senior Congress members are very, very right.


India’s Independence Day

Happy Independence Day, India. In addition to celebrating, maybe it’s time for some introspection too! Let’s not forget where we came from, but let’s focus on where we want to be going.

We are a relatively young democracy, still in our growing years. As such, let’s not allow the selfish, petulant adolescents amongst us to dictate our lives and our future. If we let the misguided and sinister make our decisions, we risk letting them destabilize a fine balance.

“I am choosing to do X because some people I dislike did Y some time ago, and X will hurt those people” is middle school mentality, and should not be the basis for a government’s decision making. The answer to “why are we doing this?” has to be “this is how it helps us in the next 30 years”, not “this is what our opponents did in the last 30 years”. (Yes, people outside the government will engage in all manner of shenanigans. That’s the privilege of not being in power.)

It is petulant, selfish behavior to pursue short-term gratification at the cost of harm to self and others, even more so in times of a pandemic. It cannot be acceptable for the heads of the central government and of a state government to ignore social distancing and, in fact, hold an event with people all around them. If that’s the example they set, what message do they send to constituents looking for leadership? This is callous and outrageous.

It is also outrageous for the head of a government to participate in any religious ceremony in their official capacity. Of course, if they want to take a day off, and pursue their religion as private citizens, that is agreeable, whatever religion they want to pursue. As official government representatives, they can and should attend all manner of ceremonies, from all communities, not just their own.

Patriotism comic by @SanitaryPanels.

We are as yet a young democracy. It hasn’t been long enough for us as a country to forget what it took to gain independence. It hasn’t been long enough for us to forget or, worse, ignore the principles and ideas on which India was founded. We are a unique, complex, multi-cultural, blended pool of humanity, requiring active effort to build and keep harmony. If we are to be united, we have to refrain from being communal, we have to resist our entrenched judgments of our neighbors, and we have to rise up in support of those who cannot speak for themselves.

Usually, we are supposed to look to our government, as our representatives, to uphold these values, and hold us together as a nation. If — when — they fail to do so, it is up to us to unite, resist, and rise up against the government too.


☛ Human evolution and the role of our grandmothers

From the archives, this article from NPR sheds fascinating light on the role of our grandmothers in human evolution. For example, Dr. Kristen Hawkes at the University of Utah follows modern hunter-gatherer tribes to understand how our ancestors might have lived.

Over many extended field visits, Hawkes and her colleagues kept track of how much food a wide sample of Hadza community members were bringing home. She says that when they tracked the success rates of individual men, “they almost always failed to get a big animal.” They found that the average hunter went out pretty much every day and was successful on exactly 3.4 percent of those excursions. That meant that, in this society at least, the hunting hypothesis seemed way off the mark. If people here were depending on wild meat to survive, they would starve.

So if dad wasn’t bringing home the bacon, who was? After spending a lot of time with the women on their daily foraging trips, the researchers were surprised to discover that the women, both young and old, were providing the majority of calories to their families and group-mates.

A Hadza woman digs for tubers with a digging stick. (Copyright NPR/Nigel Pavitt/Getty Images/AWL Images.)

As we learn more, we are coming to realize that our strong relationships with our grandparents are not just a weird (and lucky!) quirk of our evolution, but were quite necessary to our evolutionary journey to the present.

For starters, not all animals have ‘grandparents’ in the first place, i.e. ‘elders’ living long past their reproductive age. Humans (and other great apes), whales, and elephants are among the small minority of animals with societal grandparents. Even among humans, having grandparents may be a more recent development than we think.

This NPR article provides a great perspective from several researchers. We were surely hunter-gatherers in our evolutionary past, but it turns out that how our hunting and gathering actually happened is far more complex than “the men hunted and fed their families”.

If you’re following Dr. Hawkes’ work, you might be interested in this podcast that she appeared on at The Insight.


☛ Pregnant elephant tortured to death in India: it was fed a pineapple stuffed with firecrackers.

I am appalled to admit that the creatures who did this are of my same species:

An elephant that was pregnant died in Kerala, standing in water, last Wednesday, after she faced one of the most brutal forms of animal abuse. She ate a pineapple filled with firecracker, offered to her allegedly by some locals. The fruit exploded in her mouth, leading to the inevitable tragedy.

[…]

So powerful was the cracker explosion in her mouth that her tongue and mouth were badly injured. The elephant walked around in the village, in searing pain and in hunger. She was unable to eat anything because of her injuries.

I am more disturbed by this incident than I can put into words. Poor, poor elephant, expecting a minimum — the very minimum — of cross-species friendliness, and receiving not just death, not just agony, but excruciating, hours-long torture. The creatures that did this don’t deserve to share the Earth with anyone.

The elephant stood in the Velliyar river for hours, refusing help and in ‘searing pain’, until it died standing in the water. (via NDTV).

The news report was based on social media posts by a forest officer who responded to the situation, and makes no mention of whether anyone has been arrested for this. The creatures that did this should, surely, at the very least face consequences under the laws of their own species. (That would be inadequate and the bare minimum, but the rest of us are, after all, bound by such things as codes of conduct, and laws, and morals.)

Anyway, here is the relevant Indian Penal Code section:

[Section] 429. Mischief by killing or maiming cattle, etc., of any value or any animal of the value of fifty rupees.—Whoever commits mischief by killing, poisoning, maiming or rendering useless, any elephant, camel, horse, mule, buffalo, bull, cow or ox, whatever may be the value thereof, or any other animal of the value of fifty rupees or upwards, shall be punished with imprisonment of either description for a term which may extend to five years, or with fine, or with both.

Whoever did this needs to be behind bars. Anyone that could have spoken up and didn’t needs to be behind bars too. 5 years, the penal code says. I think that’s too few; there’s no mention of torture in the code, and ‘mischief’ is quite inadequate to capture the extent of this monstrosity. Put them all in jail, and slap fines large enough that they spend the rest of their lives just paying them off.

Poor, poor elephant.