News Juxtaposition: Climate Change

Here are some news snippets from the last few weeks.

As China’s most important river, the Yangtze provides water to more than 400 million Chinese people. This summer, with rainfall in the Yangtze basin around 45% lower than normal, it reached record-low water levels, with entire sections and dozens of tributaries drying up. The loss of water flow to China’s extensive hydropower system has created problems in Sichuan, which receives more than 80% of its energy from hydropower.

Nearly a half million people crowded into camps after losing their homes in widespread flooding, and the climate minister warned Monday that Pakistan is on the “front line” of the world’s climate crisis after unprecedented monsoon rains have wracked the country since mid-June, killing more than 1,130 people.

The drama is just the latest problem as the state experiences its biggest insurance crisis since Hurricane Andrew in 1992. […] In the last two years, more than 400,000 Floridians have had their policies dropped or nonrenewed. Fourteen companies have stopped writing new policies in Florida. Five have gone belly-up in 2022 alone. The record, set after Hurricane Andrew’s devastation, is eight in one year.

The latest casualty was Coral Gables-based Weston Property & Casualty, which leaves 22,000 policyholders — about 9,400 in South Florida — scrambling to find new insurance companies.

Costs also have skyrocketed. In 2019, when DeSantis was sworn in, Floridians paid an average premium of $1,988. This year, it’s $4,231, triple the national average, according to an Insurance Information Institute analysis.

[T]he study published in the journal Nature Climate Change used satellite measurements of ice losses from Greenland and the shape of the ice cap from 2000-19. This data enabled the scientists to calculate how far global heating to date has pushed the ice sheet from an equilibrium where snowfall matches the ice lost. This allowed the calculation of how much more ice must be lost in order to regain stability.

The research shows the global heating to date will cause an absolute minimum sea-level rise of 27cm (10.6in) from Greenland alone as 110tn tonnes of ice melt. With continued carbon emissions, the melting of other ice caps and thermal expansion of the ocean, a multi-metre sea-level rise appears likely.

“It is a very conservative rock-bottom minimum,” said Prof Jason Box from the National Geological Survey of Denmark and Greenland (Geus), who led the research. “Realistically, we will see this figure more than double within this century.”

Climate change is happening, our civilization as it currently stands will be upended because of it, and we as a global society have done (next to) nothing to mitigate it. The best time to take measures to decelerate climate change was decades ago; the next best time is right now. Either we grit our teeth and hold our breath through a couple of decades of an accelerated, painful transition to sustainable energy use, or… we will be forced to hold our breath under water as our coastal life submerges.

By the way, 40% of the world’s population lives within 100km (60mi) of the coast.


Music: lyrics for Nao, by Ritam Sen, Prasen, and Hoodkhola Kobitara

This is such a beautiful song; if you haven’t heard it, here’s a version on YouTube! (There are a couple of other versions, such as this one, that are also great.)

Hat tip to Poorna for making me listen to this on one of our uncountable night drives; it has since been on repeat play for me.

Song: Nao
Lyrics: Ritam Sen
Music: Prasen
Group: Hoodkhola Kobitara

ekhon nistobdho mohonaye
eshe dariyeche dosh-jon shundor
bati ghorer naw-sho janalaye
koto pakhi khujche mrityur uttor!

ekhon nistobdho mohonaye
eshe dariyeche dosh-jon shundor
bati ghorer naw-sho janalaye
koto pakhi khujche mrityur uttor!

jeno churi jawa ek phali bhor aaj
mridu chhuye achhe himel gallery
jeno churi jawa ek phali bhor aaj
mridu chhuye achhe himel gallery

ei, ei ei ei
ei bhor nao, bondor nao
nao ey-nistobdho mohonao
nao ey-nistobdho mohonao

mm-hm ei, ei ei ei
ei bhor nao, bondor nao
nao ey-nistobdho mohonao
nao ey-nistobdho mohonao

shudhu tumi, ar tumi, ar tumi
koto mrito potrikaye kartuj-e
golaper sugondhi guhaye
aw-prem er ondor e chokh buje

shudhu tumi, ar tumi, ar tumi
koto mrito potrikaye kartuj-e
golaper sugondhi guhaye
aw-prem er ondor e chokh buje

aaj chand-er ghor makhto bichana
tomar podo-dhhoni lukoye bali te
aaj chand-er ghor makhto bichana
tomar podo-dhhoni lukoye bali te

ei, ei ei ei
ei duur nao, roddur nao
nao ey-nistobdho mohonao
nao ey-nistobdho mohonao

mm-hm ei, ei ei ei
ei bhor nao, bondor nao
nao ey-nistobdho mohonao
nao ey-nistobdho mohonao

nao ey-nistobdho mohonao
nao ey-nistobdho mohonao


☛ Ancient DNA traces origin of Black Death

A Silk Road stopover might have been the epicentre of one of humanity’s most destructive pandemics.

People who died in a fourteenth-century outbreak in what is now Kyrgyzstan were killed by strains of the plague-causing bacterium Yersinia pestis that gave rise to the pathogens responsible several years later for the Black Death, shows a study of ancient genomes.

“It is like finding the place where all the strains come together, like with coronavirus where we have Alpha, Delta, Omicron all coming from this strain in Wuhan,” says Johannes Krause, a palaeogeneticist at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, who co-led the study, published on 15 June in Nature.

A fascinating read on new research into the origins of the Black Death. As you can imagine, it’s not an easy task to find genomic data from the plague bacterium several centuries after the pandemic. Then, like now, how the pandemic spread depended quite a lot on how and where large numbers of humans came together and then dispersed, carrying the deadly disease with them.


☛ Of Cricket, and How Fast Bowling is About More Than Speed

It has been too long since this website last mentioned cricket. To remedy that, here is essential reading by Cameron Ponsonby at ESPNCricinfo on how a fast bowler’s raw speed is only a portion of how fast their pace feels:

It is very easy to think of facing fast bowling as primarily a reactive skill. In fact, read any article on quick bowling and it will invariably say you only have 0.4 seconds to react to a 90mph delivery.

But what does that mean? No one can compute information in 0.4 seconds. It’s beyond our realm of thinking in the same way that looking out of an aeroplane window doesn’t give you vertigo because you’re simply too high up for your brain to process it.

However, the reason it’s possible is because, whilst you may only have 0.4 seconds to react, you have a lot longer than that to plan. And the best in the world plan exceptionally well.

When the ball arrives, as the batter, literally faster than you can react to it, how fast a delivery feels has far more to do with the differences between bowlers than with the raw pace on the ball.

Excellent and insightful read.

Another interesting piece by Ponsonby talks about data analytics in cricket. As Ponsonby mentions in his fast bowling article, cricket only dabbles in data analytics when compared to, say, baseball, where the analytics have been taken to another level altogether.

I think I’m okay with the balance that cricket has with its data analytics: I would rather have the analytics being fascinating reads for the fan, and an influence on the coaches/players, without their becoming all that anyone cares or talks about. I sometimes feel like the innate skill and art of sport gets lost in baseball. Makes for great reading though!


Multi-core parallel processing in Python with multiple arguments

I recently needed to use parallel processing in Python. Parallel processing is very useful when:

  • you have a large set of data that you want to (or are able to) process as separate ‘chunks’.
  • you want to perform an identical process on each individual chunk (i.e. the basic code running on each chunk is the same). Of course, each chunk may have its own corresponding parameter requirements.
  • the order in which each chunk is processed is not important, i.e. the output result from one chunk does not affect the processing of a subsequent chunk.

Under these conditions, if you are working on a multi-core computer (which I think is true for virtually all of us), you can set up your code to run in parallel across several or all of your computer’s cores. Using multiple cores is essential to gaining any improvement in computation time. If you attempt such parallel processing on a single core, the computer will simply switch between separate computational threads on that single core, and the total computation time will remain about the same (in fact, the total time will more likely increase because of the incessant switching between threads).
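You can see this for yourself with a minimal sketch (the busywork() function and the job sizes below are made up for illustration) comparing a thread pool against a process pool on a CPU-bound task, using the concurrent.futures module that the rest of this post introduces. In CPython, the global interpreter lock means the threads simply take turns, so only the process pool produces a real speed-up:

import concurrent.futures
import time

def busywork(n):
    # purely CPU-bound work: a sum of squares
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    jobs = [5_000_000] * 8

    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor() as executor:
        list(executor.map(busywork, jobs))
    print('Threads:   {:.2f}s'.format(time.perf_counter() - start))

    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        list(executor.map(busywork, jobs))
    print('Processes: {:.2f}s'.format(time.perf_counter() - start))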


Anyhow, there are several methods of achieving multi-core parallel processing in Python. In this post, I will describe what I think is the simplest method to implement. This is the method I chose, and with whose results I am quite happy.

Additionally, most examples online that go over implementing parallel processing never mention how to handle multiple input arguments separate from the iteration parameter. There are several ways to do that too, and I will again describe what I think is the simplest method to implement and maintain.

Say, you have the following code setup:

arg1 = 'val1'  # placeholder values, purely for illustration
arg2 = ['val2', 'val3']
arg3 = ['val4', 'val5']
fileslist = ['list', 'of', 'files', 'that', 'are', 'to', 'be', 'processed']

for file in fileslist:
    print('Start: {}'.format(file))
    # perform a task with arg1
    # perform a task with arg2
    # print something with arg3
    # save some data to disk
    print('Status Update based on {}'.format(file))

Now, for parallel processing, the target is to convert the for loop into a parallel process controller, which will ‘assign’ file values from fileslist to available cores.

To achieve this, there are two steps we need to perform. First, convert the contents of your for loop into a separate function that can be called. With the approach used in this post, this function takes a single argument. Set up your function accordingly, planning that this single argument will be a tuple of variables: one of them will be the iteration variable, in our case file, and the rest will be the remaining variables required.

def loopfunc(argstuple):
    # unpack the single tuple argument into the variables we need
    file, arg1, arg2, arg3 = argstuple
    print('Start: {}'.format(file))
    # perform a task with arg1
    # perform a task with arg2
    # print something with arg3
    # save some data to disk
    return 'Status Update based on {}'.format(file)

Second, update the main code structure to enable multi-core processing. We will be using the module concurrent.futures. Let’s see the updated code first, before I explain what is happening.

import concurrent.futures

arg1 = 'val1'  # placeholder values, purely for illustration
arg2 = ['val2', 'val3']
arg3 = ['val4', 'val5']
fileslist = ['list', 'of', 'files', 'that', 'are', 'to', 'be', 'processed']

# the __main__ guard is required on platforms that spawn a fresh
# interpreter for each worker process (e.g. Windows and macOS)
if __name__ == '__main__':
    argslist = ((file, arg1, arg2, arg3) for file in fileslist)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(loopfunc, argslist)

        for rs in results:
            print(rs)

OK, now let’s go over it. The with ... line invokes the parallel processing tool, creating the executor object that manages the pool of worker processes (and shuts them down when the block exits). In the next line, executor.map() is used to provide two pieces of information: (a) what function is to be repeatedly executed, and (b) an iterable that yields the tuple of arguments to be passed for each function execution. Notice that when calling executor.map(), we are providing loopfunc as an object, and are not attempting to execute the function itself via loopfunc().

Now, argslist is meant to yield one tuple of arguments for each iteration of loopfunc, i.e. one tuple per entry in fileslist. However, in our case, only the fileslist variable is iterated over, while the other arguments are provided ‘as-is’. The workaround for this is to use a list-comprehension-like construct (err… strictly, a generator expression, since wrapping it in parentheses produces a generator rather than a tuple) to generate a new variable (in our case argslist) that supplies all relevant arguments for each function iteration.

In this way, the first process is created with loopfunc( (fileslist[0], arg1, arg2, arg3) ), the second process is created with loopfunc( (fileslist[1], arg1, arg2, arg3) ), and so on. Of course, within loopfunc(), we have already converted the single input argument into the multiple variables we need.
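As an aside (not the approach used in this post), executor.map() can also accept multiple iterables, one per function parameter, which avoids the tuple packing and unpacking entirely; the unchanging arguments can be supplied with itertools.repeat(). A sketch, assuming loopfunc() is rewritten to take four separate parameters:

import itertools

def loopfunc(file, arg1, arg2, arg3):
    # same body as before, just with named parameters instead of a tuple
    return 'Status Update based on {}'.format(file)

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = executor.map(loopfunc, fileslist,
                           itertools.repeat(arg1),
                           itertools.repeat(arg2),
                           itertools.repeat(arg3))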

Values returned from loopfunc() are stored in the variable results, which is looped over to print out each value. The fun behavior here is that each rs is printed as soon as that value becomes available; note, though, that executor.map() yields results in input order, so the loop waits for the result of fileslist[0] before printing that of fileslist[1], even if the latter finishes first. For example, if you’re running on a 4-core machine, output from the code can look like the following, depending upon the speed of execution of each iteration:

Start: fileslist[0]
Start: fileslist[1]
Start: fileslist[2]
Start: fileslist[3]
Status Update based on fileslist[0]
Status Update based on fileslist[1] 
Start: fileslist[4]
Start: fileslist[5]
Status Update based on fileslist[2] 
Start: fileslist[6]
Status Update based on fileslist[3] 
Start: fileslist[7]
...
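If you would rather handle each result in the order the workers finish, irrespective of input order, concurrent.futures also offers executor.submit() and as_completed(). This is not what the code above does, but a sketch would look like:

with concurrent.futures.ProcessPoolExecutor() as executor:
    # submit() returns a Future for each call; as_completed() yields
    # the futures in the order they finish, not the order submitted
    futures = [executor.submit(loopfunc, (file, arg1, arg2, arg3))
               for file in fileslist]
    for future in concurrent.futures.as_completed(futures):
        print(future.result())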

Without any arguments, ProcessPoolExecutor() creates as many processes as there are cores on your computer (its default is os.cpu_count()). This is great if you want to run your code and walk away for a few hours, letting your Python script take over your whole computational capability. However, if you only want to allow a specific number of processes, you can use ProcessPoolExecutor(max_workers=nproc), where nproc is the maximum number of worker processes you want to allow at once.
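For instance, one possible policy is to leave one core free so that the machine stays responsive while the script runs:

import os

# os.cpu_count() can return None, hence the fallback; leave one core free
nproc = max(1, (os.cpu_count() or 2) - 1)
with concurrent.futures.ProcessPoolExecutor(max_workers=nproc) as executor:
    results = executor.map(loopfunc, argslist)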

To-do

In my current implementation, I have used the above method to work on ‘chunks’ of data and then saved the resultant output to disk with appropriate markers. However, another way to implement parallel processing would be to take the output from each iteration and save it as an element in an array, at the correct array index.

This should not be hard to do: all I should need is to return both the output data and the correct marker for the array index. I just haven’t done it (nor needed to do it) yet. I actually prefer saving the output from each chunk to disk separately, if possible, so that even if something crashes (or the power goes out, or whatever) and the process is interrupted, I won’t lose all progress made until then.
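For what it’s worth, here is a sketch of that idea (hypothetical, since I haven’t needed it): loopfunc() returns its index along with the output data, and the main loop places each result at the right position:

def loopfunc(argstuple):
    idx, file, arg1, arg2, arg3 = argstuple
    # ... process the chunk as before ...
    data = 'output for {}'.format(file)
    return idx, data

argslist = ((i, file, arg1, arg2, arg3)
            for i, file in enumerate(fileslist))
output = [None] * len(fileslist)
with concurrent.futures.ProcessPoolExecutor() as executor:
    for idx, data in executor.map(loopfunc, argslist):
        output[idx] = data  # each result lands at its own index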