Sunday, August 13, 2017

Python 3 asyncio

A while back I wrote a few posts about asynchronous programming. So when I learned that Python 3.5 added the async and await keywords, I knew I had to check them out and see how they compare. asyncio describes itself as "infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives".

Coroutines in Python are similar to generators, but a coroutine (a function defined with async def) can control where execution resumes after it suspends (await takes the place of yield). You await another coroutine or a future. The coroutine approach can replace callbacks.

If we check the time it takes for the program to run, it's 3 seconds (the longest slow operation) and not 6 seconds (the sum of all the slow operations).

Slow operation sleep 1 complete
Slow operation sleep 2 complete
Slow operation sleep 3 complete
Completed in 3.02 seconds

The code is similar if we want to get a result from each task. We can even consume the results as they become available.
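A sketch of this version, assuming the same hypothetical slow_operation now returns its argument, could use asyncio.as_completed to consume results in completion order:

```python
import asyncio
import time

async def slow_operation(n):
    await asyncio.sleep(n)
    print('Slow operation sleep {} complete'.format(n))
    return n

async def main():
    tasks = [slow_operation(n) for n in (1, 2, 3)]
    # as_completed yields futures in the order they finish,
    # not the order they were submitted.
    for future in asyncio.as_completed(tasks):
        result = await future
        print('Got result {}'.format(result))

start = time.time()
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()
print('Completed in {:.2f} seconds'.format(time.time() - start))
```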

Slow operation sleep 1 complete
Got result 1
Slow operation sleep 2 complete
Got result 2
Slow operation sleep 3 complete
Got result 3
Completed in 3.00 seconds

With Python's GIL you've never really been able to run multiple threads in parallel; you've had to run concurrent code in multiple processes to leverage multiple CPU cores. With asyncio we can at least make IO-bound tasks in a single process run faster, because the event loop switches to another task whenever one is waiting on IO, sidestepping thread overhead and GIL contention entirely.

For CPU-bound code you still have to use multiple processes to parallelize your work. On its own, Python's parallel API makes it awkward to use results from these tasks as they become available, as we did in the example above: waiting for a process to finish and getting a future's result both block. asyncio gives us a way to unify the concurrent and parallel APIs.
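A sketch of how this unification might look, assuming the slow operation is sum(range(n)): loop.run_in_executor wraps each ProcessPoolExecutor task in an awaitable future, so the parallel API composes with asyncio.as_completed just like the coroutines above did.

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def slow_operation(n):
    # CPU-bound work: runs in a separate process, outside the GIL.
    result = sum(range(n))
    print('Slow operation sum {} complete'.format(n))
    return result

async def main(loop, executor):
    # run_in_executor returns an asyncio future for each process task,
    # so we can consume CPU-bound results as they become available.
    tasks = [loop.run_in_executor(executor, slow_operation, n)
             for n in (10000000, 20000000, 30000000)]
    for future in asyncio.as_completed(tasks):
        print('Got result {}'.format(await future))

if __name__ == '__main__':
    start = time.time()
    loop = asyncio.new_event_loop()
    with ProcessPoolExecutor() as executor:
        loop.run_until_complete(main(loop, executor))
    loop.close()
    print('Completed in {:.2f} seconds'.format(time.time() - start))
```

The `if __name__ == '__main__'` guard matters here: ProcessPoolExecutor needs to be able to import the module in its worker processes without re-running the top-level code.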

Slow operation sum 10000000 complete
Got result 49999995000000
Slow operation sum 20000000 complete
Got result 199999990000000
Slow operation sum 30000000 complete
Got result 449999985000000
Completed in 2.91 seconds

There is some overhead in creating the processes, but a single 3-second task is only slightly faster than the 1-, 2-, and 3-second tasks run in parallel.

Slow operation sum 30000000 complete
Got result 449999985000000
Completed in 2.66 seconds

This hints at another use. Given that asyncio runs in a single thread, too many IO-bound tasks, or any task that consumes too much CPU, can overwhelm it. In that situation we can create multiple processes, each with its own asyncio event loop.

1 process and 3 tasks:
Got 3 results in 6.20 seconds

1 process and 15 tasks:
Got 15 results in 23.94 seconds

The simulated mix of IO-bound and CPU-bound code is interesting. It's slower than just the CPU-bound code on its own, but faster than the sum of the IO-bound and CPU-bound code. We see some asyncio benefit up to around 15 tasks, and then it levels off.

1 process and 30 tasks:
Got 30 results in 46.58 seconds

4 processes and 15*4 tasks:
Got 60 results in 34.11 seconds

8 processes and 15*8 tasks:
Got 120 results in 52.81 seconds

We similarly see some benefit from parallelizing the code up to the number of CPU cores I have; beyond that, the time increases linearly, as we would expect.

16 processes and 15*16 tasks:
Got 240 results in 105.98 seconds

If you are looking for more details, the comprehensive Python Concurrency series of articles has additional examples like these and goes deeper into some of the underlying concepts.