Coroutines in Python are similar to generators, but a coroutine (a function defined with async def) can control where execution resumes after it suspends (await takes the place of yield). You await another coroutine or a future. The coroutine approach can replace callbacks.
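A minimal sketch of this kind of program, assuming the slow operations just sleep for 1, 2, and 3 seconds (the function names here are illustrative, not the article's actual code):

```python
import asyncio
import time

async def slow_operation(seconds):
    # Simulate IO-bound work: awaiting sleep yields control back to
    # the event loop so the other coroutines can run in the meantime.
    await asyncio.sleep(seconds)
    print(f"Slow operation sleep {seconds} complete")

async def main():
    start = time.perf_counter()
    # Run all three coroutines concurrently and wait for all of them.
    await asyncio.gather(slow_operation(1), slow_operation(2), slow_operation(3))
    print(f"Completed in {time.perf_counter() - start:.2f} seconds")

asyncio.run(main())
```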
If we time the program, it completes in about 3 seconds (the duration of the longest slow operation), not 6 seconds (the sum of all the slow operations).
```
Slow operation sleep 1 complete
Slow operation sleep 2 complete
Slow operation sleep 3 complete
Completed in 3.02 seconds
```
It's similar if we want results from each task. We can even get them as they become available.
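One way to do this is asyncio.as_completed, which yields tasks in the order they finish. A sketch, again assuming sleep-based slow operations that return their duration:

```python
import asyncio
import time

async def slow_operation(seconds):
    await asyncio.sleep(seconds)
    print(f"Slow operation sleep {seconds} complete")
    return seconds

async def main():
    start = time.perf_counter()
    tasks = [asyncio.create_task(slow_operation(s)) for s in (1, 2, 3)]
    # as_completed yields awaitables in completion order,
    # not in the order the tasks were created.
    for finished in asyncio.as_completed(tasks):
        print(f"Got result {await finished}")
    print(f"Completed in {time.perf_counter() - start:.2f} seconds")

asyncio.run(main())
```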
```
Slow operation sleep 1 complete
Got result 1
Slow operation sleep 2 complete
Got result 2
Slow operation sleep 3 complete
Got result 3
Completed in 3.00 seconds
```
With Python's GIL you've never really been able to run multiple threads in parallel; you've had to run code in multiple processes to use multiple CPU cores. With asyncio we can at least make IO-bound tasks in a single process run faster, because the event loop switches between tasks whenever one is waiting on IO, avoiding threads and GIL contention entirely.
For CPU-bound code you still have to use multiple processes to parallelize. Python's multiprocessing API makes it awkward to use results from these tasks as they become available, as we did in the example above: waiting for a process to finish and getting a result from a future both block. asyncio gives us a way to unify the concurrent and parallel APIs.
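With loop.run_in_executor we can await process-pool work like any other coroutine and still use asyncio.as_completed. A sketch, assuming each CPU-bound slow operation is a large sum(range(n)):

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def slow_operation(n):
    # CPU-bound work runs in a separate process, outside the GIL
    # of the main process.
    total = sum(range(n))
    print(f"Slow operation sum {n} complete")
    return total

async def main():
    start = time.perf_counter()
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # run_in_executor wraps each pool submission as an awaitable,
        # so the parallel tasks fit the same asyncio API as before.
        futures = [loop.run_in_executor(pool, slow_operation, n)
                   for n in (10_000_000, 20_000_000, 30_000_000)]
        for finished in asyncio.as_completed(futures):
            print(f"Got result {await finished}")
    print(f"Completed in {time.perf_counter() - start:.2f} seconds")

if __name__ == "__main__":
    asyncio.run(main())
```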
```
Slow operation sum 10000000 complete
Got result 49999995000000
Slow operation sum 20000000 complete
Got result 199999990000000
Slow operation sum 30000000 complete
Got result 449999985000000
Completed in 2.91 seconds
```
There is some overhead in creating the processes, but a single 3-second task on its own is only slightly faster than the 1-, 2-, and 3-second tasks run in parallel.
```
Slow operation sum 30000000 complete
Got result 449999985000000
Completed in 2.66 seconds
```
This hints at another use. Because asyncio runs in a single thread, too many IO-bound tasks, or any task that uses too much CPU, can overwhelm the event loop. In that situation we can create multiple processes, each with its own asyncio event loop.
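A sketch of that setup, assuming each task mixes a one-second simulated IO wait with a CPU-bound sum (the task shape and the numbers are assumptions for illustration; the process and task counts are kept small here):

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

async def mixed_task(i):
    await asyncio.sleep(1)               # simulated IO-bound part
    return i, sum(range(5_000_000))      # simulated CPU-bound part

def run_batch(n_tasks):
    # Each worker process runs its own event loop over a batch of tasks,
    # so the IO waits overlap within the process while the CPU-bound
    # parts are spread across processes.
    async def batch():
        return await asyncio.gather(*(mixed_task(i) for i in range(n_tasks)))
    return asyncio.run(batch())

def main(n_procs=2, tasks_per_proc=3):
    start = time.perf_counter()
    results = []
    with ProcessPoolExecutor(max_workers=n_procs) as pool:
        for batch in pool.map(run_batch, [tasks_per_proc] * n_procs):
            results.extend(batch)
    print(f"Got {len(results)} results in {time.perf_counter() - start:.2f} seconds")

if __name__ == "__main__":
    main()
```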
1 process and 3 tasks:

```
Got 3 results in 6.20 seconds
```

1 process and 15 tasks:

```
Got 15 results in 23.94 seconds
```
The simulated mix of IO-bound and CPU-bound code is interesting: it's slower than the CPU-bound code on its own, but faster than the sum of the IO-bound and CPU-bound times. We see some asyncio benefit up to around 15 tasks, and then it levels off.
1 process and 30 tasks:

```
Got 30 results in 46.58 seconds
```

4 processes and 15*4 tasks:

```
Got 60 results in 34.11 seconds
```

8 processes and 15*8 tasks:

```
Got 120 results in 52.81 seconds
```
We similarly see some benefit from parallelizing the code up to the number of CPU cores I have; beyond that, the time increases linearly, as we would expect.
16 processes and 15*16 tasks:

```
Got 240 results in 105.98 seconds
```
If you are looking for more details, the comprehensive Python Concurrency series of articles has additional examples like these and goes into more detail on some of the underlying concepts.