Python Concurrency: Multiprocessing, Threading, and Asyncio
Overview
Python offers three main approaches for concurrent execution: multiprocessing, threading, and asyncio. Each solves different problems and has distinct use cases.
The Fundamental Problem: The GIL
Python's Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode simultaneously in CPython, the reference implementation. Understanding the GIL is crucial to choosing among these approaches.
1. Multiprocessing
How It Works
- Creates separate Python processes, each with its own Python interpreter and memory space
- Each process has its own GIL
- True parallel execution on multiple CPU cores
- Processes communicate via IPC (Inter-Process Communication) mechanisms
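The IPC point above can be sketched with a `multiprocessing.Queue` (the worker function and values here are illustrative):

```python
from multiprocessing import Process, Queue

def worker(q):
    # Runs in a separate process with its own memory and its own GIL;
    # the result must cross the process boundary via the queue (pickled)
    q.put(sum(range(1_000_000)))

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    result = q.get()  # blocks until the child process sends its result
    p.join()
    print(result)  # 499999500000
```

The serialization on both ends of the queue is exactly the communication cost listed under Cons below.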
Best For
- CPU-bound tasks: Heavy computations, data processing, mathematical operations
- Tasks that need true parallelism
- When you need to bypass the GIL
Pros
- True parallel execution
- Each process is isolated (crash in one doesn't affect others)
- Can utilize all CPU cores
Cons
- High memory overhead (each process duplicates memory)
- Slower startup time
- Communication between processes is expensive (serialization/deserialization)
- More complex to share data
Example Use Case
```python
from multiprocessing import Pool

def heavy_computation(n):
    return sum(i * i for i in range(n))

# Process 4 large computations in parallel
if __name__ == "__main__":  # required on platforms that spawn new interpreters
    with Pool(4) as p:
        results = p.map(heavy_computation, [10_000_000] * 4)
```
2. Threading
How It Works
- Creates multiple threads within a single process
- All threads share the same memory space
- Only one thread executes Python bytecode at a time (due to GIL)
- Concurrency, not parallelism
Best For
- I/O-bound tasks: File operations, network requests, database queries
- When tasks spend time waiting (not computing)
- When you need shared memory access
Pros
- Lower memory overhead than multiprocessing
- Fast context switching
- Easy data sharing between threads
- Good for I/O-bound operations
Cons
- GIL prevents true parallel CPU execution
- Race conditions and deadlocks possible
- Not suitable for CPU-bound tasks
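The race-condition risk above can be shown with a shared counter; a minimal sketch using `threading.Lock` to make the increment safe (the counts are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # without the lock, counter += 1 can lose updates
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 with the lock; often less without it
```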
Example Use Case
```python
from threading import Thread
import requests

def download_file(url):
    response = requests.get(url)
    # Process response here

# Download multiple files concurrently
# (assumes `urls` is a list of URL strings)
threads = [Thread(target=download_file, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
3. Asyncio
How It Works
- Single-threaded cooperative multitasking
- Uses an event loop to manage tasks
- Tasks voluntarily yield control (using `await`)
- Non-blocking I/O operations
Best For
- High-concurrency I/O-bound tasks: Web servers, API clients, websockets
- When you have thousands of connections
- Modern async libraries (aiohttp, asyncpg)
Pros
- Extremely lightweight (can handle thousands of tasks)
- Lower overhead than threads
- Explicit control flow (context switches happen only at `await`)
- Fewer race conditions (single-threaded, so state changes only between `await` points)
- Very efficient for I/O-bound work
Cons
- Requires async-compatible libraries
- One blocking call blocks everything
- Steeper learning curve
- Cannot use regular blocking code
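The "one blocking call blocks everything" con can be seen in a minimal sketch (function names and timings are illustrative):

```python
import asyncio
import time

async def ticker():
    # Cooperative task: yields to the event loop at each await
    for _ in range(3):
        await asyncio.sleep(0.05)
    return "ticked"

async def blocker():
    # BAD: time.sleep() never yields, so no other task runs during it
    time.sleep(0.2)
    return "blocked"

async def main():
    # ticker() cannot make progress while blocker() holds the loop
    return await asyncio.gather(ticker(), blocker())

print(asyncio.run(main()))  # ['ticked', 'blocked']
```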
Example Use Case
```python
import asyncio
import aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    # (assumes `urls` is a list of URL strings)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        # Handle thousands of concurrent requests efficiently
        results = await asyncio.gather(*tasks)

asyncio.run(main())
```
Quick Decision Guide
| Task Type | Best Choice | Why |
|---|---|---|
| CPU-intensive calculations | Multiprocessing | Bypasses GIL, true parallelism |
| I/O operations (few connections) | Threading | Simple, good for moderate concurrency |
| I/O operations (many connections) | Asyncio | Most efficient for high concurrency |
| Mixed CPU + I/O | Multiprocessing + Asyncio | Process pool for CPU, async for I/O |
Performance Tips
1. Measure First
Always profile before optimizing. Use `cProfile`, `line_profiler`, or `py-spy`.
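For example, the standard-library `cProfile` can reveal where time is actually spent (the profiled function here is a hypothetical stand-in for a suspected hotspot):

```python
import cProfile
import io
import pstats

def busy():
    # Stand-in for the code you suspect is slow
    return sum(i * i for i in range(200_000))

pr = cProfile.Profile()
pr.enable()
busy()
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top 5 entries by cumulative time
```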
2. Choose Based on Bottleneck
- CPU-bound? → Multiprocessing
- I/O-bound with moderate concurrency? → Threading
- I/O-bound with high concurrency? → Asyncio
3. Avoid Common Pitfalls
- Don't use multiprocessing for I/O-bound tasks (overhead outweighs benefits)
- Don't use threading for CPU-bound tasks (GIL makes it slower)
- Don't mix blocking code in asyncio (use `loop.run_in_executor()` if needed)
4. Optimal Worker Counts
- Multiprocessing: Usually `cpu_count()` or `cpu_count() - 1`
- Threading: Depends on wait time; often 2-10x `cpu_count()` for I/O
- Asyncio: Can handle thousands of tasks on one thread
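These rules of thumb can be written directly when sizing executors; a sketch where the multipliers are illustrative defaults, not hard rules:

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

cpus = os.cpu_count() or 1

# CPU-bound: roughly one worker per core, leaving one free for the OS
process_pool = ProcessPoolExecutor(max_workers=max(1, cpus - 1))

# I/O-bound: oversubscribe, since workers spend most of their time waiting
thread_pool = ThreadPoolExecutor(max_workers=cpus * 4)

process_pool.shutdown()
thread_pool.shutdown()
```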
5. Hybrid Approaches
For complex applications, combine approaches:
```python
# Process pool for CPU work, async for coordination
from concurrent.futures import ProcessPoolExecutor
import asyncio

async def process_data(data):
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside coroutines
    with ProcessPoolExecutor() as pool:
        # cpu_intensive_func runs in a separate process, off the event loop
        result = await loop.run_in_executor(pool, cpu_intensive_func, data)
    return result
```
Memory and Overhead Comparison
| Approach | Memory per Task | Startup Time | Context Switch Cost |
|---|---|---|---|
| Multiprocessing | ~10-50 MB | Slow (100ms+) | High |
| Threading | ~8 MB | Medium (~1ms) | Medium |
| Asyncio | ~1-5 KB | Fast (<0.1ms) | Very Low |
Real-World Example: Web Scraping
Illustrative timings for fetching and parsing 100 URLs (an I/O-bound workload; exact numbers depend on network and hardware):

- Asyncio: ~5 seconds (optimal for high-concurrency I/O)
- Threading: ~30 seconds (I/O wait overlapped, but higher per-thread overhead)
- Multiprocessing: ~60 seconds (process startup and serialization overhead dominate)
Key Takeaway
There's no "best" approach—only the right tool for your specific problem. Understanding your workload (CPU vs I/O bound) and concurrency needs (10 vs 10,000 tasks) is essential for making the right choice.