Python 3 Application Profiling Tutorial
A comprehensive guide to profiling and optimizing Python applications.
Table of Contents
- Introduction to Profiling
- Built-in Profiling Tools
- Third-Party Profiling Tools
- Memory Profiling
- Line-by-Line Profiling
- Practical Examples
- Best Practices
- Quick Reference
Introduction to Profiling
Profiling is the process of measuring where your program spends time and uses resources. This helps identify bottlenecks and optimize performance.
Why Profile?
- Identify slow functions and code paths
- Optimize resource usage (CPU, memory)
- Make data-driven optimization decisions
- Avoid premature optimization
Types of Profiling
- Deterministic profiling: Records every function call (more accurate, higher overhead; see the sketch after this list)
- Statistical profiling: Samples execution periodically (lower overhead, less precise)
- Memory profiling: Tracks memory allocation and usage
- Line profiling: Profiles code line-by-line
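Deterministic profilers work by registering a hook that the interpreter fires on every function call and return; CPython exposes this mechanism as sys.setprofile, and cProfile uses the equivalent C-level hook. A tiny sketch of the mechanism, for illustration only (far too slow for real measurement):
import sys

def tracer(frame, event, arg):
    # The interpreter invokes this hook on every call and return
    if event in ('call', 'return'):
        print(event, frame.f_code.co_name)

def work():
    return sum(range(10))

sys.setprofile(tracer)
work()
sys.setprofile(None)  # uninstall the hook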
Built-in Profiling Tools
1. cProfile (Recommended)
cProfile is Python's standard-library profiler, implemented in C for relatively low overhead.
Basic Usage
import cProfile
def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

def fast_function():
    return sum(range(1000000))

def main():
    slow_function()
    fast_function()
# Profile the main function
cProfile.run('main()')
Command Line Usage
# Profile a script
python3 -m cProfile my_script.py
# Save results to a file
python3 -m cProfile -o output.prof my_script.py
# Sort by cumulative time
python3 -m cProfile -s cumtime my_script.py
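A profile saved with -o can be loaded back with pstats for offline analysis. A minimal sketch, assuming output.prof was produced by the command above:
import pstats

# Load the saved profile and show the ten most expensive entries
stats = pstats.Stats('output.prof')
stats.strip_dirs().sort_stats('cumulative').print_stats(10)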
Programmatic Usage with Statistics
import cProfile
import pstats
from io import StringIO
def profile_function():
    pr = cProfile.Profile()
    pr.enable()
    # Code to profile
    result = sum(range(1000000))
    pr.disable()
    # Print statistics
    s = StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
    ps.print_stats()
    print(s.getvalue())
profile_function()
Understanding cProfile Output
ncalls tottime percall cumtime percall filename:lineno(function)
1000 0.150 0.000 0.200 0.000 module.py:10(func)
- ncalls: Number of calls
- tottime: Total time spent in the function itself (excluding subcalls)
- percall (first column): tottime / ncalls
- cumtime: Cumulative time (including subcalls)
- percall (second column): cumtime / ncalls
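When a function shows up as hot, the next question is usually who calls it. pstats can print caller/callee relationships; a short sketch, again assuming a saved output.prof:
import pstats

stats = pstats.Stats('output.prof')
stats.sort_stats('cumulative')
# Show which functions call each of the top entries...
stats.print_callers(5)
# ...and which functions those entries call in turn
stats.print_callees(5)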
2. profile Module
Older, pure-Python profiler (slower than cProfile):
import profile
profile.run('main()')
3. timeit Module
For micro-benchmarking small code snippets:
import timeit
# Time a simple statement
time1 = timeit.timeit('sum(range(100))', number=10000)
print(f"sum(range(100)): {time1:.6f} seconds")
# Compare two approaches
time2 = timeit.timeit('[i for i in range(100)]', number=10000)
print(f"List comprehension: {time2:.6f} seconds")
# Using setup code
setup = "from math import sqrt"
stmt = "sqrt(144)"
time3 = timeit.timeit(stmt, setup=setup, number=10000)
print(f"sqrt(144): {time3:.6f} seconds")
Command Line Usage
# Time a statement
python3 -m timeit 'sum(range(100))'
# Specify number of runs
python3 -m timeit -n 1000 'sum(range(100))'
# With setup code
python3 -m timeit -s 'from math import sqrt' 'sqrt(144)'
Third-Party Profiling Tools
1. SnakeViz - Visual Profiler
Install:
pip3 install snakeviz
Usage:
import cProfile
# Generate profile
cProfile.run('main()', 'output.prof')
# View in browser (run in terminal)
# snakeviz output.prof
2. py-spy - Sampling Profiler
Low-overhead sampling profiler that doesn't require code changes.
Install:
pip3 install py-spy
Usage:
# Profile a running process
py-spy top --pid <PID>
# Record and generate flamegraph
py-spy record -o profile.svg -- python3 my_script.py
# Profile for specific duration
py-spy record -o profile.svg -d 30 -- python3 my_script.py
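py-spy can also print a one-off stack dump of a running process, which is useful for diagnosing hangs:
# Dump the current call stack of every thread in a running process
py-spy dump --pid <PID>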
3. Yappi - Thread-Aware Profiler
Profiles multi-threaded applications:
Install:
pip3 install yappi
Usage:
import yappi
import threading
import time
def worker():
    time.sleep(1)
    total = sum(range(1000000))
# Start profiling
yappi.start()
# Run threads
threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Stop and print stats
yappi.stop()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
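Note that yappi measures CPU time by default, so the time.sleep(1) in the workers above will barely register. To include blocking waits, switch to wall-clock timing; a minimal sketch:
import yappi

# Measure wall-clock time instead of CPU time
# (set this before calling yappi.start())
yappi.set_clock_type("wall")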
Memory Profiling
1. memory_profiler
Line-by-line memory usage profiler.
Install:
pip3 install memory_profiler
Usage:
from memory_profiler import profile
@profile
def memory_intensive_function():
    # Create large lists
    large_list1 = [i for i in range(1000000)]
    large_list2 = [i * 2 for i in range(1000000)]
    # Some processing
    result = [a + b for a, b in zip(large_list1, large_list2)]
    return result

if __name__ == '__main__':
    memory_intensive_function()
Command line:
# Run with memory profiler
python3 -m memory_profiler my_script.py
# Plot memory usage over time
mprof run my_script.py
mprof plot
2. tracemalloc (Built-in)
Python's built-in memory tracker:
import tracemalloc
# Start tracking
tracemalloc.start()
# Code to profile
large_list = [i for i in range(1000000)]
another_list = [i * 2 for i in range(500000)]
# Get current memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory: {current / 1024 / 1024:.2f} MB")
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
# Get top memory allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print("\nTop 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)
tracemalloc.stop()
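tracemalloc can also diff two snapshots, which pinpoints exactly what a given stretch of code allocated. A short sketch:
import tracemalloc

tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()
# ... code under investigation ...
data = [str(i) for i in range(100000)]
snapshot2 = tracemalloc.take_snapshot()
# Show the lines whose allocations grew the most between the snapshots
for stat in snapshot2.compare_to(snapshot1, 'lineno')[:5]:
    print(stat)
tracemalloc.stop()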
3. objgraph - Object Reference Graphs
Visualize object relationships:
Install:
pip3 install objgraph
Usage:
import objgraph
class MyClass:
    def __init__(self, value):
        self.value = value
# Create objects
objects = [MyClass(i) for i in range(100)]
# Show most common types
objgraph.show_most_common_types()
# Count instances
print(f"MyClass instances: {objgraph.count('MyClass')}")
# Show growth between two points (the first call sets the baseline)
objgraph.show_growth()
# ... do some work ...
objgraph.show_growth()  # prints type counts that grew since the baseline
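Continuing the example above, objgraph can render a reference graph to an image, which helps explain why an object is not being garbage collected. A sketch, assuming Graphviz is installed and using an arbitrary output filename:
import objgraph

# Render the chain of references keeping the first object alive
# (writes a PNG; requires Graphviz)
objgraph.show_backrefs([objects[0]], max_depth=3, filename='backrefs.png')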
Line-by-Line Profiling
line_profiler
Install:
pip3 install line_profiler
Usage:
# my_script.py
@profile  # kernprof injects this decorator at runtime; don't import it
def slow_function():
    total = 0
    for i in range(1000):
        for j in range(1000):
            total += i * j
    return total

@profile
def fast_function():
    import numpy as np
    arr = np.arange(1000)
    return np.sum(arr[:, None] * arr[None, :])

if __name__ == '__main__':
    slow_function()
    fast_function()
Run:
# Profile the script
kernprof -l -v my_script.py
# Output shows time per line
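line_profiler can also be driven programmatically, without kernprof and the injected decorator. A minimal sketch (hot_loop is just a stand-in workload):
from line_profiler import LineProfiler

def hot_loop():
    total = 0
    for i in range(100000):
        total += i * i
    return total

lp = LineProfiler()
wrapped = lp(hot_loop)  # a LineProfiler instance works as a decorator
wrapped()
lp.print_stats()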
Practical Examples
Example 1: Finding Bottlenecks
import cProfile
import pstats
def process_data():
    # Simulate data processing
    data = []
    for i in range(10000):
        data.append(i ** 2)
    # Sort data
    data.sort()
    # Filter data
    filtered = [x for x in data if x % 2 == 0]
    return filtered

def analyze_bottlenecks():
    profiler = cProfile.Profile()
    profiler.enable()
    result = process_data()
    profiler.disable()
    # Analyze results
    stats = pstats.Stats(profiler)
    stats.strip_dirs()
    stats.sort_stats('cumulative')
    stats.print_stats(10)  # Top 10 functions

if __name__ == '__main__':
    analyze_bottlenecks()
Example 2: Comparing Algorithms
import timeit

def compare_implementations():
    # Implementation 1: List comprehension
    def impl1():
        return [i * 2 for i in range(10000)]

    # Implementation 2: Map function
    def impl2():
        return list(map(lambda x: x * 2, range(10000)))

    # Implementation 3: Traditional loop
    def impl3():
        result = []
        for i in range(10000):
            result.append(i * 2)
        return result

    # Time each implementation
    time1 = timeit.timeit(impl1, number=1000)
    time2 = timeit.timeit(impl2, number=1000)
    time3 = timeit.timeit(impl3, number=1000)
    print(f"List comprehension: {time1:.6f} seconds")
    print(f"Map function: {time2:.6f} seconds")
    print(f"Traditional loop: {time3:.6f} seconds")
    # Report the fastest time
    fastest = min(time1, time2, time3)
    print(f"\nFastest time: {fastest:.6f} seconds")

compare_implementations()
Example 3: Context Manager for Profiling
import cProfile
import pstats
from contextlib import contextmanager
@contextmanager
def profiled():
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield
    finally:
        # Always stop and report, even if the profiled code raises
        pr.disable()
        stats = pstats.Stats(pr)
        stats.strip_dirs()
        stats.sort_stats('cumulative')
        stats.print_stats(10)
# Usage
def main():
    with profiled():
        # Your code here
        result = sum(i ** 2 for i in range(100000))
        print(f"Result: {result}")

if __name__ == '__main__':
    main()
Example 4: Decorator for Function Timing
import time
import functools
def timing_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took {end - start:.6f} seconds")
        return result
    return wrapper

@timing_decorator
def slow_computation():
    time.sleep(1)
    return sum(range(1000000))

@timing_decorator
def fast_computation():
    return sum(range(1000))

# Usage
slow_computation()
fast_computation()
Best Practices
1. Profile Before Optimizing
Don't guess where the bottleneck is; measure it.
# Bad: Optimizing without profiling
def premature_optimization():
    # Time spent optimizing the wrong thing
    pass

# Good: Profile first, then optimize
def measured_optimization():
    # 1. Profile the code
    # 2. Identify actual bottlenecks
    # 3. Optimize those specific areas
    # 4. Re-profile to verify the improvement
    pass
2. Use Appropriate Tools
- cProfile: General-purpose profiling
- timeit: Micro-benchmarking
- memory_profiler: Memory issues
- line_profiler: Detailed line-by-line analysis
- py-spy: Production profiling without code changes
3. Profile in Realistic Conditions
# Bad: Profiling with toy data
small_data = list(range(100))
process(small_data)  # process() stands in for the workload under test

# Good: Profile with realistic data sizes
realistic_data = list(range(1000000))
process(realistic_data)
4. Consider Multiple Metrics
import time
import tracemalloc
def comprehensive_profile(func):
    # Start memory tracking first so its startup cost isn't timed
    tracemalloc.start()
    # Time measurement
    start_time = time.perf_counter()
    # Execute the function
    result = func()
    end_time = time.perf_counter()
    # Memory metrics
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"Time: {end_time - start_time:.4f} seconds")
    print(f"Memory (current): {current / 1024 / 1024:.2f} MB")
    print(f"Memory (peak): {peak / 1024 / 1024:.2f} MB")
    return result
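Usage is a single call; wrap any zero-argument callable, using a lambda to bind arguments if needed:
# Profile a simple computation
comprehensive_profile(lambda: [i ** 2 for i in range(1000000)])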
5. Profile in Production (Carefully)
# Use py-spy for low-overhead production profiling; it attaches to a
# running process without code changes
py-spy top --pid <PID>
# Record a flamegraph from a live process for a bounded window
py-spy record -o profile.svg --pid <PID> -d 60
6. Document Performance Requirements
def process_large_dataset(data):
    """
    Process a large dataset.

    Performance requirements:
    - Should handle 1M records in < 5 seconds
    - Memory usage should stay under 500 MB
    - CPU usage should not exceed 80%

    Profiling results (last tested: 2025-10-04):
    - Time: 3.2 seconds for 1M records
    - Memory: 350 MB peak
    """
    pass
Quick Reference
Command Cheatsheet
# cProfile
python3 -m cProfile -s cumtime script.py
# timeit
python3 -m timeit 'sum(range(1000))'
# memory_profiler
python3 -m memory_profiler script.py
# line_profiler
kernprof -l -v script.py
# py-spy
py-spy record -o profile.svg -- python3 script.py
# mprof (memory over time)
mprof run script.py
mprof plot
Common Profiling Pattern
import cProfile
import pstats
import io
def profile_code():
    pr = cProfile.Profile()
    pr.enable()
    # YOUR CODE HERE
    pr.disable()
    s = io.StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
    ps.print_stats(20)
    print(s.getvalue())
Conclusion
Profiling is essential for writing efficient Python code. Remember:
- Measure first - Don't optimize blindly
- Use the right tool - Different tools for different problems
- Profile realistically - Use real-world data and conditions
- Iterate - Profile, optimize, re-profile, repeat
- Document - Keep records of performance improvements
Happy profiling! 🚀