Python 3 Application Profiling Tutorial

A comprehensive guide to profiling and optimizing Python applications.

Table of Contents

  1. Introduction to Profiling
  2. Built-in Profiling Tools
  3. Third-Party Profiling Tools
  4. Memory Profiling
  5. Line-by-Line Profiling
  6. Practical Examples
  7. Best Practices
  8. Quick Reference

Introduction to Profiling

Profiling is the process of measuring where your program spends time and uses resources. This helps identify bottlenecks and optimize performance.

Why Profile?

  • Identify slow functions and code paths
  • Optimize resource usage (CPU, memory)
  • Make data-driven optimization decisions
  • Avoid premature optimization

Types of Profiling

  • Deterministic profiling: Measures all function calls (more accurate, higher overhead)
  • Statistical profiling: Samples execution periodically (lower overhead, less precise)
  • Memory profiling: Tracks memory allocation and usage
  • Line profiling: Profiles code line-by-line
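As an illustration of the deterministic approach, Python exposes the same per-call event hook that deterministic profilers such as cProfile are built on. This is a minimal sketch, not a real profiler: it only counts calls and ignores timing.

```python
import sys

call_counts = {}

def tracer(frame, event, arg):
    # A deterministic profiler receives an event for every Python-level
    # function call; here we just count them by function name
    if event == 'call':
        name = frame.f_code.co_name
        call_counts[name] = call_counts.get(name, 0) + 1

def work():
    return sum(range(100))

sys.setprofile(tracer)  # install the hook
for _ in range(3):
    work()
sys.setprofile(None)    # remove the hook

print(call_counts)      # work appears with a count of 3
```

Because the hook fires on every call, this is also a good demonstration of why deterministic profiling has higher overhead than statistical sampling.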

Built-in Profiling Tools

1. cProfile Module

cProfile is Python's standard deterministic profiler, implemented in C for low overhead.

Basic Usage

import cProfile

def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

def fast_function():
    return sum(range(1000000))

def main():
    slow_function()
    fast_function()

# Profile the main function
cProfile.run('main()')

Command Line Usage

# Profile a script
python3 -m cProfile my_script.py

# Save results to a file
python3 -m cProfile -o output.prof my_script.py

# Sort by cumulative time
python3 -m cProfile -s cumtime my_script.py

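A profile saved with `-o output.prof` can be loaded and inspected later with the pstats module. The sketch below generates a profile file programmatically (standing in for the command-line step above) and then reads it back:

```python
import cProfile
import pstats

# Create a saved profile file, equivalent to:
#   python3 -m cProfile -o output.prof my_script.py
cProfile.run('sum(range(1000000))', 'output.prof')

# Load the saved profile and print the top entries
stats = pstats.Stats('output.prof')
stats.strip_dirs().sort_stats('cumulative').print_stats(5)
```

Saving to a file is the usual workflow for larger programs, since the same `.prof` file can be re-sorted, filtered, or fed to a viewer like SnakeViz without re-running the program.
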
Programmatic Usage with Statistics

import cProfile
import pstats
from io import StringIO

def profile_function():
    pr = cProfile.Profile()
    pr.enable()
    
    # Code to profile
    result = sum(range(1000000))
    
    pr.disable()
    
    # Print statistics
    s = StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
    ps.print_stats()
    print(s.getvalue())

profile_function()

Understanding cProfile Output

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.150    0.000    0.200    0.000 module.py:10(func)
  • ncalls: Number of calls
  • tottime: Total time spent in function (excluding subfunctions)
  • percall: tottime/ncalls
  • cumtime: Cumulative time (including subfunctions)
  • percall: cumtime/ncalls
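The tottime/cumtime distinction is easiest to see with a caller and a callee. In this sketch, inner() does the actual work, so it dominates tottime; outer() mostly delegates, so it ranks high on cumtime but low on tottime:

```python
import cProfile
import pstats

def inner(x):
    # Does the real work, so its own time (tottime) is large
    return sum(range(x))

def outer():
    # Mostly delegates to inner(), so its cumtime is large
    # but its tottime is small
    return [inner(1000) for _ in range(500)]

pr = cProfile.Profile()
pr.enable()
outer()
pr.disable()

stats = pstats.Stats(pr).strip_dirs()
stats.sort_stats('tottime').print_stats(5)  # inner near the top
stats.sort_stats('cumtime').print_stats(5)  # outer near the top
```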

2. profile Module

Older, pure-Python profiler (slower than cProfile):

import profile

profile.run('main()')

3. timeit Module

For micro-benchmarking small code snippets:

import timeit

# Time a simple statement
time1 = timeit.timeit('sum(range(100))', number=10000)
print(f"sum(range(100)): {time1:.6f} seconds")

# Time a list comprehension
time2 = timeit.timeit('[i for i in range(100)]', number=10000)
print(f"List comprehension: {time2:.6f} seconds")

# Using setup code
setup = "from math import sqrt"
stmt = "sqrt(144)"
time3 = timeit.timeit(stmt, setup=setup, number=10000)
print(f"sqrt(144): {time3:.6f} seconds")

Command Line Usage

# Time a statement
python3 -m timeit 'sum(range(100))'

# Specify number of runs
python3 -m timeit -n 1000 'sum(range(100))'

# With setup code
python3 -m timeit -s 'from math import sqrt' 'sqrt(144)'
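Individual timeit runs can be distorted by background load, so a common practice is to use timeit.repeat() and take the minimum of several runs, which is the most stable estimate of the snippet's true cost:

```python
import timeit

# repeat() re-runs the whole timing loop several times; the minimum
# is the least affected by other processes competing for the CPU
times = timeit.repeat('sum(range(100))', repeat=5, number=10000)
print(f"best of 5: {min(times):.6f} seconds")
```

This mirrors what `python3 -m timeit` does on the command line, which also reports the best of several repeats.
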

Third-Party Profiling Tools

1. SnakeViz - Visual Profiler

Install:

pip3 install snakeviz

Usage:

import cProfile

# Generate profile
cProfile.run('main()', 'output.prof')

# View in browser (run in terminal)
# snakeviz output.prof

2. py-spy - Sampling Profiler

Low-overhead sampling profiler that doesn't require code changes.

Install:

pip3 install py-spy

Usage:

# Profile a running process
py-spy top --pid <PID>

# Record and generate flamegraph
py-spy record -o profile.svg -- python3 my_script.py

# Profile for specific duration
py-spy record -o profile.svg -d 30 -- python3 my_script.py

3. Yappi - Thread-Aware Profiler

Profiles multi-threaded applications:

Install:

pip3 install yappi

Usage:

import yappi
import threading
import time

def worker():
    time.sleep(1)
    total = sum(range(1000000))

# Start profiling
yappi.start()

# Run threads
threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Stop and print stats
yappi.stop()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()

Memory Profiling

1. memory_profiler

Line-by-line memory usage profiler.

Install:

pip3 install memory_profiler

Usage:

from memory_profiler import profile

@profile
def memory_intensive_function():
    # Create large lists
    large_list1 = [i for i in range(1000000)]
    large_list2 = [i * 2 for i in range(1000000)]
    
    # Some processing
    result = [a + b for a, b in zip(large_list1, large_list2)]
    return result

if __name__ == '__main__':
    memory_intensive_function()

Command line:

# Run with memory profiler
python3 -m memory_profiler my_script.py

# Plot memory usage over time
mprof run my_script.py
mprof plot

2. tracemalloc (Built-in)

Python's built-in memory tracker:

import tracemalloc

# Start tracking
tracemalloc.start()

# Code to profile
large_list = [i for i in range(1000000)]
another_list = [i * 2 for i in range(500000)]

# Get current memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory: {current / 1024 / 1024:.2f} MB")
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")

# Get top memory allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("\nTop 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)

tracemalloc.stop()
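tracemalloc can also diff two snapshots, which is useful for tracking down where memory grows between two points in a program. A minimal sketch using Snapshot.compare_to():

```python
import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# Allocate something between the two snapshots so the diff
# has something to show
data = [i for i in range(100000)]

snapshot_after = tracemalloc.take_snapshot()
tracemalloc.stop()

# compare_to ranks source lines by how much their allocations changed
diffs = snapshot_after.compare_to(snapshot_before, 'lineno')
for stat in diffs[:5]:
    print(stat)
```

Snapshot diffing is often more actionable than absolute numbers, since it points directly at the lines responsible for growth.
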

3. objgraph - Object Reference Graphs

Visualize object relationships:

Install:

pip3 install objgraph

Usage:

import objgraph

class MyClass:
    def __init__(self, value):
        self.value = value

# Create objects
objects = [MyClass(i) for i in range(100)]

# Show most common types
objgraph.show_most_common_types()

# Count instances
print(f"MyClass instances: {objgraph.count('MyClass')}")

# Show growth between two points
objgraph.show_growth()
# ... do some work ...
objgraph.show_growth()

Line-by-Line Profiling

line_profiler

Install:

pip3 install line_profiler

Usage:

# my_script.py
@profile  # kernprof makes this decorator available at runtime; no import needed
def slow_function():
    total = 0
    for i in range(1000):
        for j in range(1000):
            total += i * j
    return total

@profile
def fast_function():
    import numpy as np
    arr = np.arange(1000)
    return np.sum(arr[:, None] * arr[None, :])

if __name__ == '__main__':
    slow_function()
    fast_function()

Run:

# Profile the script
kernprof -l -v my_script.py

# Output shows time per line

Practical Examples

Example 1: Finding Bottlenecks

import cProfile
import pstats

def process_data():
    # Simulate data processing
    data = []
    for i in range(10000):
        data.append(i ** 2)
    
    # Sort data
    data.sort()
    
    # Filter data
    filtered = [x for x in data if x % 2 == 0]
    
    return filtered

def analyze_bottlenecks():
    profiler = cProfile.Profile()
    profiler.enable()
    
    result = process_data()
    
    profiler.disable()
    
    # Analyze results
    stats = pstats.Stats(profiler)
    stats.strip_dirs()
    stats.sort_stats('cumulative')
    stats.print_stats(10)  # Top 10 functions

if __name__ == '__main__':
    analyze_bottlenecks()

Example 2: Comparing Algorithms

import timeit
import functools

def compare_implementations():
    # Implementation 1: List comprehension
    def impl1():
        return [i * 2 for i in range(10000)]
    
    # Implementation 2: Map function
    def impl2():
        return list(map(lambda x: x * 2, range(10000)))
    
    # Implementation 3: Traditional loop
    def impl3():
        result = []
        for i in range(10000):
            result.append(i * 2)
        return result
    
    # Time each implementation
    time1 = timeit.timeit(impl1, number=1000)
    time2 = timeit.timeit(impl2, number=1000)
    time3 = timeit.timeit(impl3, number=1000)
    
    print(f"List comprehension: {time1:.6f} seconds")
    print(f"Map function: {time2:.6f} seconds")
    print(f"Traditional loop: {time3:.6f} seconds")
    
    # Find the fastest implementation
    timings = {"List comprehension": time1, "Map function": time2, "Traditional loop": time3}
    fastest_name = min(timings, key=timings.get)
    print(f"\nFastest: {fastest_name} ({timings[fastest_name]:.6f} seconds)")

compare_implementations()

Example 3: Context Manager for Profiling

import cProfile
import pstats
from contextlib import contextmanager

@contextmanager
def profiled():
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield
    finally:
        # Ensure stats are printed even if the profiled block raises
        pr.disable()
        stats = pstats.Stats(pr)
        stats.strip_dirs()
        stats.sort_stats('cumulative')
        stats.print_stats(10)

# Usage
def main():
    with profiled():
        # Your code here
        result = sum(i ** 2 for i in range(100000))
        print(f"Result: {result}")

if __name__ == '__main__':
    main()

Example 4: Decorator for Function Timing

import time
import functools

def timing_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took {end - start:.6f} seconds")
        return result
    return wrapper

@timing_decorator
def slow_computation():
    time.sleep(1)
    return sum(range(1000000))

@timing_decorator
def fast_computation():
    return sum(range(1000))

# Usage
slow_computation()
fast_computation()

Best Practices

1. Profile Before Optimizing

Don't guess where the bottleneck is - measure it!

# Bad: Optimizing without profiling
def premature_optimization():
    # Spending time optimizing the wrong thing
    pass

# Good: Profile first, then optimize
def measured_optimization():
    # 1. Profile the code
    # 2. Identify actual bottlenecks
    # 3. Optimize those specific areas
    # 4. Re-profile to verify improvement
    pass

2. Use Appropriate Tools

  • cProfile: General-purpose profiling
  • timeit: Micro-benchmarking
  • memory_profiler: Memory issues
  • line_profiler: Detailed line-by-line analysis
  • py-spy: Production profiling without code changes

3. Profile in Realistic Conditions

# Bad: Profiling with toy data
small_data = list(range(100))
process(small_data)

# Good: Profile with realistic data
realistic_data = list(range(1000000))
process(realistic_data)

4. Consider Multiple Metrics

import time
import tracemalloc

def comprehensive_profile(func):
    # Time measurement
    start_time = time.perf_counter()
    
    # Memory measurement
    tracemalloc.start()
    
    # Execute function
    result = func()
    
    # Get metrics
    end_time = time.perf_counter()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    
    print(f"Time: {end_time - start_time:.4f} seconds")
    print(f"Memory (current): {current / 1024 / 1024:.2f} MB")
    print(f"Memory (peak): {peak / 1024 / 1024:.2f} MB")
    
    return result

5. Profile in Production (Carefully)

# Use py-spy for low-overhead production profiling
py-spy top --pid <PID>

# Or use yappi with sampling mode

6. Document Performance Requirements

def process_large_dataset(data):
    """
    Process a large dataset.
    
    Performance requirements:
    - Should handle 1M records in < 5 seconds
    - Memory usage should stay under 500 MB
    - CPU usage should not exceed 80%
    
    Profiling results (last tested: 2025-10-04):
    - Time: 3.2 seconds for 1M records
    - Memory: 350 MB peak
    """
    pass

Quick Reference

Command Cheatsheet

# cProfile
python3 -m cProfile -s cumtime script.py

# timeit
python3 -m timeit 'sum(range(1000))'

# memory_profiler
python3 -m memory_profiler script.py

# line_profiler
kernprof -l -v script.py

# py-spy
py-spy record -o profile.svg -- python3 script.py

# mprof (memory over time)
mprof run script.py
mprof plot

Common Profiling Pattern

import cProfile
import pstats
import io

def profile_code():
    pr = cProfile.Profile()
    pr.enable()
    
    # YOUR CODE HERE
    
    pr.disable()
    
    s = io.StringIO()
    ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
    ps.print_stats(20)
    print(s.getvalue())

Conclusion

Profiling is essential for writing efficient Python code. Remember:

  1. Measure first - Don't optimize blindly
  2. Use the right tool - Different tools for different problems
  3. Profile realistically - Use real-world data and conditions
  4. Iterate - Profile, optimize, re-profile, repeat
  5. Document - Keep records of performance improvements

Happy profiling! 🚀