
Advanced Python Performance Explained Simply (with Diagrams and Real Code)

Advanced Python Performance: The Essentials in One Article — Real Code, Diagrams and Concrete Steps, Excerpts from a 35-Lesson Course.

REHOUMA Haythem

12 Jun 2026 • 13 min read

A guide that gets straight to the point: Advanced Python Performance dissected with diagrams, concrete examples and tested commands. Everything comes from a structured 11-chapter course — here are the highlights.

tl;dr

Introduction and Installation
Code Profiling
Comprehensions, iterators, generators
Multithreading vs Multiprocessing
asyncio and coroutines

~$ cat ./parcours.md # Advanced Python Performance: 11 chapters

Introduction and Installation

→ Course presentation → Install the profiling tools (+ 1 more lesson)

Code profiling

→ cProfile and pstats → timeit for precise measurements (+ 1 more lesson)

Comprehensions, iterators, generators

→ List, set and dict comprehensions → The iterator protocol (+ 1 more lesson)

Multithreading vs Multiprocessing

→ Threading, the GIL and its limits → Multiprocessing, true parallelism (+ 1 more lesson)

asyncio and coroutines

→ Introduction to asyncio → async / await in practice (+ 1 more lesson)

Cython, Numba and vectorization

→ Vectorization with NumPy → Numba JIT (+ 1 more lesson)

Memory management

→ The Python garbage collector → Weak references with weakref (+ 1 more lesson)

Caching and memoization

→ functools.lru_cache → Disk cache with joblib and diskcache (+ 1 more lesson)

🏁

Final project (+ 2 chapters along the way)

→ You leave with a concrete and demonstrable project

Course Overview

[NOTE] Objective: understand what it means to “optimize Python code”, when to do it (and especially when not to), and get a clear picture of the journey ahead in this course.

Learning Objectives

[TIP] By the end of this module, you will be able to explain why Python has a reputation for being slow (and why that is partly a myth), name the 3 main optimization axes, and apply the golden rule: measure before optimizing.

The Concrete Problem

Have you ever experienced this? You launch your Python script at 5 pm to compute a report, go grab a coffee, and when you come back 30 minutes later the script is still running. Worse: a colleague doing the same job in R or Julia was done in 3 minutes.

Typical Symptoms of a Slow Python Program

What You Will Learn to Do

[NOTE] Is Python really slow? Pure Python (CPython) can be 50 to 100× slower than C for numerical loops. However, the heavy lifting in most scientific libraries (NumPy, Pandas, scikit-learn) is implemented in C, Cython or Fortran. When the hot loops are delegated to these libraries, Python code can reach 80 to 95 % of C performance. The problem is almost never “Python is slow” but “my Python code is poorly written”.

The 3 Main Optimization Axes

Axis	Question asked	Typical gain
1. Algorithm	Is my code O(n²) when it could be O(n log n)?	10× to 10,000×
2. Data Structure	Am I using a `list` where a `set` would do?	10× to 1,000×
3. Concurrency / Parallelism	Can I run these 8 tasks at the same time?	2× to 16× (depending on CPU/IO)
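To make the first two rows concrete, here is a minimal sketch on a hypothetical task, detecting duplicates in a list (the function names are illustrative, not taken from the course). The pairwise version is O(n²), the sorted version is O(n log n), and the set version is O(n) on average because a set membership test is a hash lookup rather than a scan like `x in some_list`.

```python
import timeit


def has_duplicates_quadratic(items):
    """O(n²): compare every pair of elements."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_sorted(items):
    """O(n log n): after sorting, equal elements end up next to each other."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicates_set(items):
    """O(n) on average: each membership test is a hash lookup, not a scan."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    data = list(range(2_000))  # worst case: no duplicates, so every check runs to the end
    for fn in (has_duplicates_quadratic, has_duplicates_sorted, has_duplicates_set):
        seconds = timeit.timeit(lambda: fn(data), number=1)
        print(f"{fn.__name__:26s} {seconds:.5f} s")

    # Axis 2 in isolation: the same membership test on a list and on a set
    as_list = list(range(100_000))
    as_set = set(as_list)
    print("list:", timeit.timeit(lambda: 99_999 in as_list, number=100))
    print("set: ", timeit.timeit(lambda: 99_999 in as_set, number=100))
```

The sorted version needs comparable elements and the set version needs hashable ones; that kind of constraint is part of choosing a data structure.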

[WARNING] Order of attack: algorithm first, then data structure, then parallelism, always in that order. Parallelizing a bad algorithm is like putting 8 people to work digging a tunnel with teaspoons when a single excavator would have sufficed.

The 80/20 Rule (Pareto)

In most programs, roughly 80 % of execution time is spent in 20 % of the code. Often the split is even 90/10 or 95/5.

[TIP] Consequence: there is no need to optimize all of your code. Find the 5 lines that cost 95 % of the time, optimize them aggressively, and leave the rest alone.

Donald Knuth, the computing legend, summarized it in 1974:

[NOTE] “Premature optimization is the root of all evil.”

Practical translation: first write clear, correct code and put it to use. If it turns out to be too slow, optimize only the hot spots. Otherwise you waste time producing unreadable code for a speed-up nobody would have noticed.

A Telling Before/After

Here is a real example: summing the squares of the integers from 0 up to (but not including) 10 million.

output

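# Naive version: pure Python loop (exact result, arbitrary-precision int)
total = 0
for i in range(10_000_000):
    total += i * i
# Time: ~1.2 seconds on a modern laptop

# Vectorized version with NumPy
import numpy as np
# float64 on purpose: the exact result (~3.3e20) exceeds the int64 maximum
# (~9.2e18), so an int64 sum would silently wrap around to a wrong value
arr = np.arange(10_000_000, dtype=np.float64)
total = (arr * arr).sum()
# Time: ~0.04 seconds -> 30× faster

# Compiled version with Numba @jit
from numba import jit

@jit(nopython=True)
def somme_carres(n):
    total = 0.0          # float accumulator, for the same overflow reason
    for i in range(n):
        total += i * i
    return total

somme_carres(10)                  # first call compiles the function (slow)
total = somme_carres(10_000_000)  # later calls run the compiled machine code
# Time: ~0.01 seconds -> 120× faster (excluding compilation)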

Same problem, same language, same computation, but 120 times faster. This is exactly what you will learn to do, systematically, on your own code.
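The timings above are the author's. To reproduce this kind of comparison on your own machine with only the standard library, a sketch based on timeit could look like the following. It is a variant, not the article's exact benchmark: it uses a smaller n so it runs quickly, swaps NumPy and Numba for the built-in sum() over a generator expression, and adds the closed-form formula to show what an algorithmic change (axis 1) can do. How much each version gains on your machine is precisely what the measurement is for.

```python
import timeit

N = 1_000_000


def loop_version(n=N):
    """Explicit Python loop, as in the naive version above."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def builtin_sum_version(n=N):
    """Same sum, with the accumulation done by the C-implemented built-in sum()."""
    return sum(i * i for i in range(n))


def closed_form(n=N):
    """Algorithmic shortcut: sum of i**2 for i in range(n) is (n-1)*n*(2n-1)/6."""
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    for fn in (loop_version, builtin_sum_version, closed_form):
        # 5 independent rounds of 1 call each; the minimum is the least noisy estimate
        best = min(timeit.repeat(fn, number=1, repeat=5))
        print(f"{fn.__name__:20s} {best * 1000:10.3f} ms")
```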

What You Will Build

Phase 1: Measure (ch. 0-1)

Install the tools, run your first profile, learn to read a cProfile report. You will be able to identify the bottleneck in under 5 minutes.

Phase 2: Optimize (ch. 2-7)

Generators, threading, multiprocessing, asyncio, NumPy, Numba, Cython, caching. The entire modern Python developer toolkit.

Generators with yield

[NOTE] Objective: discover yield, the keyword that turns a function into a generator, and learn to build lazy data-processing pipelines capable of handling multi-gigabyte files with only a few megabytes of RAM.

Learning Objectives

[TIP] By the end of this module, you will know how to write a generator, use it in a processing pipeline, and choose wisely between a generator and a materialized collection such as a list.

A First Generator

output

def compter(stop):       # "stop" rather than "max": avoids shadowing the built-in max()
    n = 0
    while n < stop:
        yield n     # suspend the function and return n
        n += 1

# Call: does NOTHING, we get a generator back
g = compter(5)
print(type(g))   # <class 'generator'>

# Consumption
for n in g:
    print(n)     # 0 1 2 3 4

[NOTE] Magic: as soon as a function contains a yield, Python turns it into a generator function. Calling it does not run the body: it returns a generator object. The body only runs, up to the next yield, on each call to next(), which a for loop makes for you behind the scenes.
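The lazy behavior is easy to observe with a few print() calls. In this small sketch (the function name trace is hypothetical), each next() call runs the body only up to the next yield, and once the body finishes, next() raises StopIteration, which is the signal a for loop uses to stop.

```python
def trace(limit):
    print("start")              # not printed when trace() is called
    for n in range(limit):
        print(f"before yield {n}")
        yield n
        print(f"after yield {n}")
    print("end")


g = trace(2)                    # nothing printed yet: we only built the generator
print(next(g))                  # prints "start", "before yield 0", then 0
print(next(g))                  # prints "after yield 0", "before yield 1", then 1
try:
    next(g)                     # prints "after yield 1" and "end", then raises
except StopIteration:
    print("exhausted")          # a for loop catches this exception for you
```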

yield vs return

return	yield
Terminates the function	Suspends the function
State is lost	State is preserved
Returns a value (once)	Produces a value each time execution reaches it (possibly many times)
Returns everything at once	Produces one element at a time, on demand
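The last row is where the memory savings come from. A quick sketch with sys.getsizeof makes it visible; note that getsizeof measures only the container itself (for the list, its array of pointers, not the integers it points to), and that exact byte counts vary between Python versions, so none are quoted here.

```python
import sys

N = 1_000_000

squares_list = [i * i for i in range(N)]  # return-style: everything is built up front
squares_gen = (i * i for i in range(N))   # yield-style: values are produced on demand

print(sys.getsizeof(squares_list))  # several megabytes, grows with N
print(sys.getsizeof(squares_gen))   # small and constant, whatever the value of N

# Both give the same total once fully consumed (this exhausts squares_gen)
print(sum(squares_list) == sum(squares_gen))  # True
```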

Real-World Use Case: Reading a Large Log File

output

def lire_log(chemin):
    """Generator that yields each line without loading everything."""
    with open(chemin, encoding="utf-8") as f:
        for ligne in f:
            yield ligne.rstrip("\n")

def filtrer_erreurs(lignes):
    """Generator that keeps only ERROR lines."""
    for ligne in lignes:
        if "ERROR" in ligne:
            yield ligne

def extraire_codes(lignes):
    """Generator that yields the HTTP code of each line."""
    for ligne in lignes:
        try:
            code = int(ligne.split()[-1])
            yield code
        except (ValueError, IndexError):
            continue

# Pipeline: memory use stays constant (one line in flight at a time), even for a 50 GB file
lignes = lire_log("acces.log")
erreurs = filtrer_erreurs(lignes)
codes = extraire_codes(erreurs)

# Final consumption
from collections import Counter
print(Counter(codes).most_common(5))
# [(500, 1284), (502, 412), (503, 309), ...]

[TIP] The art of the pipeline: each generator does one thing and passes its results to the next. This is the model of Unix tools: cat file | grep ERROR | awk '{print $NF}' | sort | uniq -c. Readable, modular, constant memory.
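Because each stage only needs an iterable of lines, the pipeline can be tried without a real log file. This sketch feeds made-up sample lines (the format, ending with the HTTP status code, is an assumption matching what extraire_codes expects) through the same three stages written as generator expressions. Since a generator expression cannot contain try/except, str.isdigit() stands in for the error handling of extraire_codes; in production you would iterate over the open file instead of the list.

```python
from collections import Counter

# Hypothetical sample lines standing in for open("acces.log", encoding="utf-8")
sample = [
    "2026-06-12 10:00:01 INFO GET /home 200\n",
    "2026-06-12 10:00:02 ERROR GET /api 500\n",
    "2026-06-12 10:00:03 ERROR POST /login 502\n",
    "2026-06-12 10:00:04 ERROR GET /api 500\n",
    "2026-06-12 10:00:05 ERROR GET /broken no-status-code\n",
]

lignes = (ligne.rstrip("\n") for ligne in sample)           # like lire_log
erreurs = (ligne for ligne in lignes if "ERROR" in ligne)  # like filtrer_erreurs
last_fields = (ligne.split()[-1] for ligne in erreurs)     # the status code, if any
codes = (int(field) for field in last_fields if field.isdigit())  # like extraire_codes

result = Counter(codes).most_common(5)
print(result)  # [(500, 2), (502, 1)]
```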

yield from: delegate to another generator

output

def sous_compter(a, b):
    for i in range(a, b):
        yield i

def compter_tout():
    yield from sous_compter(0, 3)     # 0,1,2
    yield from sous_compter(10, 13)   # 10,11,12
    yield 99

print(list(compter_tout()))
# [0, 1, 2, 10, 11, 12, 99]

yield from replaces the for x in autre: yield x loop, and it also correctly forwards values passed with send(), exceptions passed with throw(), and the sub-generator's return value.
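One thing a plain for loop cannot reproduce: when the sub-generator ends with return, that value becomes the value of the yield from expression. A minimal sketch with hypothetical names:

```python
def read_batch(batch):
    """Sub-generator: yields each item, then returns how many it produced."""
    count = 0
    for item in batch:
        yield item
        count += 1
    return count  # delivered to the delegating generator via yield from


def read_all(batches):
    """Delegating generator: re-yields every item and adds up the returned counts."""
    total = 0
    for batch in batches:
        total += yield from read_batch(batch)
    yield f"total={total}"


print(list(read_all([[1, 2], [3]])))  # [1, 2, 3, 'total=3']
```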

send(): bidirectional generators

You can send values into a generator (rarely used but powerful).

output

def echo():
    while True:
        recu = yield
        print("Received:", recu)

g = echo()
next(g)        # start the generator
g.send("hello")
g.send("world")
# Prints: Received: hello / Received: world

Before Python 3.5, asyncio coroutines were built on this mechanism (generators driven by send() and yield from). Since Python 3.5, native async/await coroutines are the preferred way.
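send() also returns something: the value of the next yield expression the generator reaches. A classic illustration (hypothetical name running_average) is a running average that receives numbers through send() and hands back the mean so far:

```python
def running_average():
    """Coroutine-style generator: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the current mean, then wait for the next value
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)             # prime: run up to the first yield (which yields None)
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
print(avg.send(60))   # 30.0
```

The initial next() call, often called priming, is needed because a value can only be sent to a generator that is paused at a yield.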

Pitfall #1: a generator can only be iterated once

output

g = (i*i for i in range(5))
print(list(g))   # [0, 1, 4, 9, 16]
print(list(g))   # []  -- WARNING, g is exhausted

Solution: recreate the generator, or materialize it into a list if you need it multiple times:

output

data = [i*i for i in range(5)]   # list, reusable
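A third option fits data that is too large to materialize but has to be traversed several times: a class whose __iter__ method is itself a generator function. Every for loop (or list() call) then calls iter() again and gets a fresh generator. A sketch with a hypothetical class name:

```python
class Squares:
    """Re-iterable collection: each iter() call builds a brand-new generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield i * i


squares = Squares(5)
print(list(squares))  # [0, 1, 4, 9, 16]
print(list(squares))  # [0, 1, 4, 9, 16], not exhausted this time
```

The trade-off is that the values are recomputed on every pass; for a file-backed generator such as lire_log, that means re-reading the file each time.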

Profile and Find the Bottlenecks

[NOTE] Objective: apply the chapter 1 method to our ETL pipeline, using cProfile to see where the time goes, snakeviz to visualize it, and line_profiler to zoom in on the critical function.

Learning Objectives

[TIP] By the end of this module, you will know how to profile a complete production script, read the report, identify the 3-4 lines that consume 90 % of the time, and write a clear “diagnostic report” for your team.

1. Global cProfile

output

python -m cProfile -o pipeline.prof pipeline_v0.py

To explore interactively with pstats:

output

python -m pstats pipeline.prof
% sort cumulative
% stats 15

output

42_847_310 function calls in 1083.42 seconds

ncalls    tottime  cumtime  filename:lineno(function)
     1     0.000  1083.42  pipeline_v0.py:1(<module>)
     1     0.005  1083.41  pipeline_v0.py:42(main)
     1   654.21   750.18  pipeline_v0.py:11(traiter_transactions)
5000001    34.20   34.20  <built-in method strip>
5000001    28.45   28.45  <built-in method upper>
3750000    21.89   45.30  pipeline_v0.py:18(traiter_transactions/dict.get)
5000001    18.95   18.95  <built-in method float>
3750000   180.45   180.45  list.append (resultats)
     1   268.32   268.32  pipeline_v0.py:32(agreger)
     1    65.10    65.10  pipeline_v0.py:39(sauver)

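[NOTE] Reading: traiter_transactions accounts for about 70 % of the total time (750 s cumulative out of 1,083 s). Inside it, the visible costs are strip/upper (about 60 s), append (180 s) and float() (19 s). agreger takes about 25 % and sauver about 6 %. Priority #1 is therefore traiter_transactions.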
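The same kind of report can also be produced from inside Python, which is handy in a notebook or a test. This sketch profiles a hypothetical stand-in function, slow_pipeline (not the course's pipeline_v0.py), with cProfile.Profile, then sorts and prints the result with pstats, the programmatic equivalent of the sort cumulative and stats 15 commands used above.

```python
import cProfile
import io
import pstats


def normalize(values):
    """Hypothetical hot spot: one strip() and one upper() call per value."""
    return [v.strip().upper() for v in values]


def slow_pipeline(n=200_000):
    """Hypothetical stand-in for the script being profiled."""
    raw = [f"  item{i}  " for i in range(n)]
    cleaned = normalize(raw)
    return sum(len(v) for v in cleaned)


profiler = cProfile.Profile()
profiler.enable()
result = slow_pipeline()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(15)  # like "% sort cumulative" then "% stats 15"
print(buffer.getvalue())

# To inspect the same data later with snakeviz or "python -m pstats":
# profiler.dump_stats("pipeline.prof")
```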

2. Visualize with snakeviz

output

snakeviz pipeline.prof

A browser tab opens on the “sunburst” view: the central circle represents the entire program, and each ring around it is divided into sectors proportional to the time spent in the corresponding functions. Click a sector to zoom in.

On our profile we immediately see:

3. Zoom with line_profiler

Decorate the critical function:

output

@profile
def traiter_transactions(produits):
    ...

output

kernprof -l -v pipeline_v0.py
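One practical snag: kernprof -l makes `profile` available as a builtin, so the decorated script fails with NameError when run with plain python. A common workaround (a convention, not part of line_profiler itself) is a no-op fallback at the top of the script. The body of traiter_transactions below is hypothetical, for illustration only.

```python
try:
    profile  # exists only when the script runs under "kernprof -l"
except NameError:
    def profile(func):
        """No-op stand-in so the script also runs with plain `python`."""
        return func


@profile
def traiter_transactions(produits):
    # Hypothetical body: normalize product codes
    return [p.strip().upper() for p in produits]


print(traiter_transactions([" a ", "b "]))  # ['A', 'B']
```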

go-further

This article covers the most useful excerpts; the complete Advanced Python Performance course (11 chapters, 35 lessons, exercises with solutions and a final project) takes you the rest of the way.


FAQ

How long does it take to learn Advanced Python Performance?

With a structured progression (11 chapters, 35 short practical lessons), you reach an operational level in a few weeks at 30 to 60 minutes per day. The key is to practice each concept immediately.

Are there any prerequisites?

It is best to be comfortable with Python fundamentals: this content goes in depth, with real-world cases.

Where to start concretely?

Reproduce the commands in this article, then follow the complete Advanced Python Performance course: it chains the 35 lessons in order, with exercises and a final project.
