Reinforcement Learning Explained Simply (with Diagrams and Real Code)

Reinforcement Learning: the essentials in one article — real code, diagrams and concrete steps, excerpts from a 45-lesson course.

REHOUMA Haythem

12 Jun 2026 • 9 min read

A guide that gets straight to the point: Reinforcement Learning broken down with diagrams, concrete examples, and tested commands. Everything comes from a structured 16-chapter course — here are the best parts.

tl;dr

Introduction and Installation
Fundamentals of Reinforcement Learning
Markov Decision Process
Classical Q-Learning
Dynamic Programming

~$ cat ./parcours.md # Reinforcement Learning — 15 chapters

Introduction and Installation

→ Course presentation and why RL?
→ Install Python, Gymnasium and PyTorch
+ 1 more lesson

Fundamentals of Reinforcement Learning

→ RL vs supervised vs unsupervised
→ Agent, environment, state, action, reward
+ 2 more lessons

Markov Decision Process

→ Markov Decision Process (MDP) explained
→ Intuitive Bellman Equation
+ 2 more lessons

Classical Q-Learning

→ Q-Learning: intuition and formula
→ Implement a Q-table in Python
+ 2 more lessons

Dynamic Programming

SARSA and Variants

→ SARSA: the on-policy cousin of Q-Learning
→ Q-Learning vs SARSA: when to use what
+ 2 more lessons

Deep Q-Networks

→ Why a neural network?
→ DQN architecture with PyTorch
+ 2 more lessons

Monte Carlo Methods

→ Monte Carlo Prediction
→ Monte Carlo Control
+ 1 more lesson

🏁

Final project (+ 7 chapters along the way)

→ You leave with a concrete and demonstrable project

First Gymnasium Environment (FrozenLake)

NOTE (Objective): Hands-on practice with your first Gymnasium environment. Understand the universal reset / step interface, the concept of an episode, and run a random agent on FrozenLake.

Learning Objectives

TIP: By the end of this module

Create an environment with gym.make
Understand the reset and step methods
Identify the observation space and action space
Write a complete episode loop
Run a random agent and observe its score

FrozenLake: The Playground

FrozenLake is a 4x4 grid representing a frozen lake. The agent starts on the start cell (S) and must reach the goal (G, drawn as a gift) by walking on solid ice (F) while avoiding the holes (H): falling into one ends the episode with no reward. Simple, visual, perfect for beginners.

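NOTE: By default, FrozenLake is “slippery” (is_slippery=True). An agent that tries to go right moves in that direction only one third of the time; otherwise it slides in one of the two perpendicular directions. This adds randomness and makes the problem more interesting. Pass is_slippery=False to disable it while you are getting started.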

The Universal Interface: reset and step

All Gymnasium environments share the same interface, which is what makes writing agents so convenient. Two methods are enough.

reset()

Resets the environment to its initial state and returns two values: the first observation (the starting state) and an info dictionary. It is called at the beginning of each episode (a complete game).

step(action)

Executes an action and returns five values: the new state, the reward, whether the episode is terminated (the agent reached the goal or fell into a hole), whether it is truncated (the step limit was reached), and an info dictionary.

WARNING: Always call env.close() when you are done, especially with graphical rendering. Otherwise, ghost windows may remain open and consume memory.

Visualizing the Game

To watch the agent play on screen, pass render_mode="human" to gym.make.

Solving FrozenLake with Value Iteration

NOTE (Objective): Apply Value Iteration on FrozenLake end-to-end: compute V*, extract the optimal policy, run the agent, and measure its win rate. Your first agent that truly wins.

Learning Objectives

TIP: By the end of this module

Write complete Value Iteration for FrozenLake
Extract the optimal policy from V*
Evaluate the agent over many episodes
Interpret the obtained values and policy
Understand the effect of slippery ice

Step 1: Access the FrozenLake Model

FrozenLake provides its full model via env.unwrapped.P. It is a dictionary that gives, for each state and each action, the list of possible transitions. Each transition is a tuple (probability, next_state, reward, terminated).

DQN Architecture with PyTorch

NOTE (Objective): Build the DQN neural network with PyTorch. Understand the architecture (input, hidden layers, output), the role of activation functions, and how the network predicts Q-values.

Learning Objectives

TIP: By the end of this module

Define a network with nn.Module
Choose input and output sizes according to the environment
Understand the role of hidden layers and ReLU
Make a prediction (forward pass)
Understand that the output gives one Q-value per action

The Anatomy of a DQN

A DQN for CartPole is a very simple network: it takes the 4 state numbers (cart position, cart velocity, pole angle, pole angular velocity), passes them through two hidden layers, and produces 2 Q-values (one per action: left, right).
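One way to sketch this architecture in PyTorch is shown below. The class name DQN, the hidden size of 128, and the sample state are assumptions for illustration; the text only fixes 4 inputs, two hidden layers, and 2 outputs:

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Fully connected network mapping a state to one Q-value per action."""

    def __init__(self, state_dim: int = 4, n_actions: int = 2, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),  # no activation: Q-values are unbounded
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

model = DQN()

# A made-up CartPole state, shaped as a batch of one observation.
state = torch.tensor([[0.01, -0.02, 0.03, 0.04]])

with torch.no_grad():  # prediction only, no gradients needed
    q = model(state)

action = int(q.argmax(dim=1).item())  # greedy action: index of the largest Q-value
print(q.shape, action)
```

The network is untrained here, so the chosen action is meaningless; the point is the shape of the output, one Q-value per action.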

go-further

This article covers the most useful excerpts — the full Reinforcement Learning course (16 chapters, 45 lessons, corrected exercises, and final project) takes you all the way.

FAQ

How long does it take to learn Reinforcement Learning?

With a structured progression (16 chapters, 45 short and practical lessons), you reach an operational level in a few weeks at 30 to 60 minutes per day. The key is to practice each concept immediately.

Are there any prerequisites?

It is best to be comfortable with the fundamentals first, Python in particular, since the lessons rely on Gymnasium and PyTorch. The content goes in depth, with real-world cases.

Where to start concretely?

Reproduce the commands from this article, then follow the full Reinforcement Learning course: it chains the 45 lessons in order, with exercises and a final project.
