Reinforcement Learning Explained Simply (with Diagrams and Real Code)
A guide that gets straight to the point: Reinforcement Learning broken down with diagrams, concrete examples, real code, and tested commands. Everything here is excerpted from a structured course of 16 chapters and 45 lessons; these are the best parts.
- Introduction and Installation
- Fundamentals of Reinforcement Learning
- Markov Decision Process
- Classic Q-Learning
- Dynamic Programming
First Gymnasium Environment (FrozenLake)
This lesson introduces the reset / step interface and the concept of an episode, then runs a random agent on FrozenLake.
Learning Objectives
- Create an environment with gym.make
- Understand the reset and step methods
- Identify the observation space and action space
- Write a complete episode loop
- Run a random agent and observe its score
FrozenLake: The Playground
FrozenLake is a 4x4 grid representing a frozen lake. The agent starts on the start cell (S) and must reach the goal (G) by walking on solid ice (F) while avoiding the holes (H): falling into one ends the episode and the agent loses. Simple, visual, perfect for beginners.
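By default the ice is slippery (is_slippery=True): an agent that tries to go right may slip and end up elsewhere. Concretely, each move goes in the intended direction with probability 1/3 and in each of the two perpendicular directions with probability 1/3. This adds randomness and makes the problem more interesting; you can turn it off with is_slippery=False while getting started.
The Universal Interface: reset and step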
All Gymnasium environments share the same interface, which is what makes writing agents so convenient: two methods are enough to drive any of them.
reset()
Resets the environment to its initial state and returns a tuple (observation, info): the first observation and a dictionary of extra information. It is called at the beginning of each episode (one complete game).
step(action)
Executes an action and returns five values: the new observation, the reward, terminated (the episode reached a terminal state, here a hole or the goal), truncated (the episode was cut off, for example by a time limit), and info.
Always call env.close() at the end, especially with graphical rendering; otherwise ghost windows may stay open and consume memory.
Visualizing the Game
To watch the agent play on screen, pass render_mode="human" to gym.make when creating the environment.
Solving FrozenLake with Value Iteration
Learning Objectives
- Write a complete Value Iteration for FrozenLake
- Extract the optimal policy from V*
- Evaluate the agent over many episodes
- Interpret the obtained values and policy
- Understand the effect of slippery ice
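For reference, Value Iteration repeats the Bellman optimality update below for every state until the values stop changing, then reads the optimal policy off V*. Here γ is the discount factor and P(s' | s, a) the transition probability. In FrozenLake, R is 1 only on the transition that reaches G, and terminal states (holes and the goal) are given a value of 0.

```latex
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\big[\,R(s, a, s') + \gamma\, V_k(s')\,\big]

\pi^*(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\big[\,R(s, a, s') + \gamma\, V^*(s')\,\big]
```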
Step 1: Access the FrozenLake Model
FrozenLake exposes its full model via env.unwrapped.P. It is a dictionary that gives, for each state and each action, the list of possible transitions, each one a (probability, next_state, reward, terminated) tuple.
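A minimal Value Iteration sketch that works directly on a model in this format. It is plain Python with no Gymnasium dependency. The function names, the discount gamma=0.99 and the stopping threshold theta are illustrative assumptions, not values taken from the course. Transitions flagged terminated do not bootstrap from the next state, so holes and the goal add no future value.

```python
def q_values(P, V, s, n_actions, gamma):
    """Expected return of each action in state s, given the current value estimate V."""
    return [
        sum(
            # (not terminated) is 0 on terminal transitions: no value beyond a hole or the goal
            prob * (reward + gamma * V[s_next] * (not terminated))
            for prob, s_next, reward, terminated in P[s][a]
        )
        for a in range(n_actions)
    ]


def value_iteration(P, n_states, n_actions, gamma=0.99, theta=1e-8):
    """Apply the Bellman optimality update to every state until V stops changing."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_values(P, V, s, n_actions, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states in this sweep see the new value
        if delta < theta:
            return V


def extract_policy(P, V, n_states, n_actions, gamma=0.99):
    """Greedy policy: in each state, pick the action with the highest expected return."""
    policy = []
    for s in range(n_states):
        q = q_values(P, V, s, n_actions, gamma)
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy
```

On FrozenLake you would pass env.unwrapped.P together with env.observation_space.n and env.action_space.n, then evaluate the policy by playing many episodes with env.step(policy[observation]). Running it once with is_slippery=True and once with is_slippery=False should show how much the slippery ice lowers the values.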
DQN Architecture with PyTorch
Learning Objectives
- Define a network with nn.Module
- Choose input and output sizes according to the environment
- Understand the role of hidden layers and ReLU
- Make a prediction (forward pass)
- Understand that the output gives one Q-value per action
The Anatomy of a DQN
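A DQN for CartPole is a very simple network: it takes the 4 numbers that describe the state (cart position, cart velocity, pole angle, pole angular velocity), passes them through two hidden layers with ReLU activations, and outputs 2 Q-values, one per action (push left, push right).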
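A minimal sketch of this network with nn.Module, assuming PyTorch is installed. The class name DQN and the hidden width of 64 are illustrative assumptions. Only the 4-in, 2-out shape and the two hidden ReLU layers come from the description above.

```python
import torch
import torch.nn as nn


class DQN(nn.Module):
    """Maps a CartPole state (4 numbers) to one Q-value per action (2 actions)."""

    def __init__(self, state_dim: int = 4, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        # hidden=64 is an illustrative width, not a value prescribed by the course
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # no activation: Q-values can be any real number
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = DQN()
state = torch.tensor([[0.02, -0.01, 0.03, 0.04]])  # a batch containing one CartPole state
with torch.no_grad():  # prediction only, no gradients needed
    q_values = model(state)  # shape (1, 2): Q(s, left), Q(s, right)
action = int(q_values.argmax(dim=1).item())  # greedy action: 0 = left, 1 = right
```

The last layer is left linear because Q-values estimate returns and are not bounded to any range. Acting greedily then just means taking the argmax over the output.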
This article covers the most useful excerpts — the full Reinforcement Learning course (16 chapters, 45 lessons, corrected exercises, and final project) takes you all the way.
FAQ
How long does it take to learn Reinforcement Learning?
Are there any prerequisites?
Where to start concretely?