IA & LLM

Fine-Tuning LLMs Explained Simply (with Diagrams and Real Code)

Fine Tuning LLMs: The Essentials in One Article — Real Code, Diagrams, and Concrete Steps, Excerpts from a 37-Lesson Course.

REHOUMA Haythem

12 Jun 2026 • 10 min read

A no-nonsense guide: Fine Tuning LLMs broken down with diagrams, concrete examples and tested commands. Everything comes from a structured 11-chapter course — here are the highlights.

tl;dr

Introduction and Installation
LLM Fundamentals
Data Preparation
Full Fine-Tuning
LoRA and QLoRA PEFT

~$ cat ./parcours.md # Fine Tuning LLMs — 9 chapters

Introduction and Installation

→ Course presentation and why fine-tune?→ Install Python, PyTorch and Hugging Face+ 1 more lessons

LLM Fundamentals

→ Transformer Architecture in brief→ Pre-training, SFT, RLHF, DPO+ 2 more lessons

Data Preparation

→ Data collection and cleaning→ Formats Alpaca, ChatML, ShareGPT, JSONL+ 2 more lessons

Full Fine-Tuning

→ Full fine-tuning concepts→ Hugging Face Trainer and TrainingArguments+ 2 more lessons

LoRA and QLoRA PEFT

→ LoRA low-rank adaptation principle→ QLoRA 4-bit quantization and NF4+ 2 more lessons

Training and Hyperparameters

→ Learning rate, batch size and epochs→ Schedulers cosine, linear, warmup+ 2 more lessons

Advanced Alignment DPO RLHF

→ DPO Direct Preference Optimization→ ORPO and KTO modern alternatives+ 1 more lessons

Deployment and Inference

→ GGUF quantization with llama.cpp→ Serving with vLLM or TGI (high perf)+ 1 more lessons

🏁

Final project (+ 1 chapters along the way)

→ You leave with a concrete, demonstrable project

Ollama and Local Integration

NOTEObjective — Deploy your fine-tuned model via Ollama, the simplest tool for running a local LLM (macOS, Windows, Linux) with a REST API in 30 seconds.

Learning Objectives

TIPBy the end of this module

Install Ollama and run a pre-existing model
Import your custom GGUF model via a Modelfile
Use Ollama’s REST API from any language
Integrate Ollama into a Python / Node / Rust app
Optimize for your hardware (CPU, M2, RTX)

Install Ollama

Run an Existing Model

Native API

Hardware	Model	Tokens/sec
M2 16 GB	Mistral 7B Q4_K_M	40
M3 Max 64 GB	Mistral 7B Q4_K_M	80
M3 Max 64 GB	Llama 3 70B Q4_K_M	10
RTX 4090 24 GB	Mistral 7B Q4_K_M	100+
RTX 3060 12 GB	Mistral 7B Q4_K_M	35

Ollama Use Cases in Production

Internal POC

Let business teams discover the model without any cloud infrastructure.

Desktop App

Embedded in Tauri / Electron / Swift apps for local analysis.

Edge / On-prem

Sensitive data that must never leave the internal network.

Ollama Limitations

Push Your Model to ollama.com

You can share your custom model on the public Ollama registry:

Alpaca, ChatML, ShareGPT, JSONL Formats

NOTEObjective — Learn the standard dataset formats for LLM fine-tuning and how to convert between them. Understand the importance of the chat template specific to each model.

Learning Objectives

TIPBy the end of this module

Identify the 4 most widely used formats in 2026
Convert a dataset between Alpaca, ChatML and ShareGPT
Apply the correct chat template for the target model
Save your dataset in streamable JSONL
Detect formatting errors before training

Format 1: Alpaca (the simplest)

Originating from the Stanford Alpaca project (2023). Three fields: instruction, input (optional) and output.

Convert Between Formats

Alpaca → ChatML

Hugging Face handles this automatically via tokenizer.apply_chat_template(). You should never write these templates by hand.

Install Python, PyTorch and Hugging Face

NOTEObjective — Set up a clean Python environment for fine-tuning: Python 3.11, PyTorch with CUDA, and the full Hugging Face stack (Transformers, PEFT, Datasets, TRL).

Learning Objectives

TIPBy the end of this module

Install Python 3.11 and a dedicated virtual environment
Choose and install the correct PyTorch version (CPU vs CUDA)
Install the complete Hugging Face stack with compatible versions
Verify that the GPU is properly detected by PyTorch
Create a Hugging Face account and configure your token

System Requirements

Component	Recommended	Minimum
Python	3.11	3.10
RAM	32 GB	16 GB
NVIDIA GPU	RTX 4090 (24 GB)	RTX 3060 (12 GB) or Colab T4
Free Disk Space	200 GB SSD	50 GB
CUDA Toolkit	12.1	11.8

WARNINGPython 3.12 note: In 2026, some dependencies (notably bitsandbytes on Windows) do not yet fully support Python 3.12. Stick with 3.11 for this course.

Step 1: Install Python 3.11 and a Virtual Environment

Create a working folder and a dedicated virtual environment for the course. This avoids any conflicts with other Python projects.

NVIDIA GPU with CUDA 12.1

peft

Parameter-Efficient Fine-Tuning. Essential for LoRA and QLoRA.

bitsandbytes

8-bit and 4-bit quantization. Enables QLoRA. Must match your CUDA version.

Step 4: Create a Hugging Face Account and Token

go-further

This article covers the most useful excerpts — the complete Fine Tuning LLMs course (11 chapters, 37 lessons, corrected exercises and a final project) takes you all the way.

./access-the-full-course free course: Prompt Engineering

FAQ

How long does it take to learn Fine Tuning LLMs?

With a structured progression (11 chapters, 37 short practical lessons), you reach an operational level in a few weeks at 30–60 minutes per day. The key is to practice each concept immediately.

Are there any prerequisites?

Basic computer science knowledge is enough. If you can use a terminal and read simple code, you’re ready.

Where should I start concretely?

Reproduce the commands in this article, then follow the complete Fine Tuning LLMs course: it walks through the 37 lessons in order, with exercises and a final project.

./read-also

→ Effective AI Prompts: the 9 key steps from zero to operational → Get started with Advanced Prompt Engineering: your first concrete step today → Custom AI Assistants in practice: the code and commands that really matter

📬 Want to receive this type of guide every week? Subscribe for free — real code, zero fluff.

Ollama and Local Integration

Learning Objectives

Install Ollama

Run an Existing Model

Native API

Ollama Use Cases in Production

Internal POC

Desktop App

Edge / On-prem

Ollama Limitations

Push Your Model to ollama.com

Alpaca, ChatML, ShareGPT, JSONL Formats

Learning Objectives

Format 1: Alpaca (the simplest)

Convert Between Formats

Alpaca → ChatML

Install Python, PyTorch and Hugging Face

Learning Objectives

System Requirements

Step 1: Install Python 3.11 and a Virtual Environment

NVIDIA GPU with CUDA 12.1

peft

bitsandbytes

Step 4: Create a Hugging Face Account and Token

FAQ

Stay up to date