Fine-Tuning LLMs Explained Simply (with Diagrams and Real Code)

Fine Tuning LLMs: The Essentials in One Article — Real Code, Diagrams, and Concrete Steps, Excerpts from a 37-Lesson Course.

Fine-Tuning LLMs Explained Simply (with Diagrams and Real Code)

A no-nonsense guide: Fine Tuning LLMs broken down with diagrams, concrete examples and tested commands. Everything comes from a structured 11-chapter course — here are the highlights.

tl;dr
  • Introduction and Installation
  • LLM Fundamentals
  • Data Preparation
  • Full Fine-Tuning
  • LoRA and QLoRA PEFT
~$ cat ./parcours.md # Fine Tuning LLMs — 9 chapters
01
Introduction and Installation
→ Course presentation and why fine-tune?→ Install Python, PyTorch and Hugging Face+ 1 more lessons
02
LLM Fundamentals
→ Transformer Architecture in brief→ Pre-training, SFT, RLHF, DPO+ 2 more lessons
03
Data Preparation
→ Data collection and cleaning→ Formats Alpaca, ChatML, ShareGPT, JSONL+ 2 more lessons
04
Full Fine-Tuning
→ Full fine-tuning concepts→ Hugging Face Trainer and TrainingArguments+ 2 more lessons
05
LoRA and QLoRA PEFT
→ LoRA low-rank adaptation principle→ QLoRA 4-bit quantization and NF4+ 2 more lessons
06
Training and Hyperparameters
→ Learning rate, batch size and epochs→ Schedulers cosine, linear, warmup+ 2 more lessons
07
Advanced Alignment DPO RLHF
→ DPO Direct Preference Optimization→ ORPO and KTO modern alternatives+ 1 more lessons
08
Deployment and Inference
→ GGUF quantization with llama.cpp→ Serving with vLLM or TGI (high perf)+ 1 more lessons
🏁
Final project (+ 1 chapters along the way)
→ You leave with a concrete, demonstrable project

Ollama and Local Integration

NOTEObjective — Deploy your fine-tuned model via Ollama, the simplest tool for running a local LLM (macOS, Windows, Linux) with a REST API in 30 seconds.

Learning Objectives

TIPBy the end of this module
  • Install Ollama and run a pre-existing model
  • Import your custom GGUF model via a Modelfile
  • Use Ollama’s REST API from any language
  • Integrate Ollama into a Python / Node / Rust app
  • Optimize for your hardware (CPU, M2, RTX)

Install Ollama

Run an Existing Model

Native API

HardwareModelTokens/sec
M2 16 GBMistral 7B Q4_K_M40
M3 Max 64 GBMistral 7B Q4_K_M80
M3 Max 64 GBLlama 3 70B Q4_K_M10
RTX 4090 24 GBMistral 7B Q4_K_M100+
RTX 3060 12 GBMistral 7B Q4_K_M35

Ollama Use Cases in Production

Internal POC

Let business teams discover the model without any cloud infrastructure.

Desktop App

Embedded in Tauri / Electron / Swift apps for local analysis.

Edge / On-prem

Sensitive data that must never leave the internal network.

Ollama Limitations

Push Your Model to ollama.com

You can share your custom model on the public Ollama registry:

Alpaca, ChatML, ShareGPT, JSONL Formats

NOTEObjective — Learn the standard dataset formats for LLM fine-tuning and how to convert between them. Understand the importance of the chat template specific to each model.

Learning Objectives

TIPBy the end of this module
  • Identify the 4 most widely used formats in 2026
  • Convert a dataset between Alpaca, ChatML and ShareGPT
  • Apply the correct chat template for the target model
  • Save your dataset in streamable JSONL
  • Detect formatting errors before training

Format 1: Alpaca (the simplest)

Originating from the Stanford Alpaca project (2023). Three fields: instruction, input (optional) and output.

Convert Between Formats

Alpaca → ChatML

Hugging Face handles this automatically via tokenizer.apply_chat_template(). You should never write these templates by hand.

Install Python, PyTorch and Hugging Face

NOTEObjective — Set up a clean Python environment for fine-tuning: Python 3.11, PyTorch with CUDA, and the full Hugging Face stack (Transformers, PEFT, Datasets, TRL).

Learning Objectives

TIPBy the end of this module
  • Install Python 3.11 and a dedicated virtual environment
  • Choose and install the correct PyTorch version (CPU vs CUDA)
  • Install the complete Hugging Face stack with compatible versions
  • Verify that the GPU is properly detected by PyTorch
  • Create a Hugging Face account and configure your token

System Requirements

ComponentRecommendedMinimum
Python3.113.10
RAM32 GB16 GB
NVIDIA GPURTX 4090 (24 GB)RTX 3060 (12 GB) or Colab T4
Free Disk Space200 GB SSD50 GB
CUDA Toolkit12.111.8
WARNINGPython 3.12 note: In 2026, some dependencies (notably bitsandbytes on Windows) do not yet fully support Python 3.12. Stick with 3.11 for this course.

Step 1: Install Python 3.11 and a Virtual Environment

Create a working folder and a dedicated virtual environment for the course. This avoids any conflicts with other Python projects.

NVIDIA GPU with CUDA 12.1

peft

Parameter-Efficient Fine-Tuning. Essential for LoRA and QLoRA.

bitsandbytes

8-bit and 4-bit quantization. Enables QLoRA. Must match your CUDA version.

Step 4: Create a Hugging Face Account and Token

go-further

This article covers the most useful excerpts — the complete Fine Tuning LLMs course (11 chapters, 37 lessons, corrected exercises and a final project) takes you all the way.

./access-the-full-course free course: Prompt Engineering

FAQ

How long does it take to learn Fine Tuning LLMs?
With a structured progression (11 chapters, 37 short practical lessons), you reach an operational level in a few weeks at 30–60 minutes per day. The key is to practice each concept immediately.
Are there any prerequisites?
Basic computer science knowledge is enough. If you can use a terminal and read simple code, you’re ready.
Where should I start concretely?
Reproduce the commands in this article, then follow the complete Fine Tuning LLMs course: it walks through the 37 lessons in order, with exercises and a final project.

📬 Want to receive this type of guide every week? Subscribe for free — real code, zero fluff.