Fine-Tuning LLMs Explained Simply (with Diagrams and Real Code)
Fine Tuning LLMs: The Essentials in One Article — Real Code, Diagrams, and Concrete Steps, Excerpts from a 37-Lesson Course.
A no-nonsense guide: Fine Tuning LLMs broken down with diagrams, concrete examples and tested commands. Everything comes from a structured 11-chapter course — here are the highlights.
- Introduction and Installation
- LLM Fundamentals
- Data Preparation
- Full Fine-Tuning
- LoRA and QLoRA PEFT
Ollama and Local Integration
Learning Objectives
- Install Ollama and run a pre-existing model
- Import your custom GGUF model via a Modelfile
- Use Ollama’s REST API from any language
- Integrate Ollama into a Python / Node / Rust app
- Optimize for your hardware (CPU, M2, RTX)
Install Ollama
Run an Existing Model
Native API
| Hardware | Model | Tokens/sec |
|---|---|---|
| M2 16 GB | Mistral 7B Q4_K_M | 40 |
| M3 Max 64 GB | Mistral 7B Q4_K_M | 80 |
| M3 Max 64 GB | Llama 3 70B Q4_K_M | 10 |
| RTX 4090 24 GB | Mistral 7B Q4_K_M | 100+ |
| RTX 3060 12 GB | Mistral 7B Q4_K_M | 35 |
Ollama Use Cases in Production
Internal POC
Let business teams discover the model without any cloud infrastructure.
Desktop App
Embedded in Tauri / Electron / Swift apps for local analysis.
Edge / On-prem
Sensitive data that must never leave the internal network.
Ollama Limitations
Push Your Model to ollama.com
You can share your custom model on the public Ollama registry:
Alpaca, ChatML, ShareGPT, JSONL Formats
Learning Objectives
- Identify the 4 most widely used formats in 2026
- Convert a dataset between Alpaca, ChatML and ShareGPT
- Apply the correct chat template for the target model
- Save your dataset in streamable JSONL
- Detect formatting errors before training
Format 1: Alpaca (the simplest)
Originating from the Stanford Alpaca project (2023). Three fields: instruction, input (optional) and output.
Convert Between Formats
Alpaca → ChatML
Hugging Face handles this automatically via tokenizer.apply_chat_template(). You should never write these templates by hand.
Install Python, PyTorch and Hugging Face
Learning Objectives
- Install Python 3.11 and a dedicated virtual environment
- Choose and install the correct PyTorch version (CPU vs CUDA)
- Install the complete Hugging Face stack with compatible versions
- Verify that the GPU is properly detected by PyTorch
- Create a Hugging Face account and configure your token
System Requirements
| Component | Recommended | Minimum |
|---|---|---|
| Python | 3.11 | 3.10 |
| RAM | 32 GB | 16 GB |
| NVIDIA GPU | RTX 4090 (24 GB) | RTX 3060 (12 GB) or Colab T4 |
| Free Disk Space | 200 GB SSD | 50 GB |
| CUDA Toolkit | 12.1 | 11.8 |
bitsandbytes on Windows) do not yet fully support Python 3.12. Stick with 3.11 for this course.Step 1: Install Python 3.11 and a Virtual Environment
Create a working folder and a dedicated virtual environment for the course. This avoids any conflicts with other Python projects.
NVIDIA GPU with CUDA 12.1
peft
Parameter-Efficient Fine-Tuning. Essential for LoRA and QLoRA.
bitsandbytes
8-bit and 4-bit quantization. Enables QLoRA. Must match your CUDA version.
Step 4: Create a Hugging Face Account and Token
This article covers the most useful excerpts — the complete Fine Tuning LLMs course (11 chapters, 37 lessons, corrected exercises and a final project) takes you all the way.
./access-the-full-course free course: Prompt EngineeringFAQ
How long does it take to learn Fine Tuning LLMs?
Are there any prerequisites?
Where should I start concretely?
📬 Want to receive this type of guide every week? Subscribe for free — real code, zero fluff.