Introduction to LLMs and SLMs Explained Simply (with Diagrams and Real Code)

Introduction to LLMs and SLMs: The Essentials in One Article — Real Code, Diagrams, and Concrete Steps, Excerpts from a 44-Lesson Course.

Introduction to LLMs and SLMs Explained Simply (with Diagrams and Real Code)

A no-nonsense guide: Introduction LLMs SLMs dissected with diagrams, concrete examples and tested commands. Everything comes from a structured 11-chapter course — here are the best parts.

tl;dr
  • Introduction and Installation
  • How an LLM Works
  • Transformer Architecture
  • Overview of LLMs in 2026
  • SLMs Small Language Models
~$ cat ./parcours.md # Introduction LLMs SLMs — 10 chapters
01
Introduction and Installation
→ Course presentation and brief history of LLMs→ Install Ollama and run your first model+ 1 more lessons
02
How an LLM Works
→ Tokens: how the model sees text→ Context window (context window)+ 2 more lessons
03
Transformer Architecture
→ The "Attention is All You Need" paper explained→ Encoder vs Decoder vs Encoder-Decoder+ 2 more lessons
04
Overview of LLMs in 2026
→ Proprietary LLMs: OpenAI, Anthropic, Google→ Open-weights models: Llama, Mistral, Qwen, Gemma+ 2 more lessons
05
SLMs Small Language Models
→ SLM vs LLM: definition and threshold→ Overview: Phi-3, Gemma, TinyLlama, Qwen-small+ 2 more lessons
06
Local Inference With Ollama
→ Essential Ollama commands→ Quantization: Q4, Q5, Q8 explained+ 2 more lessons
07
Hugging Face Transformers
→ Installation and first pipeline→ AutoModel and AutoTokenizer+ 2 more lessons
08
Choosing the Right Model
→ Criteria: cost, latency, privacy, quality→ LLM decision matrix cloud / open / SLM+ 1 more lessons
🏁
Final project (+ 2 chapters along the way)
→ You leave with a concrete and demonstrable project

Install Ollama and run your first model

NOTEObjective — Install Ollama on Windows, macOS or Linux, download your first model and have a complete conversation with an LLM running entirely locally on your machine, with no Internet connection after the initial download.

Learning objectives

TIPAt the end of this module
  • Install Ollama on your operating system
  • Verify that the service is running correctly
  • Download a model from the Ollama library
  • Start a conversation with a local model
  • Understand where models are stored on your machine
  • Know the basic Ollama commands

Why Ollama?

Ollama is an open-source tool that radically simplifies using LLMs locally. Where you previously had to manually manage quantization, GPU bindings and Python dependencies, Ollama gives you a single binary and a command as simple as ollama run llama3. It has become the reference in 2026 for running an LLM on your laptop.

Simplicity

A single command to download and run a model. No manual GPU configuration.

Multi-platform

Windows, macOS (Apple Silicon) and Linux. Automatically optimized for CPU and GPU.

Built-in REST API

Ollama exposes a local API on http://localhost:11434 for integration into your apps.

Step-by-step installation

Go to https://ollama.com/download and choose your system. Installation takes less than two minutes.

Windows

TIPTip: type /bye to exit the conversation and /? to see all available commands in Ollama's interactive mode.

Where are the models stored?

Models can be large. Knowing where they live avoids unpleasant storage surprises.

OSDefault location
WindowsC:\Users\<votre-nom>\.ollama\models
macOS~/.ollama/models
Linux/usr/share/ollama/.ollama/models

To change this location (to an external drive for example), set the OLLAMA_MODELS environment variable before starting the service.

Ollama API and Python integration

NOTEObjective — Discover Ollama's local API and call it from Python, to integrate a local LLM into your own scripts and applications.

Learning objectives

TIPAt the end of this module
  • Understand that Ollama exposes a local HTTP API
  • Call the API with curl and from Python
  • Use the official Python library
  • Pass a system prompt and options
  • Integrate a local LLM into an application

Ollama is also a server

In addition to the command line, Ollama runs in the background as a local HTTP server, accessible at http://localhost:11434. Everything the CLI does, you can do via HTTP request, from any language.

The bridge to code

From chatting in the terminal to the Python API, you now know how to integrate a local LLM into any program.

Installation and first pipeline

NOTEObjective — Install the Transformers library from Hugging Face and perform your first inference in Python with the simplest abstraction: the pipeline.

Learning objectives

TIPAt the end of this module
  • Install Transformers and its dependencies
  • Understand what a pipeline is
  • Run sentiment analysis in 3 lines
  • Know the available tasks
  • Load a specific model into a pipeline

Hugging Face: the GitHub of models

Hugging Face provides the Transformers library, which has become the standard for using open-source models in Python. It gives access to hundreds of thousands of models through a uniform interface.

Ready-to-use tasks

Task (string)What it does
sentiment-analysisDetermines whether a text is positive or negative.
text-generationCompletes or generates text.
summarizationSummarizes a long text.
translationTranslates from one language to another.
question-answeringAnswers a question from a provided context.
zero-shot-classificationClassifies a text into categories you define.

Choosing a specific model

For French or a specific need, explicitly specify the model (its Hugging Face identifier).

WARNINGWarning: Default generation pipelines use small, older models (such as GPT-2). Do not judge the quality of modern LLMs by them: these are teaching tools, not production models.
go-further

This article covers the most useful excerpts — the full Introduction LLMs SLMs course (11 chapters, 44 lessons, corrected exercises and final project) takes you all the way.

./access-the-full-course free course: Prompt Engineering

FAQ

How long does it take to learn Introduction LLMs SLMs?
With a structured progression (11 chapters, 44 short and practical lessons), you reach an operational level in a few weeks at 30 to 60 minutes per day. The key is to practice each concept immediately.
Are there any prerequisites?
No prerequisites: the course starts from zero; every concept is introduced before being used.
Where to start concretely?
Reproduce the commands in this article, then follow the full Introduction LLMs SLMs course: it chains the 44 lessons in order, with exercises and a final project.

📬 Want to receive this type of guide every week? Subscribe for free — real code, zero fluff.