AI & LLM

Few-Shot Learning Explained Simply (With Diagrams and Real Code)

Few Shot Learning: The Essentials in One Article — Real Code, Diagrams and Concrete Steps, Extracts from a 35-Lesson Course.

REHOUMA Haythem

12 Jun 2026 • 14 min read

A guide that gets straight to the point: Few Shot Learning broken down with diagrams, concrete examples and tested commands. Everything comes from a structured 11-chapter course — here are the highlights.

tl;dr

Introduction and Installation
Why few-shot
Meta-learning fundamentals
Metrics and benchmarks
Siamese Networks

~$ cat ./parcours.md # Few Shot Learning — 10 chapters

Introduction and Installation

→ Course presentation → Install PyTorch and the environment (+ 1 more lesson)

Why few-shot

→ The limits of classical deep learning → Real use cases of few-shot (+ 1 more lesson)

Meta-learning fundamentals

→ Learning to learn → Episodes, support set and query set (+ 1 more lesson)

Metrics and benchmarks

→ Omniglot, the "MNIST of few-shot" → miniImageNet (+ 1 more lesson)

Siamese Networks

→ Siamese architecture → Contrastive loss and Triplet loss (+ 1 more lesson)

Prototypical Networks

→ The idea of prototypes → Euclidean distance and classification (+ 1 more lesson)

Matching Networks

→ Attention for few-shot → Matching Networks architecture (+ 1 more lesson)

MAML and gradient meta-learning

→ Model-Agnostic Meta-Learning → Inner loop vs outer loop (+ 1 more lesson)

🏁

Final project (+ 2 chapters along the way)

→ You leave with a concrete and demonstrable project

Attention for few-shot

NOTE (Objective): Introduce the attention mechanism applied to few-shot. Understand the idea behind Matching Networks (Vinyals et al. 2016): instead of taking the nearest class, compute a weighted combination of all support examples.

Learning objectives

TIP (By the end of this module): You will be able to explain what attention is in deep learning and why it suits the few-shot problem so well.

The intuition behind attention

Attention, in deep learning, is the idea that a model can assign a different weight to each element of a set according to the context. When you read a sentence, your brain does not pay the same attention to every word; it focuses on the important ones.

NOTE (For few-shot): Instead of comparing the query to a single reference (Siamese) or to the prototype (ProtoNet), we compare the query to all support examples at once, using learned weights.

The classroom analogy

Imagine a classroom where each student gives an opinion on the answer to a question. Rather than taking the opinion of the best student (Siamese), or the average of everyone (ProtoNet), we weight the votes: a student who seems “close” to the topic gets more votes.

Siamese

We take the nearest example and ignore the rest: a single vote decides.

ProtoNet

We compute the uniform average of the examples of a class. Fair vote, no nuance.

Matching Net

We give each example a weight according to its similarity to the query. Weighted vote.

Prediction formula

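For a query q and a support set S = {(x_i, y_i)}, i = 1, …, k, where y_i is the one-hot label of x_i, the prediction of Matching Networks (Vinyals et al. 2016) is:

$$\hat{y} = \sum_{i=1}^{k} a(q, x_i)\, y_i, \qquad a(q, x_i) = \frac{\exp\big(c(f(q), g(x_i))\big)}{\sum_{j=1}^{k} \exp\big(c(f(q), g(x_j))\big)}$$

Here f and g are the embedding functions for the query and the support examples, and c is the cosine similarity.

The model therefore predicts a probability distribution: the sum of the weighted contributions of every support example.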
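To make the weighted vote concrete, here is a minimal pure-Python sketch. It assumes the embeddings are already computed (the 2-D vectors are made-up toy values) and uses a softmax over cosine similarities as attention weights; the function names are illustrative, not taken from any library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matching_predict(query, support, num_classes):
    """Class probabilities as an attention-weighted sum of support labels.

    support is a list of (embedding, class_index) pairs.
    """
    weights = softmax([cosine(query, x) for x, _ in support])
    probs = [0.0] * num_classes
    for w, (_, y) in zip(weights, support):
        probs[y] += w  # same as adding w * one_hot(y)
    return probs

# Toy embeddings (hypothetical values): two classes, two shots each.
support = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
probs = matching_predict([0.8, 0.2], support, num_classes=2)
print(probs)
```

Every support example contributes to the result; the examples closest to the query simply contribute more.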

Soft vs hard attention

Type	Behavior	Few-shot usage
Hard	Selects ONE example (argmax)	Equivalent to Siamese
Soft	Weights all examples	Matching Networks

TIP (Why soft wins): Soft attention is differentiable, allowing end-to-end optimization of the entire model via backpropagation. Hard attention (with argmax) is not differentiable.
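The contrast can be seen on a toy example. The sketch below (illustrative function names, made-up similarity values) computes both kinds of weights before and after a tiny change in one similarity: the hard weights jump from one example to another, while the soft weights move only slightly, which is what makes gradient-based training possible.

```python
import math

def soft_weights(similarities):
    """Softmax: every example gets a weight that varies smoothly."""
    m = max(similarities)
    exps = [math.exp(s - m) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]

def hard_weights(similarities):
    """Argmax as a one-hot vector: only the best example counts."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return [1.0 if i == best else 0.0 for i in range(len(similarities))]

sims = [0.90, 0.85, 0.10]    # similarities of the query to 3 support examples
nudged = [0.90, 0.91, 0.10]  # the second similarity increases slightly

print(hard_weights(sims), hard_weights(nudged))  # the selected example flips
print(soft_weights(sims), soft_weights(nudged))  # the weights shift a little
```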

Context: the influence of other examples

An important nuance of Matching Networks: the encoder can take into account the entire context of the support (and not each image in isolation). This is the idea behind Full Context Embedding (FCE).

Comparison with ProtoNet

Aspect	Matching Net	ProtoNet
Class representation	Keeps all examples	A single prototype
Mechanism	Weighted attention	Distance to prototype
Context	Yes (FCE optional)	No
K=1 performance (Omniglot)	~98%	~98%
K=5 performance (Omniglot)	~99%	~99.5%
Code complexity	Higher	Very simple
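
For comparison, the ProtoNet side of the table can be sketched in a few lines. This is a simplified illustration on made-up 2-D embeddings, not a trained model: each class is reduced to the mean of its support embeddings, and the query is scored with a softmax over negative squared Euclidean distances to those prototypes.

```python
import math
from collections import defaultdict

def prototypes(support):
    """Average the embeddings of each class into a single prototype."""
    grouped = defaultdict(list)
    for x, y in support:
        grouped[y].append(x)
    return {
        y: [sum(dim) / len(xs) for dim in zip(*xs)]
        for y, xs in grouped.items()
    }

def protonet_predict(query, support):
    """Softmax over negative squared Euclidean distances to each prototype."""
    protos = prototypes(support)
    logits = {
        y: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for y, proto in protos.items()
    }
    m = max(logits.values())
    exps = {y: math.exp(v - m) for y, v in logits.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

# Toy support set (hypothetical values): classes "A" and "B", two shots each.
support = [([1.0, 0.0], "A"), ([0.8, 0.2], "A"), ([0.0, 1.0], "B"), ([0.2, 0.8], "B")]
protos = prototypes(support)
probs = protonet_predict([0.7, 0.3], support)
print(protos)
print(probs)
```

The individual support examples disappear once the prototypes are computed, which is the "single prototype" row of the table.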

NOTE (Modern verdict): ProtoNet is generally preferred today for its simplicity. However, the ideas from Matching Networks (attention, FCE) have inspired many improvements (FEAT, ATNet, etc.).

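The foundational paper

Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D. (2016). “Matching Networks for One Shot Learning” (NIPS 2016).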

Real-world few-shot use cases

NOTE (Objective): Put concrete faces on few-shot learning: for each domain we examine the problem, the constraints, what few-shot brings, and a quantified example. By the end you will be able to convince anyone that the topic is useful.

Learning objectives

TIP (By the end of this module): You will have in mind at least five real-world use cases (medical, biodiversity, industry, NLP, finance), together with their problem and the associated few-shot solution.

Case 1 — Medicine: rare diseases

In Europe, a disease is considered rare when it affects fewer than 1 person in 2,000. More than 7,000 rare diseases are listed, yet each one often has only a few dozen documented cases with medical images.

Concrete problem

A radiologist sees a lesion that they have encountered only three times in their career. They would like their AI assistant to compare this image with similar lesions already annotated worldwide.

What few-shot brings

Few-shot provides a model that learns an embedding space on ordinary images and then retrieves the nearest cases from a database of rare lesions. As few as 5 examples can be enough to add a new class.
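
The retrieval step can be sketched as a nearest-neighbor search in the embedding space. The example below is a minimal illustration: the case identifiers and 3-D embeddings are hypothetical, and in practice the vectors would come from a trained image encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_cases(query, database, k=3):
    """Return the k (case_id, similarity) pairs most similar to the query."""
    scored = [(case_id, cosine(query, emb)) for case_id, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical database: case id -> embedding of an annotated lesion.
database = {
    "case_001": [0.9, 0.1, 0.0],
    "case_002": [0.1, 0.9, 0.0],
    "case_003": [0.8, 0.2, 0.1],
    "case_004": [0.0, 0.1, 0.9],
}
top = nearest_cases([1.0, 0.1, 0.0], database, k=2)
print(top)
```

Returning the identifiers of the closest cases, rather than only a label, is what lets the physician look at the reference images behind a suggestion.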

WARNING (Business constraints): GDPR tightly restricts how identifiable patient images can be stored and shared, so systems often work with anonymized embeddings instead. Explainability (showing the reference images) is also crucial for the physician.

Case 2 — Biodiversity: rare species

The iNaturalist app catalogs hundreds of thousands of species, but many of them have only 5 to 20 photos in the entire global database (a recently discovered butterfly, a millipede endemic to a single forest…).

NOTE (Real example): For the woodland caribou in Quebec, some herds have only 12 photographed individuals. A few-shot model can learn to identify each individual caribou (re-ID) from so few examples.

Case 3 — Industry: quality control on rare defects

Imagine an electronics production line. The vast majority of boards are healthy; only about 2 boards in 100,000 show a defect at visual inspection. Over 6 months we have collected 4 photos of a “cold solder” and 1 photo of a “ripped pad”.

Approach	Result	Comment
Classic multi-class CNN	~30% F1 on rare defects	Too few examples
Anomaly detection	Detects extra noise	Too sensitive to variations
Few-shot (ProtoNet)	~88% F1 with 5 examples	Currently the best

Case 4 — NLP: niche customer intents

A banking support chatbot must understand very specific intents (“close account within 30 days”, “partial mortgage release”) that sometimes have fewer than 50 examples in the logs.

output

# Few-shot prompt that asks an LLM to classify a banking intent
prompt = """
You are a banking intent classifier.
Here are 5 examples:

"I want to close my account this month" -> FAST_CLOSURE
"I would like to partially reduce my mortgage" -> PARTIAL_RELEASE
"Is my loan application in progress?" -> LOAN_STATUS
"I want to close my accounts quickly" -> FAST_CLOSURE
"I want to release part of my mortgage" -> PARTIAL_RELEASE

Question: "How do I close my account urgently?"
Answer:"""
# → the LLM is expected to answer FAST_CLOSURE, with no training step

TIP: Few-shot via LLM has become the tool of choice as soon as an NLP intent has fewer than 100 examples: it costs almost no development time and can be updated by simply editing the prompt.

Case 5 — Finance: new fraud patterns

Fraudsters constantly invent new schemes. When a bank detects a new fraud pattern, it often has only 10 to 30 labeled transactions before the pattern disappears or changes.

Problem

It is impossible to retrain a classic model on every new pattern: it would be too slow and the pattern would already have changed.

Few-shot solution

A transaction embedding model compares the new transaction in real time with the 10 known frauds of the current pattern.
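
One simple way to turn that comparison into a decision is to score a transaction by its highest similarity to any known fraud and flag it above a threshold. The sketch below is illustrative only: the embeddings are invented, and the 0.95 threshold is an arbitrary assumption that would need tuning on real data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def fraud_score(transaction, known_frauds):
    """Highest cosine similarity between the transaction and any known fraud."""
    return max(cosine(transaction, fraud) for fraud in known_frauds)

def looks_like_pattern(transaction, known_frauds, threshold=0.95):
    """Flag the transaction if it is close enough to the current pattern."""
    return fraud_score(transaction, known_frauds) >= threshold

# Hypothetical embeddings of labeled frauds from the current pattern.
known_frauds = [[0.9, 0.1, 0.4], [1.0, 0.0, 0.5], [0.8, 0.2, 0.3]]

suspicious = [0.95, 0.05, 0.45]
ordinary = [0.1, 0.9, 0.1]
print(looks_like_pattern(suspicious, known_frauds))
print(looks_like_pattern(ordinary, known_frauds))
```

When the pattern changes, only the list of reference embeddings needs to be updated; no retraining is involved.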

In-context learning with GPT

NOTE (Objective): Discover in-context learning, the modern way to do few-shot with large language models (GPT-4, Claude, Gemini). We provide examples in the prompt; the model “learns” without a single weight update.

Learning objectives

TIP (By the end of this module): You will be able to explain what in-context learning is, the origin of the concept (GPT-3 / Brown 2020), and recognize when to use it instead of classical methods.

The revolutionary idea of GPT-3

Before 2020, few-shot required training a specific model (ProtoNet, MAML, etc.). In 2020, OpenAI released GPT-3 and showed that a large pre-trained language model can perform few-shot without additional training. It is enough to put the examples in the prompt.

NOTE (Definition): In-context learning is the ability of a large language model to learn a task from examples placed in its prompt, without any update of its weights.

A concrete example

output

prompt = """
Translate from French to English.

French: Le chat dort sur le canapé.
English: The cat is sleeping on the sofa.

French: Je vais au marché.
English: I am going to the market.

French: Il fait beau aujourd'hui.
English:"""

# Expected GPT-4 response:
# « It is sunny today. »

With no fine-tuning phase, just two examples in the prompt, GPT-4 understood the translation task and produced the correct answer.

Zero-shot vs One-shot vs Few-shot prompting

Mode	Prompt content	Example
Zero-shot	Instruction only	“Translate: Hello”
One-shot	Instruction + 1 example	“Translate. ‘Merci’ -> ‘Thank you’. ‘Bonjour’ -> ”
Few-shot	Instruction + 3 to 10 examples	(As above with more examples)
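
The three modes differ only in how many solved examples are placed before the open question. A small helper makes this explicit; `build_prompt` and the `French:` / `English:` line format are illustrative choices, not part of any API.

```python
def build_prompt(instruction, examples, query, k):
    """Build a k-shot prompt: instruction, k solved examples, then the open query."""
    lines = [instruction, ""]
    for source, target in examples[:k]:
        lines.append(f"French: {source}")
        lines.append(f"English: {target}")
        lines.append("")
    lines.append(f"French: {query}")
    lines.append("English:")
    return "\n".join(lines)

examples = [
    ("Merci", "Thank you"),
    ("Le chat dort sur le canapé.", "The cat is sleeping on the sofa."),
    ("Je vais au marché.", "I am going to the market."),
]
instruction = "Translate from French to English."

zero_shot = build_prompt(instruction, examples, "Bonjour", k=0)
one_shot = build_prompt(instruction, examples, "Bonjour", k=1)
few_shot = build_prompt(instruction, examples, "Bonjour", k=3)
print(one_shot)
```

Each prompt ends on an unfinished `English:` line, which invites the model to continue the pattern.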

The paper that changed everything: Brown 2020

NOTE (Brown et al. 2020): “Language Models are Few-Shot Learners” (NeurIPS 2020). This is the GPT-3 paper (175B parameters). It shows that accuracy generally improves as the prompt goes from 0 to 1 and then to several examples, especially for the largest models. On several benchmarks, few-shot GPT-3 was competitive with, and occasionally better than, specifically fine-tuned models.

Why does it work?

No one has a complete answer. Several theories complement each other:

1. Pattern recognition

The model has seen millions of “Q: … Answer: …” lists during pre-training and recognizes the format.

2. Implicit meta-learning

During pre-training the model learned to learn new tasks from the text it consumes.

3. Induction heads

Analysis of Transformers reveals internal circuits (“induction heads”) that look back for an earlier occurrence of the current token and copy what followed it.
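
The copying behavior attributed to induction heads can be imitated with a toy rule. This is a deliberately simplified illustration on a list of tokens, not a Transformer: it finds the most recent earlier occurrence of the current token and predicts the token that followed it.

```python
def induction_predict(tokens):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the final token. Returns None if there is none."""
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

sequence = ["the", "cat", "sat", ".", "the"]
print(induction_predict(sequence))
```

A rule like this only copies patterns already present in the context, which is why it is one partial explanation among the theories listed here rather than a full account.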

When to use in-context learning?

TIP (Ideal case): You have a few examples of a task, you want a quick prototype without training, and you are ready to pay for an LLM API. In-context learning gives you a usable model within minutes.

Situation	Recommendation
Very little data (1-30 examples)	In-context learning with LLM
Lots of data (1000+ examples)	Classical fine-tuning
No internet / on-premise	Local ProtoNet or MAML
Critical latency (< 10 ms)	Locally trained model
API cost too high	Fine-tuning or ProtoNet

Python code: calling the OpenAI API in few-shot

output

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# System instruction followed by three solved examples (the "shots").
few_shot_examples = [
    {"role": "system", "content": "You are a sentiment classifier."},
    {"role": "user",   "content": "I love this movie!"},
    {"role": "assistant", "content": "POSITIVE"},
    {"role": "user",   "content": "It was boring."},
    {"role": "assistant", "content": "NEGATIVE"},
    {"role": "user",   "content": "Meh, nothing special."},
    {"role": "assistant", "content": "NEUTRAL"},
]

def classify(text):
    """Append the new text to the few-shot messages and return the model's label."""
    msgs = few_shot_examples + [{"role": "user", "content": text}]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=msgs,
        temperature=0,  # favor the most likely label for more repeatable output
    )
    return resp.choices[0].message.content

print(classify("The story is captivating!"))  # expected: POSITIVE

go-further

This article covers the most useful excerpts — the complete Few Shot Learning course (11 chapters, 35 lessons, corrected exercises and final project) takes you all the way.


FAQ

How long does it take to learn Few Shot Learning?

With a structured progression (11 chapters, 35 short practical lessons), you reach an operational level in a few weeks at 30–60 minutes per day. The key is to practice each concept immediately.

Are there any prerequisites?

Basic computer science knowledge is enough. If you can use a terminal and read simple code, you are ready.

Where to start concretely?

Reproduce the commands in this article, then follow the complete Few Shot Learning course: it chains the 35 lessons in order, with exercises and a final project.
