Few-Shot Learning Explained Simply (With Diagrams and Real Code)
Few-Shot Learning: the essentials in one article, with real code, diagrams and concrete steps, excerpted from a 35-lesson course.
A guide that gets straight to the point: few-shot learning broken down with diagrams, concrete examples and tested code. Everything comes from a structured 11-chapter course; here are the highlights.
- Introduction and Installation
- Why few-shot
- Meta-learning fundamentals
- Metrics and benchmarks
- Siamese Networks
Attention for few-shot
The intuition behind attention
Attention, in deep learning, is the idea that a model can assign a different weight to each element of a set according to the context. When you read a sentence, your brain does not pay the same attention to every word; it focuses on the important ones.
The classroom analogy
Imagine a classroom where each student gives an opinion on the answer to a question. Rather than following only the student closest to the question (Siamese) or taking the uniform average of everyone (ProtoNet), we weight the votes: a student who seems “close” to the topic gets more weight.
Siamese
We take the nearest example and ignore the rest: one vote counts, everyone else abstains.
ProtoNet
We compute the uniform average of the examples of a class. Fair vote, no nuance.
Matching Net
We give each example a weight according to its similarity to the query. Weighted vote.
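A minimal sketch of the three voting schemes, assuming toy 1-D embeddings (hypothetical values) and, for the weighted vote, weights proportional to exp(-distance) rather than the cosine-plus-softmax used in the Matching Networks paper. Class A is built with one very close example and one far away, which is enough to make the schemes disagree:

```python
import math

# Toy 1-D embeddings (hypothetical values): class A has one close and one far example
support = {"A": [1.0, 5.0], "B": [1.5, 1.6]}
query = 0.0


def dist(a, b):
    return abs(a - b)


def siamese(query, support):
    # Hard choice: the label of the single nearest support example
    pairs = [(dist(query, x), label) for label, xs in support.items() for x in xs]
    return min(pairs)[1]


def protonet(query, support):
    # One prototype per class: the uniform mean of its examples
    protos = {label: sum(xs) / len(xs) for label, xs in support.items()}
    return min(protos, key=lambda label: dist(query, protos[label]))


def matching(query, support):
    # Every example votes with weight exp(-distance); votes are normalized to sum to 1
    scores = {
        label: sum(math.exp(-dist(query, x)) for x in xs)
        for label, xs in support.items()
    }
    total = sum(scores.values())
    probs = {label: s / total for label, s in scores.items()}
    return max(probs, key=probs.get), probs


print("Siamese:", siamese(query, support))       # → A
print("ProtoNet:", protonet(query, support))     # → B
print("Matching:", matching(query, support)[0])  # → B
```

Siamese follows the single closest example (A), while the prototype and the weighted vote both account for the whole class and choose B.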
Prediction formula
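For a query q and a support set {(x_i, y_i)}, where the y_i are one-hot labels and f, g are the embedding functions for the query and the support examples:

$$\hat{y} = \sum_{i} a(q, x_i)\, y_i \qquad \text{with} \qquad a(q, x_i) = \frac{\exp\big(c(f(q), g(x_i))\big)}{\sum_{j} \exp\big(c(f(q), g(x_j))\big)}$$

where c is the cosine similarity. The model therefore predicts a probability distribution over the classes: the sum of the label contributions of every support example, each weighted by its attention.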
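A minimal pure-Python sketch of this prediction, assuming the embeddings f(q) and g(x_i) are already computed (the vectors below are hypothetical) and using cosine similarity followed by a softmax, as in the original Matching Networks paper:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def matching_predict(query, support_x, support_y, n_classes):
    """Return P(class | query) as a list of n_classes probabilities.

    query, support_x: already-embedded vectors (f(q) and g(x_i) assumed precomputed)
    support_y: integer class index of each support example
    """
    sims = [cosine(query, x) for x in support_x]
    # Softmax over the similarities gives the attention weights a(q, x_i)
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    attn = [e / z for e in exps]
    # Weighted sum of one-hot labels: each example adds its weight to its own class
    probs = [0.0] * n_classes
    for a, y in zip(attn, support_y):
        probs[y] += a
    return probs


support_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_y = [0, 0, 1]
probs = matching_predict([1.0, 0.2], support_x, support_y, n_classes=2)
print(probs)  # class 0 receives most of the probability mass
```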
Soft vs hard attention
| Type | Behavior | Few-shot usage |
|---|---|---|
| Hard | Selects ONE example (argmax) | Equivalent to Siamese |
| Soft | Weights all examples | Matching Networks |
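One way to see the link between the two rows: hard attention is the limit of soft attention when a softmax temperature goes to zero. The temperature parameter is an assumption added for illustration, and the similarity scores are hypothetical:

```python
import math


def softmax(scores, temperature=1.0):
    # Divide by the temperature, then subtract the max for numerical stability
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def hard_attention(scores):
    # All the weight on the single best-scoring example (argmax)
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


sims = [0.9, 0.7, 0.2]  # hypothetical query/support similarities
for t in (1.0, 0.1, 0.01):
    print(t, [round(w, 3) for w in softmax(sims, t)])
print("hard", hard_attention(sims))
```

At temperature 1 the weights stay spread over all three examples; at 0.01 almost all the weight sits on the best example, matching the hard argmax.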
Context: the influence of other examples
An important nuance of Matching Networks: the encoder can embed each support example in light of the whole support set, rather than each image in isolation. This is the idea behind Full Context Embeddings (FCE).
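In the paper, FCE passes the support embeddings through a bidirectional LSTM. The sketch below swaps in a simpler mechanism, a residual self-attention over the support set (the `contextualize` helper is hypothetical), only to show the key property: the embedding of an example depends on the other examples it comes with.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contextualize(support):
    """Return context-aware embeddings: each vector plus an attention-weighted
    average of the whole support set (simplified stand-in for FCE, not the BiLSTM)."""
    out = []
    for x in support:
        scores = [dot(x, other) for other in support]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        context = [sum(w * other[d] for w, other in zip(weights, support)) for d in range(len(x))]
        out.append([a + c for a, c in zip(x, context)])
    return out


support_a = [[1.0, 0.0], [0.0, 1.0]]
support_b = [[1.0, 0.0], [1.0, 1.0]]
# The same first example gets a different embedding in a different support set
print(contextualize(support_a)[0])
print(contextualize(support_b)[0])
```

The first vector is identical in both support sets, yet its contextual embedding differs because its neighbours differ.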
Comparison with ProtoNet
| Aspect | Matching Net | ProtoNet |
|---|---|---|
| Class representation | Keeps all examples | A single prototype |
| Mechanism | Weighted attention | Distance to prototype |
| Context | Yes (FCE optional) | No |
| 5-way 1-shot accuracy (Omniglot) | ~98% | ~98% |
| 5-way 5-shot accuracy (Omniglot) | ~99% | ~99.5% |
| Code complexity | Higher | Very simple |
The foundational paper
Vinyals et al., “Matching Networks for One Shot Learning”, NeurIPS 2016.
Real-world few-shot use cases
Case 1 — Medicine: rare diseases
A disease is considered rare in Europe when it affects fewer than 1 person in 2,000. More than 7,000 rare diseases are listed, yet many of them have only a few dozen documented cases with medical images.
Concrete problem
A radiologist sees a type of lesion they have encountered only three times in their career. They would like their AI assistant to compare this image with all similar lesions already annotated worldwide.
What few-shot brings
A model learns an embedding space on ordinary images, then retrieves the nearest cases from a database of rare lesions. As few as 5 examples can be enough to add a new class.
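A minimal retrieval sketch, assuming case embeddings were precomputed by an encoder trained on ordinary images (not shown). The case IDs and 3-D vectors are hypothetical:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query_emb, database, k=3):
    """Return the k most similar (case_id, similarity) pairs.

    database: list of (case_id, embedding) pairs with precomputed embeddings
    """
    scored = [(case_id, cosine(query_emb, emb)) for case_id, emb in database]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


database = [
    ("lesion_type_X_case1", [0.9, 0.1, 0.0]),
    ("lesion_type_X_case2", [0.8, 0.2, 0.1]),
    ("lesion_type_Y_case1", [0.1, 0.9, 0.2]),
    ("lesion_type_Z_case1", [0.0, 0.2, 0.9]),
]
for case_id, sim in retrieve([0.85, 0.15, 0.05], database, k=2):
    print(case_id, round(sim, 3))
```

Adding a new rare class only means adding its few embedded cases to `database`; the encoder itself is not retrained.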
Case 2 — Biodiversity: rare species
The iNaturalist app catalogs hundreds of thousands of species, but many of them have only 5 to 20 photos in the entire global database (a recently discovered butterfly, a millipede endemic to a single forest…).
Case 3 — Industry: quality control on rare defects
Imagine an electronics production line. The vast majority of boards are healthy: only about 2 boards in 100,000 reach visual inspection with a defect. Over 6 months, we have collected 4 photos of a “cold solder” joint and 1 photo of a “ripped pad”.
| Approach | Result | Comment |
|---|---|---|
| Classic multi-class CNN | ~30% F1 on rare defects | Too few examples to learn from |
| Anomaly detection | Flags a lot of noise (false positives) | Too sensitive to variations |
| Few-shot (ProtoNet) | ~88% F1 with 5 examples | Best of the three approaches here |
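A minimal ProtoNet sketch for this setting, with hypothetical 2-D embeddings standing in for the output of a trained encoder and hypothetical class names. It shows that one prototype per class works the same whether a class has 4 examples or just 1, and it scores classes with a softmax over negative squared Euclidean distances, as in the ProtoNet paper:

```python
import math


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def prototypes(support):
    # Mean embedding per class, whatever the number of shots
    return {label: [sum(col) / len(xs) for col in zip(*xs)] for label, xs in support.items()}


def classify(query, protos):
    # Softmax over negative squared Euclidean distances to each prototype
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    m = max(logits.values())
    exps = {label: math.exp(l - m) for label, l in logits.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


support = {
    "cold_solder": [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0], [1.1, 0.1]],  # 4 photos
    "ripped_pad": [[0.1, 1.0]],                                       # 1 photo
}
probs = classify([0.95, 0.15], prototypes(support))
print(max(probs, key=probs.get))  # → cold_solder
```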
Case 4 — NLP: niche customer intents
A banking support chatbot must understand very specific intents (“close account within 30 days”, “partial mortgage release”) that sometimes have fewer than 50 examples in the logs.
```python
# Few-shot intent classification with an LLM: the examples are placed in the prompt
prompt = """You are a banking intent classifier. Here are 5 examples:
"I want to close my account this month" -> CLOTURE_RAPIDE
"I would like to partially reduce my home loan" -> MAINLEVEE_PARTIELLE
"Is my loan application being processed?" -> SUIVI_CREDIT
"I want to close my accounts quickly" -> CLOTURE_RAPIDE
"I want to release part of my mortgage" -> MAINLEVEE_PARTIELLE
Question: "How do I close my account urgently?"
Answer:"""
# → the LLM is expected to answer CLOTURE_RAPIDE, without any training
```
Case 5 — Finance: new fraud patterns
Fraudsters constantly invent new schemes. When a bank detects a new fraud pattern, it often has only 10 to 30 labeled transactions before the pattern disappears or changes.
Problem
Retraining a classic model for every new pattern is impractical: it would be too slow, and the pattern would already have changed by the time the model is ready.
Few-shot solution
A transaction embedding model compares each new transaction, in real time, with the 10 known frauds of the current pattern.
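A minimal sketch of that real-time comparison, assuming transactions are already embedded as vectors. The values and the 0.9 threshold are hypothetical (in practice the threshold would be tuned on held-out data), and three known frauds stand in for the ten in the text:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def fraud_score(tx, known_frauds):
    # Similarity to the closest known fraud of the current pattern
    return max(cosine(tx, f) for f in known_frauds)


THRESHOLD = 0.9  # hypothetical; to be tuned on held-out transactions

known_frauds = [[0.9, 0.1, 0.8], [1.0, 0.0, 0.7], [0.8, 0.2, 0.9]]
suspicious = [0.95, 0.05, 0.75]
ordinary = [0.1, 1.0, 0.0]
for name, tx in (("suspicious", suspicious), ("ordinary", ordinary)):
    score = fraud_score(tx, known_frauds)
    print(name, round(score, 3), "FLAG" if score >= THRESHOLD else "ok")
```

When the pattern changes, only `known_frauds` is updated; no model is retrained.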
In-context learning with GPT
The revolutionary idea of GPT-3
Before 2020, few-shot learning generally required training a dedicated model (ProtoNet, MAML, etc.). In 2020, OpenAI released GPT-3 and showed that a large pre-trained language model can perform few-shot tasks without any additional training: the examples are simply placed in the prompt.
A concrete example
```python
prompt = """Translate from French to English.
French: Le chat dort sur le canapé.
English: The cat is sleeping on the sofa.
French: Je vais au marché.
English: I am going to the market.
French: Il fait beau aujourd'hui.
English:"""
# Expected GPT-4 response:
# "It is sunny today."
```
With no fine-tuning and just two examples in the prompt, GPT-4 infers the translation task and produces the expected answer.
Zero-shot vs One-shot vs Few-shot prompting
| Mode | Prompt content | Example |
|---|---|---|
| Zero-shot | Instruction only | “Translate: Hello” |
| One-shot | Instruction + 1 example | “Translate. ‘Merci’ -> ‘Thank you’. ‘Bonjour’ -> ” |
| Few-shot | Instruction + several examples (2 or more) | Same as above, with more example pairs |
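The three modes differ only in how many example pairs go into the prompt. A minimal prompt builder, reusing the sentence pairs from the translation example above (`build_prompt` is a hypothetical helper, not a library function):

```python
def build_prompt(instruction, examples, query, k):
    """Zero-shot if k == 0, one-shot if k == 1, few-shot if k > 1."""
    lines = [instruction]
    for source, target in examples[:k]:
        lines.append(f"French: {source}")
        lines.append(f"English: {target}")
    lines.append(f"French: {query}")
    lines.append("English:")
    return "\n".join(lines)


examples = [
    ("Le chat dort sur le canapé.", "The cat is sleeping on the sofa."),
    ("Je vais au marché.", "I am going to the market."),
]
for k in (0, 1, 2):
    print(f"--- k={k} ---")
    print(build_prompt("Translate from French to English.", examples, "Il fait beau aujourd'hui.", k))
```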
The paper that changed everything: Brown et al. (2020), “Language Models are Few-Shot Learners”
Why does it work?
No one has a complete answer yet. Several complementary hypotheses have been proposed:
1. Pattern recognition
During pre-training, the model has seen a huge number of texts laid out as “Q: … Answer: …” pairs, so it recognizes the format.
2. Implicit meta-learning
During pre-training, the model may have learned how to learn: to infer a new task from the examples present in the text it reads.
3. Induction heads
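Interpretability studies of Transformers have identified internal circuits, called “induction heads”, that copy patterns seen earlier in the context: after a sequence [A][B], when [A] appears again, they push the model to predict [B].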
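A toy simulation of the behavior attributed to induction heads, written as plain Python rather than as an actual Transformer circuit: find the previous occurrence of the current token and predict the token that followed it. The function name and token sequence are illustrative:

```python
def induction_predict(tokens):
    """Toy induction rule: if the last token appeared earlier, predict the token
    that followed its most recent earlier occurrence; otherwise return None."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence: nothing to copy


tokens = ["Mr", "Dursley", "said", "hello", "and", "Mr"]
print(induction_predict(tokens))  # → Dursley
```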
When to use in-context learning?
| Situation | Recommendation |
|---|---|
| Very little data (1-30 examples) | In-context learning with LLM |
| Lots of data (1000+ examples) | Classical fine-tuning |
| No internet access / on-premise deployment | Local ProtoNet or MAML model |
| Critical latency (< 10 ms) | Locally trained model |
| API cost too high | Fine-tuning or ProtoNet |
Python code: calling the OpenAI API in few-shot
```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# The few-shot examples are passed as a prior conversation
few_shot_examples = [
    {"role": "system", "content": "You are a sentiment classifier."},
    {"role": "user", "content": "I love this movie!"},
    {"role": "assistant", "content": "POSITIF"},
    {"role": "user", "content": "It was boring."},
    {"role": "assistant", "content": "NEGATIF"},
    {"role": "user", "content": "Meh, nothing special."},
    {"role": "assistant", "content": "NEUTRE"},
]


def classify(text):
    # Append the text to classify after the examples
    msgs = few_shot_examples + [{"role": "user", "content": text}]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=msgs,
        temperature=0,  # reduce randomness for a classification task
    )
    return resp.choices[0].message.content


print(classify("The story is gripping!"))  # expected: POSITIF
```
This article covers the most useful excerpts; the complete Few-Shot Learning course (11 chapters, 35 lessons, corrected exercises and a final project) goes all the way.