What is an LLM's context window?

The context window is the amount of text an AI can look at and remember during one chat. If the chat gets too long, the AI starts forgetting the oldest parts.

7 min read

AI & LLMs · 2026 · gneurone encyclopedia

definition

The context window is the maximum number of tokens an LLM can handle in a single request, counting both the input and the output it generates.

It determines how much conversation history, document text, and instruction the model can draw on when generating its next reply.

Bigger windows let models work with longer texts but need more memory and compute power.

Think of it like a notebook page where you can only write so many lines before the top lines get erased to make room for new ones.
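To make the notebook picture concrete, here is a minimal Python sketch of the "erase the oldest lines" behavior that many chat applications implement. It assumes roughly four characters per token (a rough English heuristic, not a real tokenizer), and `estimate_tokens` and `fit_to_window` are hypothetical helper names. Because input and output share one window, it first reserves room for the reply, then keeps the newest messages that fit in what is left.

```python
# Minimal sliding-window sketch: keep the newest messages that fit the context
# window after reserving room for the model's reply.
# Assumption: tokens are estimated at ~4 characters each (rough English
# heuristic); a real application would use the model's own tokenizer.


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token, at least 1."""
    return max(1, len(text) // 4)


def fit_to_window(messages: list[str], context_window: int, reserved_output: int) -> list[str]:
    """Return the most recent messages whose estimated tokens fit the input budget.

    The input budget is the context window minus the tokens reserved for the
    reply, because input and output share the same window.
    """
    budget = context_window - reserved_output
    if budget <= 0:
        raise ValueError("reserved_output must be smaller than context_window")

    kept: list[str] = []
    used = 0
    # Walk from newest to oldest: the oldest "lines of the page" go first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["a" * 400, "b" * 400, "c" * 400]  # ~100 estimated tokens each
    # A 260-token window with 50 tokens reserved leaves 210 for input,
    # so only the two newest messages fit.
    recent = fit_to_window(history, context_window=260, reserved_output=50)
    print([m[0] for m in recent])  # prints ['b', 'c']
```

A real application would count tokens with the model's own tokenizer rather than this character heuristic.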

key takeaways

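  • Context window size is counted in tokens; for English text, one token is roughly four characters, or about three-quarters of a word.
  • It caps the total length of everything the model processes in one request: instructions, prior messages, the current prompt, and the generated reply.
  • Going over the limit does not make the model forget on its own: many APIs reject the request with an error, while many chat applications silently drop the oldest messages to make room.
  • Context windows range from about 4k tokens in older models to 128k or even 1M tokens in newer ones.
  • Methods such as retrieval-augmented generation (RAG) work around small windows by fetching only the passages relevant to the current question and adding them to the prompt.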
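The retrieval idea behind RAG can be sketched in a few lines. This is a toy illustration, not how a production RAG pipeline works: word overlap stands in for embedding similarity, tokens are estimated at about four characters each, and `retrieve` and its helpers are hypothetical names. It ranks document chunks by how many words they share with the question and packs the best ones into a fixed token budget.

```python
# Toy retrieval step in the spirit of RAG: rank chunks against the question,
# then pack the best ones into a fixed token budget.
# Assumptions: word overlap stands in for embedding similarity, and tokens are
# estimated at ~4 characters each instead of using a real tokenizer.
import re


def words(text: str) -> set[str]:
    """Lowercased set of alphanumeric words, used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token, at least 1."""
    return max(1, len(text) // 4)


def retrieve(question: str, chunks: list[str], token_budget: int) -> list[str]:
    """Return the best-matching chunks whose estimated tokens fit token_budget."""
    query = words(question)
    # Highest overlap first; sorted() is stable, so ties keep their input order.
    ranked = sorted(chunks, key=lambda c: len(query & words(c)), reverse=True)
    selected: list[str] = []
    used = 0
    for chunk in ranked:
        if not (query & words(chunk)):
            break  # this and all later chunks share no words with the question
        cost = estimate_tokens(chunk)
        if used + cost <= token_budget:  # skip chunks that would overflow
            selected.append(chunk)
            used += cost
    return selected


if __name__ == "__main__":
    docs = [
        "The Eiffel Tower is in Paris.",
        "Context windows are measured in tokens.",
        "Tokens are pieces of words used by language models.",
    ]
    print(retrieve("How are context windows measured?", docs, token_budget=20))
    # prints ['Context windows are measured in tokens.']
```

Only the matching chunk enters the prompt; the second chunk is relevant but would push the estimate to 21 tokens, past the 20-token budget.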

the 2026 job market

By 2026, teams building LLM applications need engineers who can design prompts and systems around context limits, creating steady demand for roles in AI application development, prompt optimization, and efficient inference pipelines across the US, Canadian, and UK tech markets.

  • AI Engineer · 125,000-185,000 USD / 105,000-160,000 CAD / 75,000-115,000 GBP
  • Prompt Engineer · 95,000-145,000 USD / 80,000-125,000 CAD / 55,000-85,000 GBP
  • LLM Application Developer · 110,000-170,000 USD / 95,000-145,000 CAD / 65,000-100,000 GBP

frequently asked questions

How does context window size change model behavior?

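A larger window lets the model keep more of the conversation history and supporting detail in view, which helps it stay coherent over long exchanges. A smaller window forces earlier truncation, so the model can lose track of instructions or facts from early in the conversation. Developers often experiment with how much context to send, balancing cost against answer quality.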

What token limits do popular LLMs have today?

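GPT-4o supports 128k tokens, the Claude 3 models support 200k, and some open models reach 1M. The limit is set by the model's architecture (notably its positional encoding) and by the sequence lengths it was trained on. Always check the provider's documentation, because limits change between model versions.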

Can you increase an LLM context window after training?

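Yes, to a degree. Techniques such as position interpolation, which rescales token positions so that a longer sequence maps into the range seen during training, can extend the window with a relatively short fine-tuning run. Larger increases usually require additional long-context training or switching to a different base model. In production, most work focuses instead on prompt compression, so that more useful information fits in the existing window.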
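Position interpolation is usually described for rotary position embeddings (RoPE). The sketch below assumes a model trained on 2,048 positions being extended to 8,192, with illustrative values for `base` and `head_dim`; the function names are hypothetical. It shows only the core idea: each position in the longer window is multiplied by trained_len / target_len, so every position lands inside the range seen in training. The fine-tuning step that normally follows is not shown.

```python
# Sketch of position interpolation for rotary position embeddings (RoPE).
# Assumptions: a model trained on 2,048 positions extended to 8,192, with
# illustrative base and head_dim values. The follow-up fine-tuning is not shown.


def rope_angles(position: float, head_dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angle for each dimension pair: position * base**(-2i / head_dim)."""
    return [position * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def interpolated_position(position: int, trained_len: int, target_len: int) -> float:
    """Scale a position in the extended window back into the trained range."""
    return position * trained_len / target_len


if __name__ == "__main__":
    trained_len, target_len = 2048, 8192
    last = target_len - 1
    scaled = interpolated_position(last, trained_len, target_len)
    print(scaled)  # prints 2047.75, inside the trained range [0, 2048)
    # The rotation angles now stay within the range seen in training,
    # instead of corresponding to positions beyond 2,047 the model never saw.
    print(rope_angles(scaled, head_dim=8))
```

The trade-off is resolution: neighboring tokens end up only a quarter of a position apart, which is one reason a short fine-tuning run is still needed.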

Why do longer contexts cost more to run?

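Standard self-attention compares every token with every other token, so its compute grows quadratically with sequence length, and the key-value cache that stores past tokens grows with every token added. Providers also charge per token, so longer inputs cost more directly. In practice, efficient chunking and caching of reused context help reduce these costs.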
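The cost drivers in this answer can be checked with back-of-the-envelope arithmetic. The sketch assumes standard dense self-attention and a hypothetical price of 2.50 USD per million input tokens; the layer, head, and dimension sizes are illustrative and do not describe any specific model, and the function names are hypothetical.

```python
# Back-of-the-envelope sketch of why long contexts cost more.
# Assumptions: standard dense self-attention, illustrative model sizes, and a
# hypothetical per-token price; real prices vary by provider.


def attention_scores(seq_len: int) -> int:
    """Entries in the full n x n attention score matrix (one head, one layer)."""
    return seq_len * seq_len


def kv_cache_values(seq_len: int, layers: int, heads: int, head_dim: int) -> int:
    """Numbers stored in the key-value cache: one key and one value vector
    per token, per head, per layer, so growth is linear in seq_len."""
    return 2 * seq_len * layers * heads * head_dim


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Per-token billing: cost scales linearly with the input length."""
    return tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    shape = {"layers": 32, "heads": 32, "head_dim": 128}  # illustrative sizes
    for n in (1_000, 2_000, 4_000):
        print(n, attention_scores(n), kv_cache_values(n, **shape))
    # Doubling n quadruples the score matrix but only doubles the KV cache.
    print(input_cost_usd(100_000, usd_per_million=2.50))  # prints 0.25
```

The quadratic term is what makes very long windows expensive to serve, even before the linear per-token bill is added.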

courses to go further

$ cat ./full-guide.md
Introduction to LLMs and SLMs, explained simply (with diagrams and real code): read the guide →

Author(s)

REHOUMA Haythem

Haythem Rehouma is an AI and cloud engineer and architect, as well as a trainer and technical instructor, with a focus on medical AI, AWS, MLOps, LLM/RAG, and computer vision.