~$ man token
What is a token (in AI)?
definition
In AI and large language models (LLMs), a token is the smallest unit of text that the model processes. Depending on the tokenizer, tokens can represent whole words, word pieces, or single characters.
Tokenization converts raw text into a sequence of these tokens (in practice, integer IDs from a fixed vocabulary) so the model can process input and generate output. A model's maximum context size is measured in tokens, and providers count tokens to measure usage for billing.
A token works like the separate letters or syllables kids use to build words with alphabet blocks; the AI assembles meaning from these blocks instead of whole sentences at once.
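As a toy illustration of the text-to-tokens-to-IDs step, the Python sketch below splits one sentence at word level and at character level, then maps the tokens to integer IDs. The function names are hypothetical, and real LLM tokenizers use learned subword vocabularies rather than either of these simple splits.

```python
# Toy tokenization: text -> tokens -> integer IDs.
# Real LLM tokenizers use learned subword vocabularies; the word- and
# character-level splits here are simplifications for illustration only.
import re


def word_tokens(text: str) -> list[str]:
    # Runs of word characters, plus each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text: str) -> list[str]:
    # Every character, including spaces, becomes a separate token.
    return list(text)


def to_ids(tokens: list[str]) -> list[int]:
    # Give each distinct token an ID in order of first appearance,
    # standing in for a lookup in a fixed vocabulary.
    vocab: dict[str, int] = {}
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


if __name__ == "__main__":
    text = "Tokens build text, token by token."
    words = word_tokens(text)
    print(words)
    print(to_ids(words))
    print(len(words), "word tokens vs", len(char_tokens(text)), "character tokens")
```

The same sentence yields 8 word-level tokens but 34 character-level tokens, which is why the choice of tokenizer changes the count for identical text.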
key takeaways
- A model's context window, measured in tokens, caps how much text an LLM can read and write in one pass.
- The same sentence can use different numbers of tokens across models because each uses its own tokenizer.
- API pricing for most LLMs is based on the number of tokens sent and received.
- Subword tokenization lets models handle new or rare words by splitting them into known pieces.
- Prompt engineers count tokens to stay under limits and reduce costs while keeping needed context.
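Subword splitting can be sketched with greedy longest-match-first tokenization, the approach used by WordPiece-style tokenizers. The tiny vocabulary and the `subword_tokenize` function are hypothetical; production vocabularies are learned from data and hold tens of thousands of pieces.

```python
# Greedy longest-match-first subword tokenization (WordPiece style).
# VOCAB is a tiny hypothetical vocabulary; "##" marks a piece that
# continues a word rather than starting one.
VOCAB = {"token", "##ization", "##s", "un", "##believ", "##able"}


def subword_tokenize(word: str, vocab: set[str] = VOCAB) -> list[str]:
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no known piece fits: treat the word as unknown
        pieces.append(match)
        start = end
    return pieces


if __name__ == "__main__":
    print(subword_tokenize("tokenization"))  # ['token', '##ization']
    print(subword_tokenize("unbelievable"))  # ['un', '##believ', '##able']
    print(subword_tokenize("xyz"))           # ['[UNK]']
```

Although "unbelievable" is not in the vocabulary as a whole, it still maps to three known pieces; only a word with no matching pieces at all falls back to an unknown token.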
the 2026 job market
By 2026, token-aware skills are required for LLM deployment, cost-control, and fine-tuning roles. Companies run thousands of AI calls daily, and inefficient token use raises bills and slows performance.
frequently asked questions
How do you count tokens in a prompt?
Run the text through the model's official tokenizer library or token-counting API endpoint. For cost estimates, add the expected output tokens to the input count, since both are billed.
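A minimal sketch of the cost arithmetic, assuming separate per-million-token prices for input and output. The `rough_count` word counter and the example prices are hypothetical placeholders, not any provider's real tokenizer or rates.

```python
# Rough cost estimate for one request. rough_count and the example prices
# are hypothetical placeholders: use the model's own tokenizer for counts
# and the provider's published per-token prices for real estimates.
from typing import Callable


def rough_count(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word.
    # Real tokenizers often produce more tokens than words.
    return len(text.split())


def estimate_cost(
    prompt: str,
    expected_output_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
    count_tokens: Callable[[str], int] = rough_count,
) -> float:
    # Input and output tokens are usually priced separately, per million tokens.
    input_tokens = count_tokens(prompt)
    return (
        input_tokens * price_in_per_million
        + expected_output_tokens * price_out_per_million
    ) / 1_000_000


if __name__ == "__main__":
    prompt = "Summarize this report in three bullet points"
    # 7 input tokens (by the crude count) and 200 expected output tokens,
    # at hypothetical prices of 3.00 and 15.00 per million tokens.
    print(estimate_cost(prompt, 200, 3.0, 15.0))
```

Passing a different `count_tokens` function swaps in a real tokenizer without changing the arithmetic.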
What happens when you exceed the token limit?
Depending on the provider, the request fails with an error or the input is truncated to fit. Longer conversations therefore need summarization or a sliding context window to stay within the limit.
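A sliding context window can be sketched as keeping the most recent messages that fit a token budget. The `sliding_window` function and the word-count stand-in for a tokenizer are assumptions for illustration; a real application would count with the model's tokenizer and would typically reserve room for the system prompt and the reply.

```python
# Sliding context window: keep the newest messages that fit in max_tokens.
# count_tokens is supplied by the caller; the word counter in the usage
# example below is a crude, hypothetical stand-in for a real tokenizer.
from typing import Callable


def sliding_window(
    messages: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message; stop at the first that won't fit.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    history = ["hello there", "how are you today", "fine thanks", "what is a token"]
    words = lambda m: len(m.split())
    print(sliding_window(history, 8, words))  # ['fine thanks', 'what is a token']
```

Dropping the oldest turns is the simplest policy; summarizing them into a short message before dropping them keeps more context for the same budget.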
Are tokens the same as words?
No. One word can equal one, two, or more tokens depending on the tokenizer. Common words often map to single tokens while rare terms split into multiple pieces.
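Why common words end up as single tokens can be seen in byte-pair encoding (BPE), one widely used way to build subword vocabularies: training repeatedly merges the most frequent adjacent pair of symbols. The sketch below is a simplified, character-level version on a toy corpus (production BPE tokenizers typically work on bytes and train on very large corpora); all function names are hypothetical.

```python
# Minimal character-level BPE training on a toy corpus (simplified: real
# BPE tokenizers usually operate on bytes and train on huge corpora).
from collections import Counter


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    # Replace every adjacent occurrence of `pair` with one merged symbol.
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    word_freq = Counter(corpus)
    splits = {w: tuple(w) for w in word_freq}  # each word starts as characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs: Counter = Counter()
        for w, freq in word_freq.items():
            syms = splits[w]
            for pair in zip(syms, syms[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.__getitem__)  # ties: first pair seen wins
        merges.append(best)
        splits = {w: merge_pair(s, best) for w, s in splits.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    # Tokenize a word by replaying the learned merges in order.
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = "the the the the the cat cat cat zebra".split()
    merges = train_bpe(corpus, num_merges=4)
    print(apply_bpe("the", merges))    # ['the']
    print(apply_bpe("zebra", merges))  # ['z', 'e', 'b', 'r', 'a']
    print(apply_bpe("theta", merges))  # ['the', 't', 'a']
```

With four merges, the frequent words "the" and "cat" each become one token, the rare "zebra" stays split into characters, and the unseen "theta" reuses the learned piece "the".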
Why do token counts affect AI response speed?
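Each generated token requires another forward pass through the model, and longer inputs take more computation to process, especially in the attention layers, whose cost grows with sequence length. Providers also meter and bill by token volume, so higher counts increase both latency and cost.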
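The one-pass-per-token point can be sketched as a toy autoregressive loop. Here `next_token` is a hypothetical stand-in for a full forward pass of a real model, which is vastly more expensive per call; the loop only shows that output length and the number of passes grow together.

```python
# Toy autoregressive generation. next_token is a hypothetical stand-in for
# one forward pass of a model: it sees the whole sequence so far and
# returns a single new token.
from typing import Callable


def generate(
    prompt_tokens: list[str],
    max_new_tokens: int,
    next_token: Callable[[list[str]], str],
    stop: str = "<eos>",
) -> tuple[list[str], int]:
    seq = list(prompt_tokens)
    forward_passes = 0
    for _ in range(max_new_tokens):
        tok = next_token(seq)  # one forward pass per generated token
        forward_passes += 1
        if tok == stop:
            break
        seq.append(tok)  # the next call sees a longer sequence
    return seq[len(prompt_tokens):], forward_passes


def fake_model(seq: list[str]) -> str:
    # Hypothetical "model": emits t<length of the sequence so far>.
    return f"t{len(seq)}"


if __name__ == "__main__":
    out, passes = generate(["The", "cat"], 5, fake_model)
    print(out, passes)  # ['t2', 't3', 't4', 't5', 't6'] 5
```

Because each call receives the growing sequence, a longer prompt or a longer answer means both more calls and more tokens to attend over per call.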
