Guiding the reasoning
Chapter objectives
- Use chain-of-thought on complex tasks
- Break down big problems and have answers verified
- Know when to guide the reasoning — and when it’s pointless
Why such a knowledgeable model gets a simple calculation wrong
Sofia is preparing a campaign budget: 15% discount, 35% margin, sales tax... She pastes it all into the chat, asks "what's the final budget?" and gets a wrong number. How can a system capable of expounding on quantum physics fail a percentage calculation?
Because a language model doesn't "calculate": it predicts text, word after word. When you demand an immediate answer, you're asking it to produce the result of a reasoning process... without doing the reasoning. On a multi-step problem, that's like requiring a person to answer instantly, off the top of their head, with no scratch paper. The errors aren't inherent to the model: they follow from how the question is asked.
The fix is simple, and it's one of the best-documented techniques in prompt engineering: allow the model, and explicitly instruct it, to lay out its scratch work before concluding.
Chain-of-thought: reason before answering
On a logical, mathematical or multi-step problem, add "reason step by step before answering". The model then lays out its reasoning explicitly: it identifies the data, runs through the intermediate calculations, then concludes. Each written step conditions the next, which strongly reduces errors compared to a direct answer. This is chain-of-thought.
A product costs €80. With a 35% margin on the selling price, what is the selling price? Reason step by step, then give the final result.
This example is deliberately tricky: a 35% margin on the selling price is not calculated as 80 × 1.35. With a direct answer, many models fall into the trap. With guided reasoning, the model writes out the equation (selling price = cost / (1 − 0.35)) and finds €123.08. The scratch work isn't a luxury: it's what avoids the error.
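To make the trap concrete, here is a minimal Python sketch of the two calculations. The function names are illustrative and not taken from any library; the figures are the ones from the example above.

```python
def price_from_margin(cost: float, margin_rate: float) -> float:
    """Selling price when margin_rate is a share of the selling price."""
    if not 0 <= margin_rate < 1:
        raise ValueError("margin_rate must be in [0, 1)")
    # price - cost = margin_rate * price  =>  price = cost / (1 - margin_rate)
    return cost / (1 - margin_rate)


def price_from_markup(cost: float, markup_rate: float) -> float:
    """Selling price when the rate is applied to the cost (the trap)."""
    return cost * (1 + markup_rate)


if __name__ == "__main__":
    print(f"{price_from_margin(80, 0.35):.2f}")  # 123.08
    print(f"{price_from_markup(80, 0.35):.2f}")  # 108.00
```

The €15 gap between the two results is exactly the kind of error a direct answer can hide and a written-out equation exposes.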
flowchart LR
    Q["Multi-step problem"] --> E1["Step 1: lay out the data"]
    E1 --> E2["Step 2: calculate"]
    E2 --> V["Check consistency"]
    V --> R["Final line: Answer"]
A secondary benefit, at least as valuable as correctness: the reasoning is auditable. When the answer is wrong, you see at which step the error crept in — a misread figure, an inverted formula — and you fix precisely that point instead of rerunning at random hoping for better.
Separating thinking from answer
The full reasoning is useful for verification, but cumbersome to use: Sofia isn't going to paste fifteen lines of calculation into her email to the CFO. Good practice is to ask for both, clearly separated: first the thinking, then an isolated, concise conclusion.
Calculate the final campaign budget:
- 3 sponsored posts at €450 each
- negotiated discount of 15% on the total
- agency fee: 20% of the amount after discount
Reason step by step, writing out each calculation. End with a line "Answer:" containing only the final amount in euros.
You get both the rigor (the reasoning, which you can verify in ten seconds) and the usable part (the conclusion, which you can copy as is). This separation is a habit to adopt on any analytical task: analysis then recommendation, diagnosis then fix, comparison then verdict.
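The isolated "Answer:" line also makes the result easy to check automatically. The sketch below recomputes the budget from the prompt above and compares it with the amount on the model's final line. It rests on two assumptions that are not stated in the chapter: the final budget is the discounted amount plus the agency fee, and amounts use English-style formatting (comma for thousands, dot for decimals). The function names and the sample response are hypothetical.

```python
import re
from typing import Optional


def campaign_budget(posts: int, unit_price: float,
                    discount_rate: float, fee_rate: float) -> float:
    """Recompute the budget (assumption: discounted amount + agency fee)."""
    subtotal = posts * unit_price                # 3 x 450 = 1350
    discounted = subtotal * (1 - discount_rate)  # 1350 x 0.85 = 1147.50
    fee = discounted * fee_rate                  # 1147.50 x 0.20 = 229.50
    return round(discounted + fee, 2)            # 1377.00


def extract_answer(response: str) -> Optional[float]:
    """Return the amount on the last 'Answer:' line, or None if there is none."""
    pattern = r"^Answer:\s*€?\s*([0-9][0-9,]*(?:\.[0-9]+)?)"
    matches = re.findall(pattern, response, flags=re.MULTILINE)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


if __name__ == "__main__":
    # Hypothetical model output, for illustration only.
    response = (
        "3 x 450 = 1350\n"
        "1350 x 0.85 = 1147.50\n"
        "1147.50 x 0.20 = 229.50\n"
        "Answer: €1,377.00"
    )
    print(extract_answer(response) == campaign_budget(3, 450, 0.15, 0.20))
```

If the two values disagree, the reasoning lines above the "Answer:" line are where to look first.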
Decomposition: cutting up big problems
Chain-of-thought helps on a problem with a dozen or so steps. But some jobs are too big for a single prompt, even a well-guided one: analyzing 40 survey responses, writing a complete report, comparing three vendors on eight criteria. The go-to technique here is decomposition: cut the work into subtasks and process them one by one, each answer feeding the next.
For her campaign review, Sofia no longer asks "analyze these results and write the review". She chains: 1) "list the 5 main takeaways from these numbers", 2) "for each takeaway, propose one concrete action", 3) "now write the review following this outline". Three short prompts, each verifiable, instead of one monster prompt whose output is unverifiable as a block.
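Sofia's chain can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `ask` is a hypothetical function that sends a prompt to whichever chat model you use and returns its text, and `checkpoint` is a hypothetical hook where a human can read and correct each intermediate answer before it is passed on.

```python
from typing import Callable, List


def run_chain(
    ask: Callable[[str], str],
    steps: List[str],
    data: str,
    checkpoint: Callable[[str], str] = lambda text: text,
) -> List[str]:
    """Run the prompts in order, feeding each answer into the next one."""
    context = data
    outputs = []
    for instruction in steps:
        answer = ask(f"{instruction}\n\n{context}")
        answer = checkpoint(answer)  # human review point between steps
        outputs.append(answer)
        context = answer             # the next step builds on this answer
    return outputs


# The three prompts from the campaign review example.
REVIEW_STEPS = [
    "List the 5 main takeaways from these numbers.",
    "For each takeaway, propose one concrete action.",
    "Now write the review following this outline.",
]
```

Calling `run_chain(ask, REVIEW_STEPS, campaign_numbers)` returns the three intermediate outputs, so each one can be inspected separately.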
Decomposition has a hidden advantage: it keeps you in the loop. Between steps, you can correct, redirect, or discard a bad lead before it contaminates everything downstream. A single giant prompt, by contrast, offers no intermediate checkpoint.
Self-critique: having the work checked
Last technique in the family: asking the model to critique its own answer. After a first output, follow up with: "Reread your answer. Check every calculation and every claim. List any errors or weaknesses, then give a corrected version." The model often spots its own mistakes during this reread — exactly like a human proofreading with fresh eyes.
You can also build the verification into the initial prompt: "before giving your final answer, check the consistency of each step". That's what the "Check consistency" box in the diagram above does. Be careful though: self-critique improves things, but guarantees nothing. A model can confidently validate an error it just made. For any critical figure or fact, the final check remains human.
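As a rough sketch, the follow-up can be automated as a second call. Here `ask` is again a hypothetical function wrapping your chat model, and the first answer is pasted back into the second prompt because this sketch assumes a stateless call with no conversation memory. Both the draft and the revision are returned, since the final check remains human.

```python
from typing import Callable, Dict

CRITIQUE_PROMPT = (
    "Reread your answer. Check every calculation and every claim. "
    "List any errors or weaknesses, then give a corrected version."
)


def answer_with_critique(ask: Callable[[str], str], question: str) -> Dict[str, str]:
    """Get a first answer, then ask the model to review and correct it."""
    draft = ask(question)
    followup = (
        f"{question}\n\n"
        f"Your previous answer:\n{draft}\n\n"
        f"{CRITIQUE_PROMPT}"
    )
    revised = ask(followup)
    # Keep both versions so a person can compare them.
    return {"draft": draft, "revised": revised}
```

A revision that simply repeats the draft is not evidence that the draft was right, only that the model did not flag anything.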
When to guide the reasoning — and when to refrain
These techniques have a cost: answers become longer and slower, and they take up more of the context window. Using them for everything would be counterproductive. Here's the usage map:
Finally, note that recent so-called "reasoning" models apply a form of chain-of-thought internally, without being asked. Even with them, these reflexes remain useful: requiring the detail of the steps makes the answer verifiable, and decomposition keeps a human in control of big jobs. The technique evolves, but the principle remains: never accept an unverifiable conclusion.
The limits: plausible reasoning is not correct reasoning
One last warning, which applies to this whole chapter. Chain-of-thought produces plausible reasoning — not necessarily correct reasoning. The model can roll out steps that look impeccable and slip an error in the middle, with the same quiet confidence. The fluency of the text is not proof of validity.
The professional reflex: verify the critical steps (the figures, the dated facts, the surprising claims), not just read the conclusion. Explicit reasoning doesn't replace your judgment — it simply makes it possible, where a bare answer forced you to take it on faith. That's exactly why we demand it.
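Part of that verification can be mechanical. The sketch below rechecks simple arithmetic lines found in a reasoning text; the names are illustrative, and it deliberately handles only two-operand steps written like "1350 x 0.15 = 202.50". Percent signs, thousands separators and longer expressions are out of scope and still need a human eye.

```python
import re
from typing import List, Tuple

# Matches steps such as "3 x 450 = 1350" or "1350 - 202.50 = 1147.50".
STEP = re.compile(
    r"(\d+(?:\.\d+)?)\s*([+\-*/x×])\s*(\d+(?:\.\d+)?)\s*=\s*(\d+(?:\.\d+)?)"
)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "x": lambda a, b: a * b,
    "×": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def check_steps(reasoning: str, tol: float = 0.01) -> List[Tuple[str, float]]:
    """Return (step text, recomputed value) for each step that does not add up."""
    wrong = []
    for match in STEP.finditer(reasoning):
        a, op, b, claimed = match.groups()
        if op == "/" and float(b) == 0:
            continue  # skip divisions by zero rather than crash
        actual = OPS[op](float(a), float(b))
        if abs(actual - float(claimed)) > tol:
            wrong.append((match.group(0), round(actual, 2)))
    return wrong


if __name__ == "__main__":
    # Hypothetical reasoning with an error slipped into the third line.
    text = (
        "3 x 450 = 1350\n"
        "1350 x 0.15 = 202.50\n"
        "1350 - 202.50 = 1157.50\n"
        "1147.50 x 0.20 = 229.50"
    )
    print(check_steps(text))  # [('1350 - 202.50 = 1157.50', 1147.5)]
```

A clean result only means the listed operations are internally consistent; it says nothing about whether the model picked the right formula or carried over the right figure.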
Context
Sofia has to present her year-end campaign budget to the CFO tomorrow. The calculation mixes three sponsored posts at different rates, a 15% discount negotiated with the ad network, the agency's 20% commission on the discounted amount, and a budget cap of €2,000 not to be exceeded. Her first attempt with a direct answer gave a number she can't verify — and she already got called out last month for a calculation error. This time, she wants a result that is correct AND verifiable.
Instructions
- Pose the complete problem to the AI with all the numerical data, asking first for a direct answer (no guidance).
- Rerun with "reason step by step writing out each calculation, then end with an Answer: line".
- Compare the two results: are they identical? If not, which one does the reasoning let you verify?
- Verify two intermediate steps of the reasoning yourself (a calculation, a figure carried over).
- Follow up with a self-critique: "reread your reasoning, check every calculation, flag any error and correct it".
- Add the cap constraint: "does the budget exceed €2,000? If so, propose two quantified ways to save".
- Record the guidance phrasing that worked best in your list of winning prompts.
In summary
- A model predicts text: demanding an immediate answer on a complex problem means forbidding the scratch work.
- "Reason step by step" (chain-of-thought) strongly reduces errors on multi-step tasks.
- Separate thinking from answer: the reasoning to verify, an "Answer:" line to use.
- Break big jobs into chained subtasks: each step becomes verifiable and correctable.
- Self-critique ("reread and check your answer") catches some of the errors — not all.
- Pointless on trivial tasks: reasoning guidance is reserved for problems that justify it.
- Plausible reasoning is not necessarily correct: verify the critical steps yourself.
Quiz — check your understanding
1. When is chain-of-thought useful?
2. How do you get a clean conclusion?
3. Why does a model often get a multi-step calculation wrong with a direct answer?
4. What is the main benefit of decomposing into subtasks?
5. What does well-written step-by-step reasoning guarantee?