Prompt Engineering — talk to AI like a pro — 3. Guiding the reasoning

17 min read
Chapter 03

Guiding the reasoning

Chapter 3 of 10

Chapter objectives

  • Use chain-of-thought on complex tasks
  • Break down big problems and have answers verified
  • Know when to guide the reasoning — and when it’s pointless

Why such a knowledgeable model gets a simple calculation wrong

Sofia is preparing a campaign budget: 15% discount, 35% margin, sales tax... She pastes it all into the chat, asks "what's the final budget?" and gets a wrong number. How can a system capable of expounding on quantum physics fail a percentage calculation?

Because a language model doesn't "calculate": it predicts text, word after word. When you demand an immediate answer, you're asking it to produce the result of a reasoning process... without doing the reasoning. On a multi-step problem, that's like asking a person for an answer off the top of their head, instantly, with no scratch paper. These errors aren't inevitable: they largely stem from the way the question is asked.

The fix is simple, and it's one of the most documented techniques in prompt engineering: give the model permission, and an explicit instruction, to lay out its scratch work before concluding.

Chain-of-thought: reason before answering

On a logical, mathematical or multi-step problem, add "reason step by step before answering". The model then lays out its reasoning explicitly: it identifies the data, runs through the intermediate calculations, then concludes. Each written step conditions the next, which strongly reduces errors compared to a direct answer. This is chain-of-thought.

PROMPT
A product costs €80. With a 35% margin on the selling price, what is the selling price?
Reason step by step, then give the final result.

This example is deliberately tricky: a 35% margin on the selling price is not calculated as 80 × 1.35, which would give €108 and a margin of only about 25.9% of that price. Answering directly, models often fall into this trap. With guided reasoning, the model writes out the equation (selling price = cost / (1 − 0.35)) and finds €123.08. The scratch work isn't a luxury: it's what avoids the error.
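The trap is easy to reproduce by hand. This minimal Python sketch (the function names are illustrative, not from any library) computes both readings of "35% margin", then recomputes the margin on the resulting selling price: the kind of consistency check good step-by-step reasoning should include. `Decimal` is used so that euro amounts are not distorted by binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

def price_from_markup(cost: Decimal, rate: Decimal) -> Decimal:
    """The trap: treats 35% as a markup on cost (cost x 1.35)."""
    return cost * (1 + rate)

def price_from_margin(cost: Decimal, rate: Decimal) -> Decimal:
    """Margin on the selling price: margin = (price - cost) / price,
    hence price = cost / (1 - rate)."""
    return cost / (1 - rate)

def to_cents(amount: Decimal) -> Decimal:
    """Round to the cent, half up, as for a displayed euro amount."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

cost = Decimal("80")
rate = Decimal("0.35")

wrong = to_cents(price_from_markup(cost, rate))  # 108.00
right = to_cents(price_from_margin(cost, rate))  # 123.08

# Consistency check: the margin recomputed on the selling price should be ~35%.
margin_check = (right - cost) / right

print(f"Markup reading (trap): €{wrong}")
print(f"Margin on selling price: €{right}")
print(f"Margin recomputed: {margin_check:.2%}")
```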

flowchart LR
  Q["Multi-step problem"] --> E1["Step 1: lay out the data"]
  E1 --> E2["Step 2: calculate"]
  E2 --> V["Check consistency"]
  V --> R["Final line: Answer"]
Chain-of-thought: explicit reasoning reduces errors and can be verified.

A secondary benefit, at least as valuable as correctness: the reasoning is auditable. When the answer is wrong, you can see at which step the error crept in (a misread figure, an inverted formula) and fix exactly that point, instead of rerunning the prompt and hoping for a better result.

Separating thinking from answer

The full reasoning is useful for verification, but cumbersome to use: Sofia isn't going to paste fifteen lines of calculation into her email to the CFO. Good practice is to ask for both, clearly separated: first the thinking, then an isolated, concise conclusion.

PROMPT
Calculate the final campaign budget:
- 3 sponsored posts at €450 each
- negotiated discount of 15% on the total
- agency fee: 20% of the amount after discount

Reason step by step, writing out each calculation.
End with a line "Answer:" containing only the final amount in euros.

You get both the rigor (the reasoning, which you can verify in ten seconds) and the usable part (the conclusion, which you can copy as is). This separation is a habit to adopt on any analytical task: analysis then recommendation, diagnosis then fix, comparison then verdict.
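To check a model's answer to the budget prompt above yourself, here is a minimal Python sketch that recomputes every intermediate step, so each one can be compared with the model's scratch work. It assumes, as the prompt implies, that the final budget is the discounted amount plus the agency fee; the name `campaign_budget` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def campaign_budget(post_prices, discount_rate, fee_rate):
    """Recompute every intermediate step of the budget.
    Assumption: total = amount after discount + agency fee."""
    subtotal = sum(post_prices, Decimal("0"))
    after_discount = subtotal * (1 - discount_rate)
    agency_fee = after_discount * fee_rate
    total = after_discount + agency_fee
    steps = {
        "subtotal": subtotal,
        "after_discount": after_discount,
        "agency_fee": agency_fee,
        "total": total,
    }
    # Round each displayed figure to the cent, half up.
    return {name: value.quantize(CENT, rounding=ROUND_HALF_UP)
            for name, value in steps.items()}

steps = campaign_budget([Decimal("450")] * 3, Decimal("0.15"), Decimal("0.20"))
for name, value in steps.items():
    print(f"{name:>15}: €{value}")
# subtotal 1350.00, after_discount 1147.50, agency_fee 229.50, total 1377.00
```

For the exercise at the end of this chapter, the same function works with your own post rates; the `total` line is the figure to compare with the €2,000 cap.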

For a clean conclusion: "end with an 'Answer:' line that summarizes without the reasoning". You can also ask for the reasoning in a "Scratch work" section and the answer in a "Conclusion" section — handy for reading only the end when you're in a hurry.
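If you call a model from code, the same separation can be automated. A minimal Python sketch, assuming prompts and replies are plain strings: `build_prompt` appends the instructions used above, and `split_reasoning` cuts a reply at its last "Answer:" line. Both names are illustrative, and the sample reply is hand-written for the demo, not real model output.

```python
import re
from typing import Optional, Tuple

# Instructions from the budget prompt, generalized to any result.
COT_SUFFIX = (
    "Reason step by step, writing out each calculation.\n"
    'End with a line "Answer:" containing only the final result.'
)

# Matches a whole line of the form "Answer: <value>", anywhere in the reply.
ANSWER_LINE = re.compile(r"^[ \t]*Answer:[ \t]*(.+?)[ \t]*$", re.MULTILINE)

def build_prompt(task: str) -> str:
    """Append the chain-of-thought and answer-format instructions to a task."""
    return f"{task.strip()}\n\n{COT_SUFFIX}"

def split_reasoning(reply: str) -> Tuple[str, Optional[str]]:
    """Return (scratch work, final answer). Uses the LAST 'Answer:' line,
    in case the reasoning itself mentions the format; None if there is none."""
    matches = list(ANSWER_LINE.finditer(reply))
    if not matches:
        return reply.strip(), None
    last = matches[-1]
    return reply[: last.start()].strip(), last.group(1)

# Hand-written sample reply, for illustration only.
sample = """Step 1: 3 x 450 = 1350
Step 2: 1350 x (1 - 0.15) = 1147.50
Step 3: 1147.50 x 0.20 = 229.50
Step 4: 1147.50 + 229.50 = 1377.00
Answer: 1377.00 euros"""

reasoning, answer = split_reasoning(sample)
print(answer)  # 1377.00 euros
```

Returning `None` when no "Answer:" line is found, rather than guessing, makes a reply that ignored the format easy to detect and resend.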

Decomposition: cutting up big problems

Chain-of-thought helps on a problem of a dozen or so steps. But some jobs are too big for a single prompt, even a well-guided one: analyzing 40 survey responses, writing a complete report, comparing three vendors on eight criteria. The go-to technique here is decomposition: cutting the work into subtasks and processing them one by one, each answer feeding the next.

For her campaign review, Sofia no longer asks "analyze these results and write the review". She chains: 1) "list the 5 main takeaways from these numbers", 2) "for each takeaway, propose one concrete action", 3) "now write the review following this outline". Three short prompts, each verifiable, instead of one monster prompt whose output can only be checked as a single block.

Decomposition has a hidden advantage: it keeps you in the loop. Between each step, you can correct, redirect, discard a bad lead — before it contaminates everything downstream. A single giant prompt, on the other hand, offers you no intermediate checkpoint.
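Sofia's three-prompt chain can be sketched in Python as a loop, assuming a hypothetical `ask(prompt) -> str` function wired to whatever model you use (no real API is shown). The optional `checkpoint` callback, also hypothetical, is where the human stays in the loop: it sees each intermediate output and can return a corrected version before the next step runs.

```python
from typing import Callable, List, Optional

# Hypothetical interface: send one prompt to a model, get its reply as text.
AskModel = Callable[[str], str]
# Hypothetical human checkpoint: receives (instruction, output) and returns
# the output unchanged, corrected, or rewritten.
Checkpoint = Callable[[str, str], str]

def run_chain(ask: AskModel, source: str, steps: List[str],
              checkpoint: Optional[Checkpoint] = None) -> List[str]:
    """Run subtasks one by one; each (reviewed) output feeds the next step."""
    outputs: List[str] = []
    material = source
    for instruction in steps:
        reply = ask(f"{instruction}\n\n---\n{material}")
        if checkpoint is not None:
            # Correct course here, before a bad lead contaminates later steps.
            reply = checkpoint(instruction, reply)
        outputs.append(reply)
        material = reply  # only the previous step's output is carried forward
    return outputs

# Dry run with a stand-in model that echoes the instruction it received.
def echo_model(prompt: str) -> str:
    return "reply to: " + prompt.splitlines()[0]

sofia_steps = [
    "List the 5 main takeaways from these numbers.",
    "For each takeaway, propose one concrete action.",
    "Now write the review following this outline.",
]
for out in run_chain(echo_model, "campaign results go here", sofia_steps):
    print(out)
```

Forwarding only the previous output keeps each prompt short; when a later step also needs the raw data, concatenating earlier outputs into `material` is an equally reasonable choice.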

Self-critique: having the work checked

Last technique in the family: asking the model to critique its own answer. After a first output, follow up with: "Reread your answer. Check every calculation and every claim. List any errors or weaknesses, then give a corrected version." The model often spots its own mistakes during this reread — exactly like a human proofreading with fresh eyes.

You can also build the verification into the initial prompt: "before giving your final answer, check the consistency of each step". That's what the "Check consistency" box in the diagram above does. Be careful though: self-critique improves things, but guarantees nothing. A model can confidently validate an error it just made. For any critical figure or fact, the final check remains human.

Self-critique works well on texts too: "reread this post and critique it from the point of view of a rushed restaurant owner: what would make them tune out?" often yields more useful suggestions than "improve this post".
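The two-pass pattern can be sketched in Python, again assuming a hypothetical `ask(prompt) -> str` function that sends a single, stateless prompt to your model. Because each call is independent here, the critique prompt restates the task and the draft; in a chat interface you would simply send the follow-up. Both versions are returned so that a human can compare them, since the revision is not guaranteed to be correct.

```python
from typing import Callable, Dict

# Hypothetical interface: send one prompt, get the model's reply as text.
AskModel = Callable[[str], str]

# Follow-up wording from the text; the stateless call restates task and draft.
CRITIQUE_TEMPLATE = (
    "Reread your answer. Check every calculation and every claim. "
    "List any errors or weaknesses, then give a corrected version.\n\n"
    "Task:\n{task}\n\nYour answer:\n{draft}"
)

def answer_with_self_critique(ask: AskModel, task: str) -> Dict[str, str]:
    """First pass answers the task; second pass critiques and corrects it."""
    draft = ask(task)
    revised = ask(CRITIQUE_TEMPLATE.format(task=task, draft=draft))
    # Keep both: self-critique improves things but guarantees nothing,
    # so the final check of critical figures stays with a human.
    return {"draft": draft, "revised": revised}

# Dry run with a stand-in model (no real API involved).
def stand_in_model(prompt: str) -> str:
    return "critique + corrected version" if prompt.startswith("Reread") else "first draft"

result = answer_with_self_critique(stand_in_model, "What is the final campaign budget?")
print(result["draft"], "->", result["revised"])
```

For the text-review variant, only the template changes, for instance by asking for the critique from a rushed restaurant owner's point of view.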

When to guide the reasoning — and when to refrain

These techniques have a cost: longer, slower answers that consume more tokens, and therefore more context and more money. Pulling them out for everything would be counterproductive. Here's the usage map:

  • Direct answer (zero-shot): simple, immediate tasks such as rephrasing, translating, or summarizing a short text. Guided reasoning would only add bloat.
  • Chain-of-thought: logic problems, calculations, and multi-criteria decisions that fit in one prompt. "Reason step by step, then Answer:".
  • Decomposition: jobs too big for one prompt (long analyses, reports, comparisons). Cut them into chained subtasks, with human control between each.
  • Self-critique: a complement to the others, when the stakes justify a verification pass (important figures, high-impact text, risky reasoning).

Finally, note that recent so-called "reasoning" models apply a form of chain-of-thought internally, without being asked. Even with them, these reflexes remain useful: requiring the detail of the steps makes the answer verifiable, and decomposition keeps human control over big jobs. The technique evolves, the principle — never accept an unverifiable conclusion — remains.

The limits: plausible reasoning is not correct reasoning

One last warning, which applies to this whole chapter. Chain-of-thought produces plausible reasoning — not necessarily correct reasoning. The model can roll out steps that look impeccable and slip an error in the middle, with the same quiet confidence. The fluency of the text is not proof of validity.

The professional reflex: verify the critical steps (the figures, the dated facts, the surprising claims), not just read the conclusion. Explicit reasoning doesn't replace your judgment — it simply makes it possible, where a bare answer forced you to take it on faith. That's exactly why we demand it.

🛠️ Your turn

Context

Sofia has to present her year-end campaign budget to the CFO tomorrow. The calculation mixes three sponsored posts at different rates, a 15% discount negotiated with the ad network, the agency's 20% commission on the discounted amount, and a budget cap of €2,000 not to be exceeded. Her first attempt with a direct answer gave a number she can't verify — and she already got called out last month for a calculation error. This time, she wants a result that is correct AND verifiable.

Instructions

  1. Give the AI the complete problem with all the numerical data, and ask for a direct answer first (no guidance).
  2. Rerun with "reason step by step writing out each calculation, then end with an Answer: line".
  3. Compare the two results: are they identical? If not, which one does the reasoning let you verify?
  4. Verify two intermediate steps of the reasoning yourself (a calculation, a figure carried over).
  5. Follow up with a self-critique: "reread your reasoning, check every calculation, flag any error and correct it".
  6. Add the cap constraint: "does the budget exceed €2,000? If so, propose two quantified ways to save".
  7. Note in your winning prompts the guidance phrasing that worked best.
Hint — Explicit reasoning lets you spot where an error creeps in. Verify the steps, not just the final number — that's the whole point of the method.

In summary

  • A model predicts text: demanding an immediate answer on a complex problem means forbidding the scratch work.
  • "Reason step by step" (chain-of-thought) strongly reduces errors on multi-step tasks.
  • Separate thinking from answer: the reasoning to verify, an "Answer:" line to use.
  • Break big jobs into chained subtasks: each step becomes verifiable and correctable.
  • Self-critique ("reread and check your answer") catches some of the errors — not all.
  • Pointless on trivial tasks: reasoning guidance is reserved for problems that justify it.
  • Plausible reasoning is not necessarily correct: verify the critical steps yourself.

Quiz — check your understanding

1. When is chain-of-thought useful?

Guided reasoning shines on logical or multi-step tasks; it needlessly weighs down simple requests.

2. How do you get a clean conclusion?

Isolating the final answer makes it directly usable, while keeping the reasoning for verification.

3. Why does a model often get a multi-step calculation wrong with a direct answer?

Without explicit scratch work, the model must produce the conclusion of a reasoning it never laid out. Letting it reason changes everything.

4. What is the main benefit of decomposing into subtasks?

Cutting things up lets you correct course between steps, instead of receiving an unverifiable final block.

5. What does a well-written step-by-step reasoning guarantee?

Fluency is not proof. Explicit reasoning makes verification possible; it doesn't replace it.

Author(s)


REHOUMA Haythem

Haythem Rehouma is an AI and cloud engineer and architect, as well as a trainer and technical instructor, with a focus on medical AI, AWS, MLOps, LLM/RAG and computer vision.