Challenge 5.5: Chain of Thought

THINK

When you solve a complex maths problem — do you calculate in your head or write down intermediate steps? Why?

OVERVIEW

Left: direct question to LLM produces error-prone answer (red). Right (Chain of Thought): question goes to LLM that thinks step by step, guided by thinking-instructions and output-format, producing a traceable answer

Left: Without chain of thought the LLM answers immediately — error-prone for complex questions (red). Right: With <thinking-instructions> and <output-format> the LLM thinks first. The answer is traceable and based on intermediate steps.

WHY

Without chain of thought: LLMs generate tokens sequentially — each word is based on the previous ones. If the LLM starts with the answer right away, it has no opportunity to think through the problem. For complex tasks this leads to superficial or incorrect answers. Errors are not traceable because the reasoning process remains invisible.

With chain of thought: The LLM “burns” tokens on its reasoning process before generating the actual answer. The intermediate steps are in context and influence the subsequent tokens. The reasoning chain is verifiable, errors are localisable, and the results are better.

WALKTHROUGH

Layer 1: What Is Chain of Thought (Making Intermediate Steps Explicit)

Chain of Thought (CoT) means: The LLM should write down its thoughts before answering. Instead of outputting the result directly, it first goes through a visible reasoning process.

Without CoT:  "What is 37 * 28?"  →  "1024"  (guessed, wrong — actual: 1036)
With CoT:     "What is 37 * 28?"  →  "37 * 30 = 1110, 37 * 2 = 74, 1110 - 74 = 1036"  →  "1036"

The tokens the LLM generates while thinking influence the quality of the actual answer. The plan is in context — and context controls what comes next.

Modern models like Claude offer “Extended Thinking” as a built-in feature. The prompt-based CoT described here is the manual variant that works with any model.

Layer 2: The thinking-instructions XML Tag

The <thinking-instructions> tag tells the LLM HOW it should think. It sits at the end of the prompt (high influence) and defines the steps of the reasoning process:

<thinking-instructions>
Think about your answer first before you respond.
Consider the optimal path for the user to understand the material.
Create a list of knowledge dependencies — pieces of knowledge
that rely on other pieces of knowledge.
Order them from foundational to advanced.
</thinking-instructions>

Important: The instructions do not describe WHAT the LLM should answer, but HOW it should think about it. That is the difference from <rules> — rules control the output, thinking instructions control the reasoning process.

Layer 3: output-format for Structured Output

Without a clear <output-format> the LLM mixes thinking and answer. With the tag you cleanly separate both:

<output-format>
Return two sections:
- A <thinking> block with your thought process, wrapped in a <thinking> tag.
- The answer (unwrapped, in markdown format). Do not wrap it in an <answer> tag.
</output-format>

The LLM then produces output in this structure:

<thinking>
1. Der User kennt wahrscheinlich keine Generics
2. Ich muss erst Grundtypen erklaeren
3. Dann Generics einfuehren
4. Dann das konkrete Beispiel zeigen
</thinking>

# TypeScript Generics erklaert

Stell Dir vor, Du hast eine Funktion die mit verschiedenen Typen arbeiten soll...

This way the frontend can render or hide the <thinking> block separately — the actual answer sits cleanly separated below it.

Layer 4: When to Use Chain of Thought?

CoT is not useful for every task. It costs additional tokens and slows down the response.

Useful	Not Needed
Complex logic, multi-step analysis	Simple questions (“What is the capital of France?”)
Code review, debugging	Format conversion
Analysis with multiple factors	Translations
Tasks with knowledge dependencies	Summaries of short texts
When traceability matters	When speed is the priority

Rule of thumb: If you as a human would write down intermediate steps, the LLM also benefits from CoT.

TRY

Task: Add chain of thought to a code review prompt. The LLM should first analyse the code (thinking), then provide a structured assessment (output). Add <thinking-instructions> and <output-format>.

Create challenge-5-5.ts and run with: npx tsx challenge-5-5.ts

import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const CODE_TO_REVIEW = `
function processUsers(users: any[]) {
  let result = [];
  for (let i = 0; i < users.length; i++) {
    if (users[i].age > 18) {
      result.push({
        name: users[i].name.toUpperCase(),
        email: users[i].email,
        isAdmin: users[i].role == 'admin',
      });
    }
  }
  return result;
}
`;

const result = await streamText({
  model: anthropic('claude-sonnet-4-5-20250514'),
  prompt: `
<task-context>
You are a senior TypeScript engineer performing a code review.
Your goal is to identify issues and suggest improvements.
</task-context>

<background-data>
<code>
${CODE_TO_REVIEW}
</code>
</background-data>

<rules>
- Focus on type safety, readability, and best practices.
- Rate each issue as: critical, warning, or suggestion.
- Provide a concrete fix for each issue.
</rules>

<the-ask>
Review this code and provide actionable feedback.
</the-ask>

// TODO: Fuege <thinking-instructions> hinzu.
// Das LLM soll zuerst:
//   - Den Code Zeile fuer Zeile durchgehen
//   - Potentielle Probleme identifizieren
//   - Die Probleme nach Schweregrad sortieren
//   - Abhaengigkeiten zwischen Fixes beachten

// TODO: Fuege <output-format> hinzu.
// Zwei Abschnitte:
//   - <thinking> Block mit dem Analyseprozess
//   - Strukturierte Review-Ergebnisse (Issue, Severity, Fix)
  `.trim(),
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Checklist:

<thinking-instructions> describes the analysis steps
<output-format> separates reasoning from result
LLM output shows traceable intermediate steps
Result is based on the intermediate steps

Show solution

import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const CODE_TO_REVIEW = `
function processUsers(users: any[]) {
  let result = [];
  for (let i = 0; i < users.length; i++) {
    if (users[i].age > 18) {
      result.push({
        name: users[i].name.toUpperCase(),
        email: users[i].email,
        isAdmin: users[i].role == 'admin',
      });
    }
  }
  return result;
}
`;

const result = await streamText({
  model: anthropic('claude-sonnet-4-5-20250514'),
  prompt: `
<task-context>
You are a senior TypeScript engineer performing a code review.
Your goal is to identify issues and suggest improvements.
</task-context>

<background-data>
<code>
${CODE_TO_REVIEW}
</code>
</background-data>

<rules>
- Focus on type safety, readability, and best practices.
- Rate each issue as: critical, warning, or suggestion.
- Provide a concrete fix for each issue.
</rules>

<the-ask>
Review this code and provide actionable feedback.
</the-ask>

<thinking-instructions>
Before writing your review, analyze the code systematically:
1. Read the code line by line.
2. Identify all potential issues (type safety, logic, style).
3. For each issue, determine: What could go wrong in production?
4. Sort issues by severity (critical first).
5. Consider dependencies between fixes — does fixing one issue
   affect another?
6. Think about what the developer likely intended and how to
   preserve that intent while improving the code.
</thinking-instructions>

<output-format>
Return two sections:
- A <thinking> block containing your analysis process.
- The review results in markdown format (unwrapped, no <answer> tag).

For each issue in the review, use this structure:
### Issue: [title]
- **Severity:** critical | warning | suggestion
- **Line:** [line reference]
- **Problem:** [what is wrong]
- **Fix:** [concrete code change]
</output-format>
  `.trim(),
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Explanation: The <thinking-instructions> guide the LLM through a systematic analysis process: first read, then identify problems, then prioritise. The <output-format> separates the reasoning process (<thinking> block) from the structured review results. This way you can trace why the LLM rated each issue the way it did — and the results are consistently formatted.

Expected output (approximate):

<thinking>
1. Line 1: `users: any[]` — no type safety...
2. Line 2: `let` instead of `const`...
3. Line 5: `==` instead of `===`...
</thinking>

### Issue: Missing Type Definition
- **Severity:** critical
- **Line:** 1
- **Problem:** `any[]` disables all type checking
- **Fix:** Define a `User` interface

COMBINE

PromptConfig (5.1), XML Structure (5.2), Exemplars (5.3), External Data (5.4) and Chain of Thought (5.5) all flow into buildFullPrompt, then into generateText to produce a structured answer

Exercise: Build a complete prompt that combines ALL 5 concepts of this level: Template (5.1) + XML Structure (5.2) + Exemplars (5.3) + RAG (5.4) + Chain of Thought (5.5). This is the preparation for the Boss Fight.

Specifically:

Create a buildFullPrompt function that accepts a config object with all parameters: role, audience, tone, sourceUrl, content, exemplars, thinkingInstructions
The prompt must contain all XML tags in the correct order: <task-context> -> <background-data> -> <examples> -> <rules> -> <the-ask> -> <thinking-instructions> -> <output-format>
Test with a concrete scenario: A technical assistant that analyses documentation and answers questions

Optional Stretch Goal: Make <thinking-instructions> optional — if not provided, the tag is omitted. For simple questions CoT is unnecessary.