
Challenge 9.2: Model Router

Would you use the same model for the question “What time is it?” as for “Analyze these 500 lines of code and find the performance bug”? What does it cost if you always use the most expensive model?

Overview: User Input to Analyze Input, then to Flash Model (Simple), Pro Model (Complex), or Code Model (Code), all to Output

The input is analyzed and classified. Simple tasks go to a fast, cheap flash model. Complex tasks to a powerful pro model. Code tasks to a specialized code model. Result: optimal price-performance ratio.

Without Model Routing: You use an expensive model for everything. Simple questions like “What is TypeScript?” cost as much as complex analyses. At 10,000 requests per day, costs add up quickly. Or you use a cheap model for everything — then quality suffers on complex tasks.

With Model Routing: Each task gets the right model. 80% of requests are simple and go to the flash model (10-50x cheaper). 20% are complex and get the pro model. Result: same quality, drastically lower costs.

The simplest variant — a function that selects the model based on task type:

import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

function selectModel(task: 'simple' | 'complex' | 'code') {
  switch (task) {
    case 'simple':
      return google('gemini-2.5-flash-lite'); // ← Cheap + fast
    case 'complex':
      return anthropic('claude-opus-4-6'); // ← Expensive + powerful
    case 'code':
      return anthropic('claude-sonnet-4-5-20250514'); // ← Good price-performance ratio for code
  }
}

// Usage
const result = await generateText({
  model: selectModel('simple'),
  prompt: 'Was ist TypeScript?',
});
console.log(result.text);

The caller explicitly decides which type applies. Simple, but limited — the human has to classify.

Instead of manual classification, decide automatically based on the token count:

import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

function estimateTokens(text: string): number {
  return Math.ceil(text.split(/\s+/).length * 1.3); // ← Rough estimate: words * 1.3
}

function selectByComplexity(input: string) {
  const tokens = estimateTokens(input);
  console.log(`Estimated tokens: ${tokens}`);
  if (tokens < 50) {
    console.log('Route: Flash (simple request)');
    return google('gemini-2.5-flash-lite'); // ← Short requests = simple
  }
  if (tokens < 200) {
    console.log('Route: Sonnet (medium request)');
    return anthropic('claude-sonnet-4-5-20250514'); // ← Medium requests = standard
  }
  console.log('Route: Opus (complex request)');
  return anthropic('claude-opus-4-6'); // ← Long requests = complex
}

// Test
const simpleResult = await generateText({
  model: selectByComplexity('Was ist TypeScript?'),
  prompt: 'Was ist TypeScript?',
});
console.log(simpleResult.text);

Token-based routing is a good first step — long prompts often indicate complex tasks. But length alone is not a perfect proxy for complexity.
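A quick counterexample makes the limitation concrete. The snippet below reuses the `estimateTokens` heuristic from above with two made-up prompts: a short but genuinely hard request, and a long but trivial one. Both land on the wrong side of the 50-token threshold:

```typescript
// Same rough heuristic as above: words * 1.3
function estimateTokens(text: string): number {
  return Math.ceil(text.split(/\s+/).length * 1.3);
}

// Hard task, but only a handful of tokens → would be routed to Flash
const shortPrompt = 'Find the race condition in this concurrent queue.';

// Trivial task, but long → would be routed to a bigger model
const longPrompt = 'What is TypeScript? '.repeat(20);

console.log(estimateTokens(shortPrompt)); // well under 50
console.log(estimateTokens(longPrompt));  // well over 50
```

Length is a usable default, but for requests like these only the content itself reveals the complexity, which is what the next variant addresses.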

The cleverest variant — a small, fast model classifies the task, and the result determines the model:

import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText, Output } from 'ai';

// Step 1: Classification with a small model
async function classifyTask(input: string) {
  const result = await generateText({
    model: google('gemini-2.5-flash-lite'), // ← Small model for classification
    system: `Klassifiziere die folgende Aufgabe in eine Kategorie.
- "simple": Einfache Fragen, Definitionen, kurze Antworten
- "complex": Analyse, Vergleich, Argumentation, kreatives Schreiben
- "code": Code schreiben, debuggen, reviewen, erklaeren`,
    prompt: input,
    output: Output.enum(['simple', 'complex', 'code']), // ← From Level 1.5
  });
  return result.output;
}

// Step 2: Select model based on classification
function selectByClassification(classification: 'simple' | 'complex' | 'code') {
  switch (classification) {
    case 'simple': return google('gemini-2.5-flash-lite');
    case 'complex': return anthropic('claude-opus-4-6');
    case 'code': return anthropic('claude-sonnet-4-5-20250514');
  }
}

// Step 3: Routing pipeline
async function routedGenerate(input: string) {
  const taskType = await classifyTask(input); // ← LLM classifies
  console.log(`Classification: ${taskType}`);
  const model = selectByClassification(taskType); // ← Model is selected
  const result = await generateText({ model, prompt: input });
  console.log(`Tokens: ${result.usage.totalTokens}`);
  return result.text;
}

// Test
const answer = await routedGenerate(
  'Vergleiche die Vor- und Nachteile von REST vs. GraphQL fuer eine Microservices-Architektur.',
);
console.log(answer);

Two LLM calls instead of one — but the first (classification) is extremely cheap and fast. The cost of classification is negligible compared to the savings when 80% of requests go to the flash model.
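A back-of-envelope check of that claim, assuming a ~500-token prompt, a one-word classification output, and the approximate rates used in this challenge (Flash Lite ~$0.075/$0.30, Opus ~$15/$75 per 1M input/output tokens):

```typescript
// Cost of the extra classification call (Flash Lite rates, per 1M tokens)
const classifyCost = (500 * 0.075 + 5 * 0.30) / 1_000_000; // ≈ $0.00004

// Cost of one full Opus request (500 input + 1,000 output tokens)
const opusCost = (500 * 15.0 + 1_000 * 75.0) / 1_000_000; // ≈ $0.0825

console.log(`Classification: $${classifyCost.toFixed(6)}`);
console.log(`One Opus call:  $${opusCost.toFixed(4)}`);
console.log(`Overhead: ${(100 * classifyCost / opusCost).toFixed(2)}% of an Opus request`);
```

The classification step costs a small fraction of a percent of a single Opus call, so the routing overhead disappears into the savings.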

What does Model Routing actually save? An overview of model costs:

Model                   Input (per 1M tokens)   Output (per 1M tokens)   Strength
Gemini 2.5 Flash Lite   ~$0.075                 ~$0.30                   Simple tasks
Claude Sonnet 4.5       ~$3.00                  ~$15.00                  All-rounder
Claude Opus 4.6         ~$15.00                 ~$75.00                  Complex analysis

Example calculation: 10,000 requests per day, averaging 500 input + 1,000 output tokens.

  • Without routing (all Opus): 10,000 × (500 × $15 + 1,000 × $75) / 1M ≈ $825/day
  • With routing (80% Flash Lite, 15% Sonnet, 5% Opus): ≈ $70/day

That’s a savings of over 90%, with comparable quality, because simple requests don’t need Opus.
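Recomputing the example directly from the table's rates as a small script:

```typescript
// Rates per 1M tokens, taken from the table above (approximate)
const rates = {
  flash: { input: 0.075, output: 0.30 },
  sonnet: { input: 3.00, output: 15.00 },
  opus: { input: 15.00, output: 75.00 },
};

const REQUESTS = 10_000;
const INPUT_TOKENS = 500;
const OUTPUT_TOKENS = 1_000;

// Dollar cost of a single request at the given rate
function perRequest(rate: { input: number; output: number }): number {
  return (INPUT_TOKENS * rate.input + OUTPUT_TOKENS * rate.output) / 1_000_000;
}

const allOpus = REQUESTS * perRequest(rates.opus);
const routed = REQUESTS * (
  0.8 * perRequest(rates.flash) +
  0.15 * perRequest(rates.sonnet) +
  0.05 * perRequest(rates.opus)
);

console.log(`All Opus: $${allOpus.toFixed(2)}/day`); // ≈ $825/day
console.log(`Routed:   $${routed.toFixed(2)}/day`);  // ≈ $69/day
console.log(`Savings:  ${(100 * (1 - routed / allOpus)).toFixed(1)}%`);
```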

Task: Build a model router that classifies input and routes to the appropriate model.

Create model-router.ts and run with npx tsx model-router.ts.

import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText, Output } from 'ai';

// TODO 1: Implement classifyTask(input: string)
// - Use a small model (gemini-2.5-flash-lite)
// - Classify into 'simple', 'complex', or 'code'
// - Use Output.enum for type-safe output
// TODO 2: Implement selectModel(classification)
// - 'simple' → google('gemini-2.5-flash-lite')
// - 'complex' → anthropic('claude-opus-4-6')
// - 'code' → anthropic('claude-sonnet-4-5-20250514')
// TODO 3: Build a routedGenerate function that:
// - First classifies
// - Then calls the appropriate model
// - Logs the token usage
// TODO 4: Test with these inputs:
// - 'Was ist eine Variable?' (→ simple)
// - 'Vergleiche SQL vs. NoSQL Datenbanken' (→ complex)
// - 'Schreibe eine Funktion die Arrays sortiert' (→ code)

Checklist:

  • Classification with a small model implemented
  • Output.enum used for type-safe classification
  • Three different models depending on task type
  • Token usage is logged
  • Correct assignment for the three test inputs
Solution:
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText, Output } from 'ai';

async function classifyTask(input: string) {
  const result = await generateText({
    model: google('gemini-2.5-flash-lite'),
    system: `Klassifiziere die Aufgabe:
- "simple": Definitionen, kurze Fragen, Fakten
- "complex": Analyse, Vergleich, Argumentation
- "code": Code schreiben, debuggen, erklaeren`,
    prompt: input,
    output: Output.enum(['simple', 'complex', 'code']),
  });
  return result.output;
}

function selectModel(classification: 'simple' | 'complex' | 'code') {
  const models = {
    simple: google('gemini-2.5-flash-lite'),
    complex: anthropic('claude-opus-4-6'),
    code: anthropic('claude-sonnet-4-5-20250514'),
  };
  return models[classification];
}

async function routedGenerate(input: string) {
  // Classification
  const taskType = await classifyTask(input);
  console.log(`Classification: ${taskType}`);
  // Routing
  const model = selectModel(taskType);
  const result = await generateText({ model, prompt: input });
  console.log(`Tokens: ${result.usage.totalTokens}`);
  return { text: result.text, taskType, tokens: result.usage.totalTokens };
}

// Tests
const testInputs = [
  'Was ist eine Variable?',
  'Vergleiche SQL vs. NoSQL Datenbanken fuer eine E-Commerce-Plattform',
  'Schreibe eine TypeScript-Funktion die ein Array von Zahlen sortiert',
];

for (const input of testInputs) {
  console.log(`\n--- Input: "${input}" ---`);
  const result = await routedGenerate(input);
  console.log(`Response: ${result.text.slice(0, 100)}...`);
}

Explanation: Two LLM calls per request — the first (classification) costs almost nothing with Flash Lite. The second call goes to the appropriate model. The Output.enum function from Level 1.5 guarantees that the classification is always one of the three allowed values.

Expected output (approximate):
--- Input: "Was ist eine Variable?" ---
Classification: simple
Tokens: 127
Response: Eine Variable ist ein benannter Speicherplatz...
--- Input: "Vergleiche SQL vs. NoSQL Datenbanken fuer eine E-Commerce-Plattform" ---
Classification: complex
Tokens: 891
Response: SQL-Datenbanken bieten ACID-Garantien...
--- Input: "Schreibe eine TypeScript-Funktion die ein Array von Zahlen sortiert" ---
Classification: code
Tokens: 423
Response: function sortNumbers(arr: number[]): number[]...

Combine: User Input to Model Router to Selected Model to Result to Usage Tracking to Cost Log

Exercise: Combine the Model Router with Usage Tracking from Level 2.2. Build a system that:

  1. Classifies — every request is categorized as simple, complex, or code
  2. Routes — the appropriate model is selected
  3. Tracks — token usage and estimated costs are logged per request
  4. Compares — after 10 requests: How much would you have paid with “all Opus” vs. with routing?

Calculate estimated costs with these rates (simplified):

  • Flash Lite: $0.0004 per 1K tokens
  • Sonnet: $0.009 per 1K tokens
  • Opus: $0.045 per 1K tokens
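One way to sketch the tracking and comparison parts (steps 3 and 4), using the simplified flat rates above. The `track` helper and the example token counts (taken from the expected output earlier) are illustrative, not part of the exercise's required API:

```typescript
type Route = 'simple' | 'complex' | 'code';

// Simplified flat rates per 1K tokens, from the list above
const RATE_PER_1K: Record<Route, number> = {
  simple: 0.0004, // Flash Lite
  code: 0.009,    // Sonnet
  complex: 0.045, // Opus
};

function estimateCost(route: Route, totalTokens: number): number {
  return (totalTokens / 1_000) * RATE_PER_1K[route];
}

// Running totals for "routing vs. all Opus"
let routedTotal = 0;
let allOpusTotal = 0;

function track(route: Route, totalTokens: number) {
  routedTotal += estimateCost(route, totalTokens);
  allOpusTotal += estimateCost('complex', totalTokens); // everything on Opus
}

// Example token counts from the expected output above
track('simple', 127);
track('complex', 891);
track('code', 423);

console.log(`Routed:   $${routedTotal.toFixed(4)}`);
console.log(`All Opus: $${allOpusTotal.toFixed(4)}`);
```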

Optional Stretch Goal: Add a fallback — if the selected model returns an error (e.g., rate limit), automatically escalate to the next larger model.
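One possible shape for that fallback, kept provider-agnostic so the escalation logic stays independent of the AI SDK. Each entry is a function that performs one attempt; the commented usage sketches how it might wrap `generateText` (the ordering and model IDs are the ones used in this challenge):

```typescript
type Attempt = () => Promise<string>;

// Try each attempt in order; on error, escalate to the next one.
async function withFallback(attempts: Attempt[]): Promise<string> {
  let lastError: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err; // e.g. a rate-limit error: try the next larger model
    }
  }
  throw lastError; // all models failed
}

// Usage sketch: order from cheapest to most capable
// const text = await withFallback([
//   () => generateText({ model: google('gemini-2.5-flash-lite'), prompt }).then(r => r.text),
//   () => generateText({ model: anthropic('claude-sonnet-4-5-20250514'), prompt }).then(r => r.text),
//   () => generateText({ model: anthropic('claude-opus-4-6'), prompt }).then(r => r.text),
// ]);
```

A refinement would be to inspect the error and only escalate on retryable failures (rate limits, timeouts) rather than on every exception.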

Part of AI Learning — free courses from prompt to production. Jan on LinkedIn