# Challenge 9.2: Model Router
Would you use the same model for the question “What time is it?” as for “Analyze this 500-line code and find the performance bug”? What does it cost if you always use the most expensive model?
## OVERVIEW

The input is analyzed and classified. Simple tasks go to a fast, cheap flash model. Complex tasks go to a powerful pro model. Code tasks go to a specialized code model. Result: optimal price-performance ratio.
Without Model Routing: You use an expensive model for everything. Simple questions like “What is TypeScript?” cost as much as complex analyses. At 10,000 requests per day, costs add up quickly. Or you use a cheap model for everything — then quality suffers on complex tasks.
With Model Routing: Each task gets the right model. 80% of requests are simple and go to the flash model (10-50x cheaper). 20% are complex and get the pro model. Result: same quality, drastically lower costs.
## WALKTHROUGH

### Layer 1: Simple Router by Task Type

The simplest variant — a function that selects the model based on task type:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

function selectModel(task: 'simple' | 'complex' | 'code') {
  switch (task) {
    case 'simple':
      return google('gemini-2.5-flash-lite'); // ← Cheap + fast
    case 'complex':
      return anthropic('claude-opus-4-6'); // ← Expensive + powerful
    case 'code':
      return anthropic('claude-sonnet-4-5-20250514'); // ← Good price-performance ratio for code
  }
}

// Usage
const result = await generateText({
  model: selectModel('simple'),
  prompt: 'Was ist TypeScript?',
});
console.log(result.text);
```

The caller explicitly decides which type applies. Simple, but limited — the human has to classify.
### Layer 2: Dynamic Routing by Input Length

Instead of manual classification: automatically decide based on token count:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText } from 'ai';

function estimateTokens(text: string): number {
  return Math.ceil(text.split(/\s+/).length * 1.3); // ← Rough estimate: words * 1.3
}

function selectByComplexity(input: string) {
  const tokens = estimateTokens(input);
  console.log(`Estimated tokens: ${tokens}`);

  if (tokens < 50) {
    console.log('Route: Flash (simple request)');
    return google('gemini-2.5-flash-lite'); // ← Short requests = simple
  }
  if (tokens < 200) {
    console.log('Route: Sonnet (medium request)');
    return anthropic('claude-sonnet-4-5-20250514'); // ← Medium requests = standard
  }
  console.log('Route: Opus (complex request)');
  return anthropic('claude-opus-4-6'); // ← Long requests = complex
}

// Test
const simpleResult = await generateText({
  model: selectByComplexity('Was ist TypeScript?'),
  prompt: 'Was ist TypeScript?',
});
console.log(simpleResult.text);
```

Token-based routing is a good first step — long prompts often indicate complex tasks. But length alone is not a perfect proxy for complexity.
### Layer 3: LLM-Based Routing

The cleverest variant — a small, fast model classifies the task, and the result determines the model:
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText, Output } from 'ai';
import { z } from 'zod';

// Step 1: Classification with a small model
async function classifyTask(input: string) {
  const result = await generateText({
    model: google('gemini-2.5-flash-lite'), // ← Small model for classification
    system: `Klassifiziere die folgende Aufgabe in eine Kategorie.
- "simple": Einfache Fragen, Definitionen, kurze Antworten
- "complex": Analyse, Vergleich, Argumentation, kreatives Schreiben
- "code": Code schreiben, debuggen, reviewen, erklaeren`,
    prompt: input,
    output: Output.enum(['simple', 'complex', 'code']), // ← From Level 1.5
  });
  return result.output;
}

// Step 2: Select model based on classification
function selectByClassification(classification: 'simple' | 'complex' | 'code') {
  switch (classification) {
    case 'simple':
      return google('gemini-2.5-flash-lite');
    case 'complex':
      return anthropic('claude-opus-4-6');
    case 'code':
      return anthropic('claude-sonnet-4-5-20250514');
  }
}

// Step 3: Routing pipeline
async function routedGenerate(input: string) {
  const taskType = await classifyTask(input); // ← LLM classifies
  console.log(`Classification: ${taskType}`);

  const model = selectByClassification(taskType); // ← Model is selected
  const result = await generateText({ model, prompt: input });

  console.log(`Tokens: ${result.usage.totalTokens}`);
  return result.text;
}

// Test
const answer = await routedGenerate(
  'Vergleiche die Vor- und Nachteile von REST vs. GraphQL fuer eine Microservices-Architektur.',
);
console.log(answer);
```

Two LLM calls instead of one — but the first (classification) is extremely cheap and fast. The cost of classification is negligible compared to the savings when 80% of requests go to the flash model.
### Layer 4: Cost Comparison

What does Model Routing actually save? An overview of model costs:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Strength |
|---|---|---|---|
| Gemini 2.5 Flash Lite | ~$0.075 | ~$0.30 | Simple tasks |
| Claude Sonnet 4.5 | ~$3.00 | ~$15.00 | All-rounder |
| Claude Opus 4.6 | ~$15.00 | ~$75.00 | Complex analysis |
Example calculation: 10,000 requests per day, averaging 500 input + 1,000 output tokens.

- Without routing (all Opus): ~$825/day
- With routing (80% Flash, 15% Sonnet, 5% Opus): ~$69/day

That's a savings of over 90% — with comparable quality, because simple requests don't need Opus.
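The example calculation can be reproduced directly from the table's per-million-token rates. This is pure arithmetic with no SDK involved; the `costPerDay` helper is just for this sketch, not part of any library:

```ts
// Per-1M-token rates from the table above
const rates = {
  flash: { input: 0.075, output: 0.3 },
  sonnet: { input: 3.0, output: 15.0 },
  opus: { input: 15.0, output: 75.0 },
};

// 10,000 requests/day, 500 input + 1,000 output tokens each
const requests = 10_000;

function costPerDay(rate: { input: number; output: number }, share: number): number {
  const reqs = requests * share;
  return (reqs * 500 * rate.input + reqs * 1000 * rate.output) / 1_000_000;
}

const allOpus = costPerDay(rates.opus, 1); // ≈ $825
const routed =
  costPerDay(rates.flash, 0.8) +
  costPerDay(rates.sonnet, 0.15) +
  costPerDay(rates.opus, 0.05); // ≈ $69

console.log(`All Opus: $${allOpus.toFixed(2)}/day, routed: $${routed.toFixed(2)}/day`);
console.log(`Savings: ${((1 - routed / allOpus) * 100).toFixed(1)}%`);
```

Even before adding the (negligible) classification calls, the routed setup comes in at well under a tenth of the all-Opus cost.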
Task: Build a model router that classifies input and routes to the appropriate model.
Create `model-router.ts` and run it with `npx tsx model-router.ts`.
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText, Output } from 'ai';
import { z } from 'zod';

// TODO 1: Implement classifyTask(input: string)
// - Use a small model (gemini-2.5-flash-lite)
// - Classify into 'simple', 'complex', or 'code'
// - Use Output.enum for type-safe output

// TODO 2: Implement selectModel(classification)
// - 'simple' → google('gemini-2.5-flash-lite')
// - 'complex' → anthropic('claude-opus-4-6')
// - 'code' → anthropic('claude-sonnet-4-5-20250514')

// TODO 3: Build a routedGenerate function that:
// - First classifies
// - Then calls the appropriate model
// - Logs the token usage

// TODO 4: Test with these inputs:
// - 'Was ist eine Variable?' (→ simple)
// - 'Vergleiche SQL vs. NoSQL Datenbanken' (→ complex)
// - 'Schreibe eine Funktion die Arrays sortiert' (→ code)
```

Checklist:

- Classification with a small model implemented
- `Output.enum` used for type-safe classification
- Three different models depending on task type
- Token usage is logged
- Correct assignment for the three test inputs
Show solution
```ts
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { generateText, Output } from 'ai';
import { z } from 'zod';

async function classifyTask(input: string) {
  const result = await generateText({
    model: google('gemini-2.5-flash-lite'),
    system: `Klassifiziere die Aufgabe:
- "simple": Definitionen, kurze Fragen, Fakten
- "complex": Analyse, Vergleich, Argumentation
- "code": Code schreiben, debuggen, erklaeren`,
    prompt: input,
    output: Output.enum(['simple', 'complex', 'code']),
  });
  return result.output;
}

function selectModel(classification: 'simple' | 'complex' | 'code') {
  const models = {
    simple: google('gemini-2.5-flash-lite'),
    complex: anthropic('claude-opus-4-6'),
    code: anthropic('claude-sonnet-4-5-20250514'),
  };
  return models[classification];
}

async function routedGenerate(input: string) {
  // Classification
  const taskType = await classifyTask(input);
  console.log(`Classification: ${taskType}`);

  // Routing
  const model = selectModel(taskType);
  const result = await generateText({ model, prompt: input });

  console.log(`Tokens: ${result.usage.totalTokens}`);
  return { text: result.text, taskType, tokens: result.usage.totalTokens };
}

// Tests
const testInputs = [
  'Was ist eine Variable?',
  'Vergleiche SQL vs. NoSQL Datenbanken fuer eine E-Commerce-Plattform',
  'Schreibe eine TypeScript-Funktion die ein Array von Zahlen sortiert',
];

for (const input of testInputs) {
  console.log(`\n--- Input: "${input}" ---`);
  const result = await routedGenerate(input);
  console.log(`Response: ${result.text.slice(0, 100)}...`);
}
```

Explanation: Two LLM calls per request — the first (classification) costs almost nothing with Flash Lite. The second call goes to the appropriate model. `Output.enum` from Level 1.5 guarantees that the classification is always one of the three allowed values.

Expected output (approximate):

```text
--- Input: "Was ist eine Variable?" ---
Classification: simple
Tokens: 127
Response: Eine Variable ist ein benannter Speicherplatz...

--- Input: "Vergleiche SQL vs. NoSQL Datenbanken fuer eine E-Commerce-Plattform" ---
Classification: complex
Tokens: 891
Response: SQL-Datenbanken bieten ACID-Garantien...

--- Input: "Schreibe eine TypeScript-Funktion die ein Array von Zahlen sortiert" ---
Classification: code
Tokens: 423
Response: function sortNumbers(arr: number[]): number[]...
```
## COMBINE

Exercise: Combine the Model Router with Usage Tracking from Level 2.2. Build a system that:
- Classifies — every request is categorized as `simple`, `complex`, or `code`
- Routes — the appropriate model is selected
- Tracks — token usage and estimated costs are logged per request
- Compares — after 10 requests: how much would you have paid with "all Opus" vs. with routing?
Calculate estimated costs with these rates (simplified):
- Flash Lite: $0.0004 per 1K tokens
- Sonnet: $0.009 per 1K tokens
- Opus: $0.045 per 1K tokens
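The tracking and comparison part can be sketched as pure bookkeeping, independent of the API calls. `trackUsage` and `compareCosts` are illustrative names for this exercise, not AI SDK functions; in the real system you would call `trackUsage` after each `generateText` with `result.usage.totalTokens`:

```ts
// Simplified per-1K-token rates from the list above
const ratePer1K = {
  flash: 0.0004,
  sonnet: 0.009,
  opus: 0.045,
} as const;

type ModelName = keyof typeof ratePer1K;
type UsageEntry = { model: ModelName; tokens: number };

const usageLog: UsageEntry[] = [];

function trackUsage(model: ModelName, tokens: number) {
  usageLog.push({ model, tokens });
}

function compareCosts() {
  // What you actually paid with routing
  const routed = usageLog.reduce((sum, e) => sum + (e.tokens / 1000) * ratePer1K[e.model], 0);
  // What the same traffic would have cost on Opus only
  const allOpus = usageLog.reduce((sum, e) => sum + (e.tokens / 1000) * ratePer1K.opus, 0);
  return { routed, allOpus, savings: 1 - routed / allOpus };
}

// Example: 8 flash requests, 1 sonnet, 1 opus, 1,000 tokens each
for (let i = 0; i < 8; i++) trackUsage('flash', 1000);
trackUsage('sonnet', 1000);
trackUsage('opus', 1000);

const { routed, allOpus, savings } = compareCosts();
console.log(`Routed: $${routed.toFixed(4)}, all Opus: $${allOpus.toFixed(4)}`);
console.log(`Savings: ${(savings * 100).toFixed(1)}%`);
```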
Optional Stretch Goal: Add a fallback — if the selected model returns an error (e.g., rate limit), automatically escalate to the next larger model.
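One possible shape for the escalation logic, kept model-agnostic so it runs without API keys: the callers below are plain async functions standing in for wrapped `generateText` calls, and `withFallback` is a hypothetical helper for this sketch, not an AI SDK export:

```ts
// Try each caller in order, from cheapest to most powerful.
// Any error (e.g., a rate limit) escalates to the next entry.
async function withFallback<T>(callers: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const call of callers) {
    try {
      return await call();
    } catch (err) {
      lastError = err; // ← Remember the error and try the next model
    }
  }
  throw lastError; // All models failed
}

// Demo with stand-in callers: the first one "hits a rate limit"
const fallbackResult = await withFallback([
  async () => {
    throw new Error('429: rate limited');
  },
  async () => 'answer from the larger model',
]);
console.log(fallbackResult); // → answer from the larger model
```

In the router, the array would hold the selected model first and the next larger models after it, so escalation follows the same cheap-to-expensive ordering as the classification.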