Boss Fight: Token Budget Calculator
The Scenario
You’re building a Token Budget Calculator — a tool that checks before every LLM call whether the request fits within budget. The calculator combines all four building blocks from Level 2: token counting, cost calculation, Context Window check, and cache tracking.
Your tool should feel like this:
```
[Budget Check] System: ~580 Tokens | User: ~15 Tokens | Output reserve: 4096
[Budget Check] Total: ~4691 / 200000 Tokens (2.3%)
[Budget Check] Status: OK

[Cost] Call 1: $0.001245 (Cache: MISS)
[Cost] Call 2: $0.000387 (Cache: HIT — 69% cheaper)
[Cost] Call 3: $0.000391 (Cache: HIT — 69% cheaper)

[Session] 3 Calls | 892 Tokens | $0.002023 | Cache hit rate: 66.7%
```

This project connects all four building blocks:
Requirements
- Token Counting (Challenge 2.1) — Implement `estimateTokens(text)`, which calculates the approximate token count of a text. Use the rule of thumb (1 token ≈ 3.5 characters).
- Context Window Check (Challenge 2.3) — Before each call: calculate System Prompt + User Prompt + output reserve and check against the model’s Context Window limit. If >90%: warning. If >100%: abort with an error message.
- Cost Calculation (Challenge 2.2) — After each call: read `result.usage` and calculate costs based on the model pricing table. Track input and output costs separately.
- Cache Tracking (Challenge 2.4) — Track across multiple calls whether the System Prompt is being cached. Calculate the cache hit rate and the theoretical savings compared to “no caching.”
- Session Report — After all calls: show a summary with the total number of calls, token usage, costs, and cache hit rate.
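The Context Window check and the cost calculation above can be sketched like this. This is a minimal version under assumptions: it uses the 3.5-characters-per-token rule of thumb from the first requirement, and the numbers in `PRICING` are placeholder values, not real model pricing — look up the actual pricing table for your model.

```typescript
// Rule of thumb from the first requirement: ~3.5 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

type BudgetStatus = 'ok' | 'warning' | 'error';

// Context Window check: system + user + output reserve vs. the model limit.
// >90% utilization -> warning, >100% -> error (abort before calling the API).
function checkContextWindow(
  systemPrompt: string,
  userPrompt: string,
  modelContextWindow: number,
  outputReserve: number,
): { totalEstimate: number; utilization: number; status: BudgetStatus } {
  const totalEstimate =
    estimateTokens(systemPrompt) + estimateTokens(userPrompt) + outputReserve;
  const utilization = totalEstimate / modelContextWindow;
  const status: BudgetStatus =
    utilization > 1 ? 'error' : utilization > 0.9 ? 'warning' : 'ok';
  return { totalEstimate, utilization, status };
}

// Cost calculation: prices per million tokens (placeholder values).
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  'example-model': { inputPerM: 3.0, outputPerM: 15.0 },
};

function calculateCost(
  usage: { inputTokens: number; outputTokens: number },
  modelId: string,
): { inputCost: number; outputCost: number; totalCost: number } {
  const p = PRICING[modelId];
  const inputCost = (usage.inputTokens / 1_000_000) * p.inputPerM;
  const outputCost = (usage.outputTokens / 1_000_000) * p.outputPerM;
  return { inputCost, outputCost, totalCost: inputCost + outputCost };
}
```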
Starter Code
Create a file boss-fight-2.ts:
```ts
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// TODO: Define ModelPricing interface and PRICING table

// TODO: Implement estimateTokens(text: string): number

// TODO: Implement checkContextWindow(systemPrompt, userPrompt, modelContextWindow, outputReserve)
// - Calculate estimated tokens for system + user + reserve
// - Return: { totalEstimate, utilization, status: 'ok' | 'warning' | 'error' }

// TODO: Implement calculateCost(usage, modelId)
// - Calculate input and output costs separately
// - Return: { inputCost, outputCost, totalCost }

// TODO: Define session tracking variables
// let session = { calls: 0, totalTokens: 0, totalCost: 0, cacheHits: 0 };

// TODO: Implement budgetedGenerate(systemPrompt, userPrompt, modelId)
// - 1. Context Window check
// - 2. generateText call
// - 3. Calculate costs
// - 4. Track cache status
// - 5. Update session variables
// - 6. Return: result + cost + cacheStatus

// TODO: Define a long System Prompt (>1024 tokens)

// TODO: Execute 3+ calls with the same System Prompt

// TODO: Display the Session Report
```

Run with:

```sh
npx tsx boss-fight-2.ts
```

Evaluation Criteria
Your Boss Fight is passed when:
- `estimateTokens` calculates the approximate token count of a text
- Before each call, it checks whether the input fits into the Context Window
- A warning is shown at >90% Context Window utilization
- Costs are calculated after each call based on `result.usage`
- Input and output costs are shown separately
- Cache hit/miss is logged per call
- The cache hit rate is calculated across the session
- A final session report shows: calls, tokens, costs, cache hit rate
Hint 1: Improving token estimation
The simplest approach is `Math.ceil(text.length / 3.5)`. For better accuracy, you can use different factors for different text types: code contains more special characters (which often become their own tokens), and German text requires more tokens than English. A weighted estimate could count words and assume 1.3 tokens per word.
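Both variants from this hint are short to write down. The function names here are illustrative; the 3.5-characters and 1.3-tokens-per-word factors are the rules of thumb quoted above, not exact values for any particular tokenizer:

```typescript
// Baseline character heuristic: ~3.5 characters per token.
function estimateTokensSimple(text: string): number {
  return Math.ceil(text.length / 3.5);
}

// Word-based alternative: assume ~1.3 tokens per word.
// Tends to fit natural-language prose better; for code, a smaller
// character divisor is usually the safer heuristic.
function estimateTokensByWords(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words * 1.3);
}
```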
Hint 2: Cache detection
Anthropic supports automatic Prompt Caching. The easiest way to detect the cache status is indirect: if a second call with an identical System Prompt has lower effective costs, the cache was hit. Alternatively, you can simply track it yourself: the first call is a cache miss (write), and subsequent calls with an identical prefix are cache hits (reads).
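The second, simpler strategy from this hint — treat the first call with a given System Prompt as a miss and every later call with the same prompt as a hit — can be tracked with a Set. This is bookkeeping on your side, not a readout of Anthropic's actual cache:

```typescript
// System Prompts we have already sent once ("warmed" prompts).
const seenSystemPrompts = new Set<string>();

// First call with a prompt -> MISS (cache write);
// later calls with the identical prompt -> HIT (cache read).
function trackCacheStatus(systemPrompt: string): 'HIT' | 'MISS' {
  if (seenSystemPrompts.has(systemPrompt)) return 'HIT';
  seenSystemPrompts.add(systemPrompt);
  return 'MISS';
}
```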
Hint 3: Session tracking structure
A simple object is sufficient for session tracking:

```ts
interface SessionStats {
  calls: number;
  totalTokens: number;
  totalCost: number;
  cacheHits: number;
  cacheMisses: number;
}
```

Update it after each generateText call (with generateText, result.usage is available directly after the await). The cache hit rate is cacheHits / (cacheHits + cacheMisses) * 100.
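Putting the hint together, the update step after each call might look like this. The helper names are illustrative, and the `tokens`/`cost` arguments stand in for values you would read from result.usage and your calculateCost function (the exact usage field names depend on your AI SDK version):

```typescript
interface SessionStats {
  calls: number;
  totalTokens: number;
  totalCost: number;
  cacheHits: number;
  cacheMisses: number;
}

const session: SessionStats = {
  calls: 0,
  totalTokens: 0,
  totalCost: 0,
  cacheHits: 0,
  cacheMisses: 0,
};

// Call once after every generateText call.
function updateSession(tokens: number, cost: number, cacheHit: boolean): void {
  session.calls += 1;
  session.totalTokens += tokens;
  session.totalCost += cost;
  if (cacheHit) {
    session.cacheHits += 1;
  } else {
    session.cacheMisses += 1;
  }
}

// Cache hit rate in percent; 0 before any call has been made.
function cacheHitRate(): number {
  const total = session.cacheHits + session.cacheMisses;
  return total === 0 ? 0 : (session.cacheHits / total) * 100;
}
```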