
Boss Fight: Token Budget Calculator

You’re building a Token Budget Calculator — a tool that checks before every LLM call whether the request fits within budget. The calculator combines all four building blocks from Level 2: token counting, cost calculation, Context Window check, and cache tracking.

Your tool should feel like this:

[Budget Check] System: ~580 Tokens | User: ~15 Tokens | Output reserve: 4096
[Budget Check] Total: ~4691 / 200000 Tokens (2.3%)
[Budget Check] Status: OK
[Cost] Call 1: $0.001245 (Cache: MISS)
[Cost] Call 2: $0.000387 (Cache: HIT — 69% cheaper)
[Cost] Call 3: $0.000391 (Cache: HIT — 69% cheaper)
[Session] 3 Calls | 892 Tokens | $0.002023 | Cache hit rate: 66.7%

This project connects all four building blocks:

(Diagram: system and user prompt → token counting → context window check → generateText → usage tracking → cache tracking → session report)
  1. Token Counting (Challenge 2.1) — Implement estimateTokens(text) that calculates the approximate token count of a text. Use the rule of thumb (1 token ≈ 3.5 characters).

  2. Context Window Check (Challenge 2.3) — Before each call: Calculate System Prompt + User Prompt + output reserve. Check against the model’s Context Window limit. If >90%: warning. If >100%: abort with error message.

  3. Cost Calculation (Challenge 2.2) — After each call: Read result.usage and calculate costs based on the model pricing table. Track input and output costs separately.

  4. Cache Tracking (Challenge 2.4) — Track across multiple calls whether the System Prompt is being cached. Calculate the cache hit rate and the theoretical savings compared to “no caching.”

  5. Session Report — After all calls: Show a summary with total number of calls, token usage, costs, and cache hit rate.
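The context window check (block 2) can be sketched as a pure function. The thresholds and return shape follow the spec above; the 3.5-characters-per-token factor is the rule of thumb from block 1, and the default output reserve of 4096 matches the example output:

```typescript
// Rule of thumb from Challenge 2.1: 1 token ≈ 3.5 characters.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3.5);
}

type BudgetStatus = 'ok' | 'warning' | 'error';

function checkContextWindow(
  systemPrompt: string,
  userPrompt: string,
  modelContextWindow: number,
  outputReserve = 4096,
): { totalEstimate: number; utilization: number; status: BudgetStatus } {
  const totalEstimate =
    estimateTokens(systemPrompt) + estimateTokens(userPrompt) + outputReserve;
  const utilization = totalEstimate / modelContextWindow;
  // >100% aborts with an error, >90% warns, otherwise OK — as the spec requires.
  const status: BudgetStatus =
    utilization > 1 ? 'error' : utilization > 0.9 ? 'warning' : 'ok';
  return { totalEstimate, utilization, status };
}
```

Because the function is pure, you can unit-test the thresholds without making any API calls.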

Create a file boss-fight-2.ts:

import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
// TODO: Define ModelPricing interface and PRICING table
// TODO: Implement estimateTokens(text: string): number
// TODO: Implement checkContextWindow(systemPrompt, userPrompt, modelContextWindow, outputReserve)
// - Calculate estimated tokens for system + user + reserve
// - Return: { totalEstimate, utilization, status: 'ok' | 'warning' | 'error' }
// TODO: Implement calculateCost(usage, modelId)
// - Calculate input and output costs separately
// - Return: { inputCost, outputCost, totalCost }
// TODO: Define session tracking variables
// let session = { calls: 0, totalTokens: 0, totalCost: 0, cacheHits: 0 };
// TODO: Implement budgetedGenerate(systemPrompt, userPrompt, modelId)
// - 1. Context Window check
// - 2. generateText call
// - 3. Calculate costs
// - 4. Track cache status
// - 5. Update session variables
// - 6. Return: result + cost + cacheStatus
// TODO: Define a long System Prompt (>1024 tokens)
// TODO: Execute 3+ calls with the same System Prompt
// TODO: Display the Session Report
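For the pricing table and cost calculation, one possible shape is sketched below. The dollar figures are placeholders, not Anthropic's current list prices — look up the real rates for your model. The `Usage` shape mirrors what recent AI SDK versions expose on `result.usage` (`inputTokens`/`outputTokens`; older versions use `promptTokens`/`completionTokens`):

```typescript
// USD per 1 million tokens. Placeholder figures — check current model pricing.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICING: Record<string, ModelPricing> = {
  'claude-3-5-haiku-latest': { inputPerMTok: 0.8, outputPerMTok: 4 },
};

// Minimal shape of result.usage for this sketch.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

function calculateCost(usage: Usage, modelId: string) {
  const pricing = PRICING[modelId];
  if (!pricing) throw new Error(`No pricing entry for model: ${modelId}`);
  // Input and output are tracked separately, as required by block 3.
  const inputCost = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok;
  const outputCost = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
  return { inputCost, outputCost, totalCost: inputCost + outputCost };
}
```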

Run with:

npx tsx boss-fight-2.ts

Your Boss Fight is passed when:

  • estimateTokens calculates the approximate token count of a text
  • Before each call, it checks whether the input fits into the Context Window
  • A warning is shown at >90% Context Window utilization
  • Costs are calculated after each call based on result.usage
  • Input and output costs are shown separately
  • Cache hit/miss is logged per call
  • The cache hit rate is calculated across the session
  • A final session report shows: calls, tokens, costs, cache hit rate
Hint 1: Improving token estimation

The simplest approach is Math.ceil(text.length / 3.5). For better accuracy you can use different factors for different text types: code contains more special characters (which often become tokens of their own), and German text generally needs more tokens than English. A weighted estimate could count words and assume roughly 1.3 tokens per word.
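The refinements above could be combined like this. The 0.15 special-character threshold and the per-type factors are rough heuristics, not measured tokenizer behavior:

```typescript
// Refined token estimate: code-like text (many special characters) gets a
// denser factor; otherwise we take the larger of a word-based estimate
// (~1.3 tokens/word) and the plain character-based estimate.
function estimateTokensRefined(text: string): number {
  const specialChars = text.match(/[^a-zA-Z0-9\s]/g)?.length ?? 0;
  const specialRatio = specialChars / Math.max(text.length, 1);
  // Heuristic: >15% special characters → treat as code, ~3 chars per token.
  if (specialRatio > 0.15) return Math.ceil(text.length / 3);
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.max(Math.ceil(words * 1.3), Math.ceil(text.length / 3.5));
}
```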

Hint 2: Cache detection

Anthropic supports automatic Prompt Caching. The easiest way to detect the cache status is indirect: if a second call with an identical System Prompt has lower effective costs, the cache was hit. Alternatively, you can simply track it yourself: first call = cache miss (write), subsequent calls with an identical prefix = cache hit (read).
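The simpler prefix-based variant from the hint can be sketched in a few lines. Note this only approximates Anthropic's actual caching behavior — it ignores cache expiry and the minimum cacheable prompt length:

```typescript
// Prefix-based cache tracking: the first call with a given system prompt
// counts as a MISS (cache write), every repeat as a HIT (cache read).
const seenPrefixes = new Set<string>();

function trackCache(systemPrompt: string): 'HIT' | 'MISS' {
  if (seenPrefixes.has(systemPrompt)) return 'HIT';
  seenPrefixes.add(systemPrompt);
  return 'MISS';
}
```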

Hint 3: Session tracking structure

A simple object is sufficient for session tracking:

interface SessionStats {
  calls: number;
  totalTokens: number;
  totalCost: number;
  cacheHits: number;
  cacheMisses: number;
}

Update it after each generateText call — result.usage is available directly after the await. The cache hit rate is cacheHits / (cacheHits + cacheMisses) * 100.
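Putting the pieces together (repeating the SessionStats shape so the sketch is self-contained), the update and hit-rate logic might look like this:

```typescript
interface SessionStats {
  calls: number;
  totalTokens: number;
  totalCost: number;
  cacheHits: number;
  cacheMisses: number;
}

const session: SessionStats = {
  calls: 0, totalTokens: 0, totalCost: 0, cacheHits: 0, cacheMisses: 0,
};

// Call once per generateText call, right after the await.
function recordCall(totalTokens: number, totalCost: number, cacheHit: boolean): void {
  session.calls += 1;
  session.totalTokens += totalTokens;
  session.totalCost += totalCost;
  if (cacheHit) session.cacheHits += 1;
  else session.cacheMisses += 1;
}

// cacheHits / (cacheHits + cacheMisses) * 100, guarding against division by zero.
function cacheHitRate(s: SessionStats): number {
  const total = s.cacheHits + s.cacheMisses;
  return total === 0 ? 0 : (s.cacheHits / total) * 100;
}
```

With one miss followed by two hits — the pattern from the example output — `cacheHitRate(session)` comes out at 66.7% (rounded to one decimal).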

Part of AI Learning — free courses from prompt to production. Jan on LinkedIn