05 — Evaluation

Illustration: Evaluation — measuring AI product quality

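With deterministic software, a test is either green or red. With AI products, the answer is often “it depends”: the same input can produce different outputs, and quality is a matter of degree rather than pass or fail. Evaluation is the discipline that turns this “it depends” into measurable decisions.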
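As a rough illustration of what "measurable decisions" can mean, the sketch below scores a model against a small golden dataset and compares the pass rate to a ship threshold. Everything in it is an assumption made for illustration: the dataset, the `stub_model` function, the exact-match scoring rule, and the 0.9 threshold are hypothetical and not taken from the lessons.

```python
from typing import Callable, List, Tuple

# Hypothetical golden dataset: (input, expected answer) pairs.
GOLDEN_SET: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("3 * 3", "9"),
    ("capital of Italy", "Rome"),
]


def pass_rate(model: Callable[[str], str], dataset: List[Tuple[str, str]]) -> float:
    """Return the fraction of examples where the output matches exactly."""
    passed = sum(1 for prompt, expected in dataset if model(prompt).strip() == expected)
    return passed / len(dataset)


def ship_decision(rate: float, threshold: float = 0.9) -> str:
    """Turn a score into a decision; the threshold is an assumed value."""
    return "ship" if rate >= threshold else "no-ship"


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; it knows three of the four answers."""
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(prompt, "unknown")


rate = pass_rate(stub_model, GOLDEN_SET)
print(rate, ship_decision(rate))
```

Exact match is the simplest possible scoring rule. For open-ended outputs it would typically be replaced by a rubric or a judge model, but the shape stays the same: score the outputs, aggregate, and compare against a threshold agreed on in advance.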

What you’ll learn

How evaluation frameworks work (Golden Datasets, LLM-as-Judge)
Which metrics matter for AI products
How Red Teaming finds weaknesses before users do
When an AI feature is good enough to ship
How to detect and address bias

Lessons

Eval Frameworks
Metrics
Red Teaming
Ship/No-Ship
Bias
Synthesis