05 — Evaluation
With deterministic software, a test is either green or red. With AI products, the answer is: “It depends.” Evaluation is the discipline that turns this “it depends” into measurable decisions.
What you’ll learn
Section titled “What you’ll learn”- How evaluation frameworks work (Golden Datasets, LLM-as-Judge)
- Which metrics matter for AI products
- How Red Teaming finds weaknesses before users do
- When an AI feature is good enough to ship
- How to detect and address bias