05 — Evaluation

With deterministic software, a test is either green or red. With AI products, the answer is: “It depends.” Evaluation is the discipline that turns this “it depends” into measurable decisions.
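
To make the contrast with a green-or-red test concrete, here is a minimal sketch of how a pass rate over a small golden dataset can be turned into a ship/no-ship call. All names (`GoldenExample`, `pass_rate`, `ship_decision`, `toy_model`) and the 0.9 threshold are hypothetical illustrations, not part of the course material, and the exact-match scorer stands in for whatever metric or LLM-as-Judge a real setup would use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GoldenExample:
    """One hand-checked prompt with its expected answer (hypothetical schema)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible scorer: case-insensitive exact match.
    return output.strip().lower() == expected.strip().lower()


def pass_rate(
    model: Callable[[str], str],
    dataset: Sequence[GoldenExample],
    scorer: Callable[[str, str], bool] = exact_match,
) -> float:
    # Fraction of golden examples the model answers correctly.
    if not dataset:
        raise ValueError("golden dataset is empty")
    passed = sum(scorer(model(ex.prompt), ex.expected) for ex in dataset)
    return passed / len(dataset)


def ship_decision(rate: float, threshold: float = 0.9) -> str:
    # The threshold is an assumed product decision, not a universal constant.
    return "ship" if rate >= threshold else "no-ship"


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call; it only knows two answers.
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(prompt, "I don't know")


golden = [
    GoldenExample("capital of France?", "paris"),
    GoldenExample("2 + 2?", "4"),
    GoldenExample("capital of Peru?", "Lima"),
]

rate = pass_rate(toy_model, golden)
decision = ship_decision(rate)
print(f"pass rate: {rate:.2f}, decision: {decision}")
```

The point of the sketch is that no single example decides the outcome: the toy model fails one of three cases, and the aggregate rate compared against an agreed threshold produces the decision.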

  • How evaluation frameworks work (Golden Datasets, LLM-as-Judge)
  • Which metrics matter for AI products
  • How Red Teaming finds weaknesses before users do
  • When an AI feature is good enough to ship
  • How to detect and address bias

  1. Eval Frameworks
  2. Metrics
  3. Red Teaming
  4. Ship/No-Ship
  5. Bias
  6. Synthesis

Part of AI Learning — free courses from prompt to production. Jan on LinkedIn