# Templates
Five templates from the curriculum. Copy them, adapt them, use them in your projects.
## 1. AI PRD Template

Based on Chapter 8: Writing AI PRDs.
AI PRD: [Product Name / Feature]
Date: [Date]
Author: [Name]
---
1. PROBLEM STATEMENT & USER CONTEXT
- Problem: [What is the problem? Quantify the manual effort.]
- Target audience: [Who has this problem? How often?]
- Current workaround: [How do users solve this today?]
2. AI APPROACH & RATIONALE
- Why AI? [Why not rules/traditional code?]
- Approach: [ ] LLM API [ ] RAG [ ] Fine-Tuning [ ] Agent Workflow
- Rationale: [Why this approach?]
- AI suitability (5 Check Questions):
  - [ ] Does the use case tolerate occasional errors?
  - [ ] Is there enough training data / context?
  - [ ] Is the value over rules significant?
  - [ ] Is cost per query economically viable?
  - [ ] Is the risk of errors acceptable?
3. EVALUATION CRITERIA
- Golden Dataset: [Size, source, labeling process]
- Metrics + thresholds:

| Metric | Minimum | Target | Measurement |
|--------|---------|--------|-------------|
| [e.g. Accuracy] | [e.g. 85%] | [e.g. 92%] | [e.g. Golden Dataset] |
| [e.g. Hallucination Rate] | [e.g. <5%] | [e.g. <2%] | [e.g. LLM-as-Judge] |
| [e.g. Latency P95] | [e.g. <3s] | [e.g. <1s] | [e.g. APM] |
| [e.g. Cost/Query] | [e.g. <$0.05] | [e.g. <$0.02] | [e.g. Provider Dashboard] |
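To make the "Measurement" column concrete: accuracy against a golden dataset is just the share of examples where the system's output matches the expected label. A minimal sketch, assuming a tiny hand-labeled dataset and a `predict()` stub that stands in for your real model call:

```python
# Minimal golden-dataset accuracy check. The examples and the predict()
# stub are placeholders, not a real eval harness or model.
golden_dataset = [
    {"input": "Reset my password", "expected": "account"},
    {"input": "Where is my order?", "expected": "shipping"},
    {"input": "Cancel my subscription", "expected": "billing"},
]

def predict(text: str) -> str:
    # Stand-in for a model/pipeline call; here a trivial keyword rule.
    return "shipping" if "order" in text.lower() else "account"

correct = sum(predict(ex["input"]) == ex["expected"] for ex in golden_dataset)
accuracy = correct / len(golden_dataset)
print(f"Accuracy: {accuracy:.0%}")  # 2 of 3 correct -> Accuracy: 67%
```

The same loop extends to the other rows of the table by swapping the scoring function (e.g. an LLM-as-Judge call for hallucination rate).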
4. MODEL & INFRASTRUCTURE
- Model: [e.g. Claude Sonnet 4.6 / GPT-4o-mini]
- Rationale: [Cost/quality/latency tradeoff]
- Expected volume: [Queries/day]
- Cost projection: [$/month]
5. USER EXPERIENCE
- AI output presentation: [e.g. Inline suggestion, separate panel, chat]
- Confidence indicators: [ ] Yes [ ] No — Rationale: [...]
- Fallback for low confidence: [e.g. manual review, disclaimer]
- Feedback mechanism: [e.g. Thumbs up/down, Regenerate, Edit]
6. RISK & MITIGATION
- Failure modes:

| Failure Mode | Probability | Impact | Mitigation |
|--------------|-------------|--------|------------|
| [e.g. Hallucination on domain terms] | [Medium] | [High] | [RAG with verified source] |
| [e.g. Prompt injection] | [Low] | [High] | [Input validation + guardrails] |

- Bias consideration: [Which groups could be disadvantaged?]
- Privacy: [What data flows into the model? Tier 1/2/3?]
7. SUCCESS METRICS & ITERATION PLAN
- Launch criteria: [Eval metrics above threshold + human review passed]
- Post-launch monitoring: [Which metrics, what cadence?]
- Improvement cadence: [e.g. prompt updates weekly, model upgrade quarterly]
- Rollback trigger: [At what values do you roll back?]

## 2. RICE-A Scoring

Based on Chapter 2: Opportunity Identification. RICE extended with AI Complexity.
RICE-A SCORING: [Project / Feature List]
Date: [Date]
| Feature | Reach | Impact | Confidence | Effort | AI Complexity | RICE-A Score |
|---------|-------|--------|------------|--------|---------------|--------------|
| [Feature A] | [1-10] | [1-5] | [0.5-1.0] | [1-10] | [1-5] | [calculated] |
| [Feature B] | [1-10] | [1-5] | [0.5-1.0] | [1-10] | [1-5] | [calculated] |
| [Feature C] | [1-10] | [1-5] | [0.5-1.0] | [1-10] | [1-5] | [calculated] |
Formula: (Reach x Impact x Confidence) / (Effort + AI Complexity x 0.5)
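The formula is easy to wire into a spreadsheet or script for the "calculated" column. A minimal sketch; the example feature and its scores are illustrative, not from the curriculum:

```python
def rice_a_score(reach, impact, confidence, effort, ai_complexity):
    """RICE-A: (Reach x Impact x Confidence) / (Effort + AI Complexity x 0.5)."""
    return (reach * impact * confidence) / (effort + ai_complexity * 0.5)

# Hypothetical Feature A: ~5,000 users (Reach 6), high impact (4),
# solid evidence (0.8), ~4 weeks of work (Effort 5), moderate AI complexity (2).
score = rice_a_score(reach=6, impact=4, confidence=0.8, effort=5, ai_complexity=2)
print(round(score, 2))  # 19.2 / 6.0 = 3.2
```

Note how AI Complexity only counts at half weight in the denominator: it penalizes risky AI work without letting it dominate plain Effort.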
SCALES:
Reach (1-10):
- 1-3: <1,000 users affected
- 4-6: 1,000-10,000 users
- 7-10: >10,000 users

Impact (1-5):
- 1: Minimal — nice-to-have
- 2: Low — saves some time
- 3: Medium — noticeable improvement
- 4: High — solves a real problem
- 5: Massive — game changer

Confidence (0.5-1.0):
- 0.5: Gut feeling, no data
- 0.7: Some user signals
- 0.8: Solid evidence
- 1.0: Validated by prototype/pilot

Effort (1-10):
- 1-3: <2 weeks, 1-2 people
- 4-6: 2-6 weeks, small team
- 7-10: >6 weeks, cross-functional

AI Complexity (1-5):
- 1: Low — clear use case, good data, standard model
- 2: Moderate — data available but needs preparation
- 3: Medium — custom eval needed, data quality unclear
- 4: High — model risk, extensive evaluation required
- 5: Very high — unsolved problem, significant research needed

## 3. Red Team Plan

Based on Chapter 5: Red Teaming.
RED TEAM PLAN: [Product Name / Feature]
Date: [Date]
Test window: [Start — End]
Owner: [Name/Team]
SCOPE
- System under test: [e.g. Support chatbot, content generator]
- Model: [e.g. Claude Sonnet 4.6]
- Access path: [e.g. Web UI, API, Mobile App]
PRIORITY MATRIX

| Category | Priority | Test Cases | Status |
|----------|----------|------------|--------|
| Prompt Injection (direct) | Critical | [e.g. "Ignore all previous instructions"] | [ ] Open |
| Prompt Injection (indirect) | Critical | [e.g. Malicious content in uploads/URLs] | [ ] Open |
| Data Extraction | High | [e.g. Extract system prompt, read PII] | [ ] Open |
| Jailbreaking | High | [e.g. Roleplay attacks, multi-turn manipulation] | [ ] Open |
| Bias / Fairness | High | [e.g. Compare results by gender, ethnicity, age] | [ ] Open |
| Hallucination | Medium | [e.g. Ask for facts not in context] | [ ] Open |
| Edge Cases | Medium | [e.g. Empty input, very long input, different language] | [ ] Open |
| Misuse | Medium | [e.g. Generate harmful content] | [ ] Open |
TEST PROTOCOL PER FINDING
- Finding ID: [RT-001]
- Category: [e.g. Prompt Injection]
- Severity: [Critical / High / Medium / Low]
- Input: [Exact prompt]
- Expected behavior: [What should happen]
- Actual behavior: [What happened]
- Reproducible: [ ] Yes [ ] No [ ] Partially
- Recommendation: [Fix / Accepted risk / Monitoring]
RESULTS SUMMARY

| Severity | Found | Fixed | Accepted | Open |
|----------|-------|-------|----------|------|
| Critical | [n] | [n] | [n] | [n] |
| High | [n] | [n] | [n] | [n] |
| Medium | [n] | [n] | [n] | [n] |
| Low | [n] | [n] | [n] | [n] |
SHIP DECISION: [ ] Ship [ ] Fix First [ ] No-Ship
Rationale: [...]

## 4. Ship/No-Ship Checklist

Based on Chapter 5: Ship/No-Ship Decisions.
SHIP/NO-SHIP CHECKLIST: [Feature]
Date: [Date]
Decision maker: [Name]

EVALUATION
- [ ] Golden Dataset defined and current
- [ ] All core metrics above minimum threshold
  - Accuracy: [current] vs. [minimum]
  - Hallucination Rate: [current] vs. [maximum]
  - Latency P95: [current] vs. [maximum]
- [ ] Performance checked on relevant subgroups (no aggregation problem)

RED TEAMING
- [ ] Red team completed (date: [...])
- [ ] No open findings with severity "Critical"
- [ ] Open "High" findings documented with mitigation

BIAS & FAIRNESS
- [ ] Metrics disaggregated by relevant groups
- [ ] No significant performance differences between groups
- [ ] Fairness decision documented (which metric prioritized?)

ROLLBACK
- [ ] Feature flag / kill switch in place
- [ ] Rollback possible in <1h
- [ ] Rollback triggers defined (at what values?)

MONITORING
- [ ] Real-time monitoring for core metrics set up
- [ ] Alerting configured (thresholds + recipients)
- [ ] Feedback loop for user reports in place

COMPLIANCE
- [ ] Privacy review completed
- [ ] EU AI Act risk category determined
- [ ] Guardrails implemented and tested
- [ ] Documentation requirements fulfilled

DECISION
- [ ] SHIP — all checks passed
- [ ] SHIP WITH CONSTRAINTS — [which and why]
- [ ] FIX FIRST — [what needs to happen]
- [ ] NO-SHIP — [rationale]
Signature: __________ Date: __________

## 5. AI KPI Dashboard

Based on Chapter 9: KPIs for AI Products.
AI KPI DASHBOARD: [Product Name]
Date: [Date]
Reporting cadence: [weekly / monthly]
--- QUALITY METRICS (Technical Quality) ---
| Metric | Current | Target | Trend | Alert at |
|--------|---------|--------|-------|----------|
| Hallucination Rate | [%] | [<5%] | [up/down/stable] | [>8%] |
| Groundedness Score | [%] | [>90%] | [up/down/stable] | [<85%] |
| Task Completion Rate | [%] | [>80%] | [up/down/stable] | [<70%] |
| Latency P95 | [ms] | [<2000ms] | [up/down/stable] | [>3000ms] |
| Eval Score (Golden Dataset) | [%] | [>85%] | [up/down/stable] | [<80%] |
--- BUSINESS METRICS (Business Impact) ---
| Metric | Current | Target | Trend | Alert at |
|--------|---------|--------|-------|----------|
| AI Feature Adoption Rate | [%] | [>30%] | [up/down/stable] | [<15%] |
| Regeneration Rate | [%] | [<20%] | [up/down/stable] | [>35%] |
| Revenue Impact | [$] | [...] | [up/down/stable] | [...] |
| Cost per Query | [$] | [<$0.05] | [up/down/stable] | [>$0.10] |
| User Retention (AI vs. Non-AI) | [%] | [...] | [up/down/stable] | [...] |
--- OPERATIONAL METRICS (Operations) ---
| Metric | Current | Target | Trend | Alert at |
|--------|---------|--------|-------|----------|
| Error Rate | [%] | [<1%] | [up/down/stable] | [>3%] |
| API Availability | [%] | [>99.5%] | [up/down/stable] | [<99%] |
| Monthly AI Cost | [$] | [...] | [up/down/stable] | [...] |
| Cost Trend (MoM) | [%] | [stable] | [up/down/stable] | [>+20%] |
| Guardrail Trigger Rate | [%] | [<5%] | [up/down/stable] | [>10%] |
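The "Alert at" columns can be wired into a simple threshold check that feeds the Actions table. A minimal sketch, with a subset of the metrics above; the metric names and sample values are illustrative:

```python
# Map each metric to its alert condition from the "Alert at" column.
# Thresholds mirror the example values in the dashboard; adapt to your product.
ALERTS = {
    "hallucination_rate": lambda v: v > 0.08,   # alert at >8%
    "groundedness_score": lambda v: v < 0.85,   # alert at <85%
    "latency_p95_ms":     lambda v: v > 3000,   # alert at >3000ms
    "cost_per_query_usd": lambda v: v > 0.10,   # alert at >$0.10
}

def metrics_in_alarm(current: dict) -> list[str]:
    """Return the names of all metrics that breached their alert threshold."""
    return [name for name, breached in ALERTS.items()
            if name in current and breached(current[name])]

current = {"hallucination_rate": 0.09, "groundedness_score": 0.91, "latency_p95_ms": 1800}
print(metrics_in_alarm(current))  # ['hallucination_rate']
```

Every metric returned here gets a row in the Actions table below, with a root cause, owner, and deadline.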
--- LEADING INDICATORS ---
Regeneration Rate is the most important leading indicator:
- Rising → Users are unhappy with AI output → Quality problem
- Falling → AI output is being accepted → Quality is improving
- Correlates with churn? → If yes: Quality is directly revenue-relevant
--- ACTIONS ---
| Metric in alarm | Root cause | Action | Owner | Deadline |
|-----------------|------------|--------|-------|----------|
| [...] | [...] | [...] | [...] | [...] |