
Calibrating Trust

Lesson 4 of 5 — AI as Coworker

You delegate a task to AI. It delivers a result. Now the question: Is it correct?

Blind trust is dangerous. Checking everything manually makes delegation pointless. The answer lies in between — and finding that precise middle ground is the core skill of this lesson.

The numbers are stark:

  • 47% of enterprise AI users made at least one major business decision based on hallucinated content (Deloitte Global AI Survey, 2025)
  • The Harvard/BCG study showed that on tasks outside the AI’s strengths, quality dropped — because consultants trusted the output without checking
  • Air Canada was held liable because their chatbot gave a customer false information about refund policies

The problem isn’t that AI hallucinates. The problem is that people follow the output uncritically. The EU AI Act names this risk explicitly: automation bias — the tendency to trust automated systems over your own judgment, even when there are signs the system is wrong.

Imagine you’re not delegating to “AI” but to a new team member. How much oversight do you provide? It depends on their track record — and on the task.

| Trust Level | Analogy | What AI May Do | How You Verify |
| --- | --- | --- | --- |
| Intern | First day | Observe, read, summarize | Check everything |
| Junior | 3 months in | Make suggestions, create drafts | Review every result |
| Senior | Proven track record | Execute independently with monitoring | Spot checks, outcome review |
| Expert | Full trust | Work autonomously | Only on anomalies |

Step 1: Start every new task type at “Intern” level. Even if the tool is generally capable.

Step 2: Observe quality over 5–10 runs. Note where it gets things right — and where it doesn’t.

Step 3: If quality is consistent, level up. If not, stay at the current level or adjust your prompt.

Step 4: For every new task type: Back to “Intern.” Trust is task-specific, not blanket.
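One way to make this ladder concrete is a small Python sketch. The level names come from the table above; the 10-run window, the 80% pass rate, and the function names are illustrative assumptions, not part of the lesson:

```python
from collections import defaultdict, deque

# Illustrative trust ladder; thresholds are assumptions, not canon.
LEVELS = ["intern", "junior", "senior", "expert"]
WINDOW = 10       # observe quality over the last 5-10 runs (Step 2)
PASS_RATE = 0.8   # "consistent" taken here to mean 8 of 10 acceptable runs

history = defaultdict(lambda: deque(maxlen=WINDOW))  # runs per task type (Step 4)
level = defaultdict(lambda: "intern")                # every task type starts at intern (Step 1)

def record_run(task_type: str, acceptable: bool) -> str:
    """Log one checked AI run; level up only on consistent quality (Step 3)."""
    runs = history[task_type]
    runs.append(acceptable)
    if len(runs) >= 5 and sum(runs) / len(runs) >= PASS_RATE:
        i = LEVELS.index(level[task_type])
        if i < len(LEVELS) - 1:
            level[task_type] = LEVELS[i + 1]
            runs.clear()  # the next level is earned from fresh evidence
    return level[task_type]
```

Calling `record_run("meeting-summary", True)` after each checked run returns the current level for that task type, so the evidence and the trust level stay in one place per task, never as a blanket rating.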

Whether you need to verify an AI result depends on two questions:

| Verifiability | Example | Effort to Check |
| --- | --- | --- |
| Easy to check | Formatting, summaries, data extraction | Seconds |
| Checkable with effort | Factual claims, calculations, source citations | Minutes |
| Hard to check | Strategic recommendations, causal claims, forecasts | Requires your own expertise |

| Reversibility | Example | Risk |
| --- | --- | --- |
| Easy to undo | Internal draft, notes, brainstorming | Low |
| Costly to undo | Sent email, published report | Medium |
| Irreversible | Contractual commitment, financial transaction, termination | High |

Together, the two dimensions form a simple delegation matrix:

|  | Easy to Verify | Hard to Verify |
| --- | --- | --- |
| Reversible | Delegate, spot check is enough | Delegate, but review |
| Irreversible | Delegate, verify completely | Don’t delegate — do it yourself |
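Read as a decision rule, the matrix fits in one function. A minimal sketch, assuming you judge both dimensions yourself before delegating; the function name and return strings are illustrative:

```python
def delegation_policy(easy_to_verify: bool, reversible: bool) -> str:
    """Map the verifiability/reversibility matrix to a checking policy."""
    if reversible:
        return ("delegate; a spot check is enough" if easy_to_verify
                else "delegate, but review the result")
    return ("delegate, but verify completely" if easy_to_verify
            else "don't delegate; do it yourself")

# A contract clause is irreversible and hard to verify without expertise:
print(delegation_policy(easy_to_verify=False, reversible=False))
# -> don't delegate; do it yourself
```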
Green flags that the output can likely be trusted:
  • Output is consistent across multiple requests
  • Claims are supported with sources
  • AI flags uncertainty (“I’m not confident here, but…”)
  • Format and structure match the brief
  • Fact-checking the first 3 claims confirms accuracy

Red flags that call for a closer look:
  • Overly confident language on complex topics
  • Specific numbers without source attribution
  • Output that fits “too perfectly” — sounds good but lacks substance
  • Contradictions within the same output
  • Claims you can’t confirm with a quick search

Klarna’s AI assistant handled the equivalent work of 700 full-time agents, automating two-thirds of all customer service chats. Resolution time dropped from 11 minutes to under 2 minutes. But quality dropped — generic responses, rising complaints. The CEO publicly reversed course and resumed hiring human agents.

Lesson: Efficiency metrics can mask quality deterioration. Measure both.

Multiple lawyers submitted legal briefs with AI-generated citations — cases and quotes that didn’t exist. ChatGPT had fabricated them, and the lawyers hadn’t checked.

Lesson: AI can generate fact-like content that’s entirely invented. For factual claims: always verify.

A chatbot gave a customer wrong refund information. Air Canada argued the chatbot was “a separate legal entity.” The tribunal: No — the company is liable for all information its AI tools provide.

Lesson: You’re responsible for what AI communicates on your behalf.

Do
  • Start every new task type at “Intern” level and systematically level up
  • Before delegating, ask: Is the result verifiable? Is a mistake reversible?
  • Spot-check factual claims — verify the first 3 points
  • Measure both efficiency AND quality, not just one
  • When uncertain: use AI for a draft, not a final product
Don't
  • Accept AI results unchecked because they sound professional
  • Build blanket trust because it worked well for one task type
  • Treat all AI results the same — trust level depends on the task
  • Use numbers, quotes, or factual claims without cross-checking
  • Transfer responsibility to AI — your name is on the result

Keep a trust log for one week: For every AI use, note the task, trust level (Intern to Expert), whether you checked, and whether the check found anything problematic. End of week: spot the patterns.
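If a notes app feels too loose for this, a one-function CSV logger works; the file name and column choices below are suggestions, not part of the exercise:

```python
import csv
from datetime import date

LOG_FILE = "trust_log.csv"  # assumed filename; use whatever fits your setup

def log_ai_use(task: str, trust_level: str, checked: bool, issue_found: bool) -> None:
    """Append one row per AI use; scan the file for patterns at week's end."""
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), task, trust_level, checked, issue_found]
        )

# Example entry: a summary task at junior level, checked, nothing found
log_ai_use("meeting summary", "junior", checked=True, issue_found=False)
```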

Take 5 tasks you regularly delegate to AI. Plot each on the two axes: How easy to verify? How reversible if wrong? Does your current checking behavior match the matrix?
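Under the same assumptions as the `delegation_policy` sketch above (and reusing that function), the comparison can be automated for a handful of tasks; the task names and axis judgments here are placeholders for your own:

```python
# Reuses delegation_policy from the sketch above; entries are placeholders.
tasks = {
    "summarize meeting notes": (True, True),    # (easy_to_verify, reversible)
    "draft customer email": (True, True),
    "market size estimate": (False, True),
    "contract clause review": (False, False),
    "book invoice for payment": (True, False),
}
for task, (verifiable, reversible) in tasks.items():
    print(f"{task}: {delegation_policy(verifiable, reversible)}")
```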

Give the AI a task where you already know the right answer. Check: Where are green flags? Where are red flags? How confident does the AI sound — and is the result actually correct?

Trust calibration isn’t a one-time decision — it’s an ongoing practice. Like with a human colleague, you build trust over time, task by task, based on evidence. The best AI users aren’t those who trust the most or the least — but those who calibrate most precisely.

In the next lesson, you’ll learn about the legal framework: Compliance Basics — what the EU AI Act means for you as a knowledge worker and why “AI told me” isn’t an excuse.
