
Privacy

A Samsung engineer pastes confidential source code into ChatGPT to debug an issue. Within hours, the case is escalated internally. Within days, Samsung bans external AI tools for all employees.

Months later, Italy blocks ChatGPT over GDPR concerns about training data collection. Microsoft’s Recall feature — capturing screenshots every few seconds — is torn apart by security researchers: the data was stored in plaintext.

According to the Stanford AI Index Report 2025, AI-related privacy and security incidents jumped 56.4% in one year to 233 documented cases. This isn’t a trend — it’s a wave.

AI products have privacy implications that traditional software does not:

1. Training data privacy — What data was used for training? Was it collected with consent? Can the model reproduce training data? This is the model provider’s responsibility, but as a PM you inherit the risk.

2. Inference data privacy — What happens to user inputs and AI outputs? Are conversations logged? Who has access? Are they used for further training? This is where most PM decisions live.

3. Emergent privacy risks — AI can infer sensitive information from non-sensitive inputs. A model can deduce health conditions from shopping patterns or identify individuals from “anonymous” data. This is the hardest category.
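Most inference-side decisions reduce to one habit: minimize what you log. A minimal sketch of redacting recognizable PII before a prompt or output is written to logs — the regex patterns here are illustrative assumptions, not a vetted PII detector; a real deployment would use a dedicated PII-detection library.

```python
import re

# Illustrative patterns only -- hand-rolled regexes miss many PII forms.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s/()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace recognizable PII with placeholders before logging."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane.doe@example.com or +49 170 1234567"))
```

The point is architectural, not the patterns themselves: redaction happens before data reaches any log sink, so a leaked log is not automatically a privacy incident.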

GDPR sets the legal baseline. The key requirements and what they mean for AI products:

| Requirement | Meaning for AI | PM implication |
|---|---|---|
| Lawful basis | AI training needs consent or legitimate interest | Clarify your legal basis early |
| Right to explanation | Art. 22: no purely automated decisions with significant impact | Human-in-the-loop for decisions affecting people |
| Right to erasure | Users can request data deletion, but models don't "forget" | Technical solution for machine unlearning needed |
| Data minimization | Collect only what's necessary | AI systems that hoover up all data violate this principle |
| DPIA | Data Protection Impact Assessment for high-risk processing | Most AI applications qualify |
Privacy-enhancing technologies offer technical mitigations, each with a trade-off:

| Technique | How it works | Trade-off | Maturity |
|---|---|---|---|
| Differential Privacy | Controlled statistical noise; protects individual contributions | 2-10% accuracy loss | Production-ready (Apple, Google) |
| Federated Learning | Training on decentralized data; only model updates leave the device | Slower convergence | Production-ready for some use cases |
| Secure Multi-Party Computation | Multiple parties compute jointly without revealing their data | Computationally expensive | Early production |
| Homomorphic Encryption | Computation on encrypted data without decryption | Extreme overhead (1000x+ slower) | Research stage for ML |
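To make "controlled statistical noise" concrete, here is a minimal sketch of the Laplace mechanism, the standard construction behind differential privacy, applied to a counting query. The patient list and epsilon value are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from Laplace(0, scale); the clamp avoids
    # log(0) at the extreme of the uniform draw.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-12, 1.0 - 2.0 * abs(u)))

def dp_count(records: list, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one person
    # changes the count by at most 1. Laplace noise with scale 1/epsilon
    # therefore gives epsilon-differential privacy for the released count.
    return len(records) + laplace_noise(1.0 / epsilon)

patients = ["p%03d" % i for i in range(100)]
print(round(dp_count(patients, epsilon=1.0), 1))  # noisy, never the exact count
```

Smaller epsilon means stronger privacy and noisier answers; the 2-10% accuracy loss in the table is exactly this dial being turned.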

The Privacy Tier Decision Guide — choose the right privacy level for your product:

| Tier | Architecture | When to use | Example |
|---|---|---|---|
| Tier 1 | On-device processing | Highest sensitivity; health, finance | Apple Intelligence |
| Tier 2 | Private cloud with cryptographic verification | High sensitivity; enterprise | Apple Private Cloud Compute |
| Tier 3 | API with data isolation | Medium sensitivity; business data | ChatGPT Enterprise, Claude Teams |
| Tier 4 | Shared API | Low sensitivity; public data | Standard API access |

Rule of thumb: Start with the highest tier your use case requires, and only go lower when technical or economic reasons are compelling.
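The rule of thumb can be sketched as code: take the strictest tier demanded by any data category your product touches. The category-to-tier mapping below is an illustrative assumption; in practice the classification comes out of your DPIA, not a lookup table.

```python
from enum import IntEnum

class Tier(IntEnum):
    ON_DEVICE = 1       # Tier 1: highest privacy
    PRIVATE_CLOUD = 2
    ISOLATED_API = 3
    SHARED_API = 4      # Tier 4: lowest privacy

# Illustrative mapping of data categories to tiers (hypothetical values).
SENSITIVITY_TIER = {
    "health": Tier.ON_DEVICE,
    "finance": Tier.ON_DEVICE,
    "enterprise": Tier.PRIVATE_CLOUD,
    "business": Tier.ISOLATED_API,
    "public": Tier.SHARED_API,
}

def required_tier(data_categories: set[str]) -> Tier:
    # The strictest (lowest-numbered) tier any category requires wins;
    # only relax it for compelling technical or economic reasons.
    return min(SENSITIVITY_TIER[c] for c in data_categories)

print(required_tier({"public", "health"}).name)  # ON_DEVICE
```

Note the asymmetry: one sensitive category pulls the whole product up a tier, while adding public data never lets you relax.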

You’re a PM at a health tech startup. Your product: an AI assistant that helps doctors dictate and automatically structure clinical letters. The feature recognizes patient data in dictation, structures it, and enters it into the medical record.

The facts:

  • 200 doctors in the beta, 500 dictations per day
  • Each dictation contains patient names, diagnoses, medications — all protected health information (PHI)
  • Current setup: audio is sent to a cloud API, transcribed, then sent to an LLM for structuring
  • The API provider retains data for 30 days “for quality assurance”
  • A hospital customer requires a DPIA before rollout
  • Your CTO says: “On-device is too slow and quality isn’t good enough”

The cost of a private cloud solution (Tier 2) would be roughly 3x the current cloud API (Tier 4).

How would you decide?

The best decision: Migrate to Tier 2 or at minimum Tier 3. Specifically: choose an API provider with a zero-retention policy or build a private cloud solution. Conduct the DPIA — it’s mandatory for health data anyway.

Why:

  • Sending PHI through a shared API that retains data for 30 days is a GDPR violation waiting to happen
  • Health data is the most sensitive data category — Tier 4 is unacceptable here
  • The 3x higher cost is irrelevant compared to a GDPR fine (up to EUR 20M or 4% of global annual turnover, whichever is higher) or losing hospital customers
  • The DPIA isn’t a blocker — it’s an opportunity to build the system properly before it scales
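The cost argument survives even a rough expected-value calculation. Everything below is an illustrative assumption except the 3x multiplier and the EUR 20M fine cap from the case study; plug in your own numbers.

```python
# Back-of-envelope: annual API spend vs. expected regulatory loss.
api_cost_tier4 = 100_000              # assumed annual Tier-4 API spend, EUR
api_cost_tier2 = 3 * api_cost_tier4   # the stated ~3x for private cloud

incident_probability = 0.05           # assumed annual chance of a reportable incident
fine_exposure = 20_000_000            # GDPR cap: EUR 20M (or 4% of turnover)

expected_loss_tier4 = api_cost_tier4 + incident_probability * fine_exposure
expected_loss_tier2 = api_cost_tier2  # assuming near-zero incident exposure

print(expected_loss_tier4)  # 1100000.0
print(expected_loss_tier2)  # 300000
```

Even at a 5% incident probability, the "cheap" tier is the expensive one, and this ignores the non-monetary losses (hospital customers, trust) entirely.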

What many get wrong: Keeping the cheap cloud API and hoping nobody asks where patient data ends up — until a data protection incident blows everything wide open.

Privacy is not a compliance issue — it’s a product issue. Users make adoption decisions based on privacy trust, and enterprise buyers require data isolation as a purchase prerequisite.

  • Anonymization alone is not enough — AI models can re-identify individuals from “anonymous” data
  • “We just use the API” doesn’t mean no data leaves your system — read the provider’s retention policies
  • The Samsung incident changed enterprise AI policies overnight; privacy is a latent concern that becomes acute during incidents

Sources: Stanford AI Index Report 2025, GDPR Local — AI Privacy Risks, TensorBlue — AI Data Privacy 2025, Frontiers — Federated Learning (2025), Frontiers — AI Privacy Review (2026)

Part of AI Learning — free courses from prompt to production. Jan on LinkedIn