
Build vs. Buy

Your CTO wants to fine-tune a custom LLM. Your CEO heard that Cursor is worth billions with an “API wrapper.” Your ML lead says RAG is “the future.” Everyone is talking about different things, and everyone is somehow right.

You’re the Product Manager. Your job is not to decide how to build, but where on the spectrum to start. And that decision shapes budget, timeline, and vendor dependency for years to come.

Build vs. Buy is not a binary choice. It’s a spectrum with five levels — and most teams start too high.

| Level | Approach | Cost | Time-to-Value | Control |
| --- | --- | --- | --- | --- |
| 1 | Prompt Engineering | $0–100/month | Hours/Days | Low |
| 2 | API Integration | $100–10K/month | Days/Weeks | Medium |
| 3 | RAG (Retrieval-Augmented Generation) | $70–1K/month infra | Weeks | Medium-High |
| 4 | Fine-Tuning | $600–50K+ | Weeks/Months | High |
| 5 | Training from Scratch | $100K–$100M+ | Months/Years | Maximum |

The golden rule: Start at Level 1 and only escalate when the current level demonstrably falls short. Prompt Engineering → RAG → Fine-Tuning → Custom Model.

Almost every AI product team eventually faces the RAG vs. fine-tuning decision:

| Criterion | RAG | Fine-Tuning |
| --- | --- | --- |
| Real-time data | Yes (live updates) | No (snapshot at training time) |
| TCO | 10–50x cheaper | High (training + hosting) |
| Source transparency | Yes (citable sources) | No (implicit knowledge) |
| Output consistency | Variable | High (learned format) |
| Domain specialization | Medium | High |
| Offline deployment | No | Yes |
| Team skills needed | Data Engineering | ML Engineering |

Vendor Lock-in — the underestimated risk

67% of organizations want to avoid dependency on a single AI provider. Yet most teams build directly on one API — without an abstraction layer.

The consequences: 57% of IT leaders spent over $1M on platform migrations last year. When OpenAI had a global outage in June 2025, Zendesk features were down for hours.

Protection strategies:

  • AI Gateway as an abstraction layer (Gartner: by 2028, 70% of multi-LLM apps will use a gateway, vs. under 5% in 2024)
  • Multi-provider strategy for critical features
  • Modular architecture with swappable LLM modules
  • Contractual exit clauses and data portability
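The first two strategies boil down to one architectural habit: the rest of the codebase calls a thin internal interface, never a vendor SDK directly. A minimal sketch in Python; the `LLMClient` protocol and the stub adapters are hypothetical placeholders, not real vendor clients:

```python
from dataclasses import dataclass
from typing import Protocol


class LLMClient(Protocol):
    """Internal interface every provider adapter must satisfy."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class StubAnthropicClient:
    """Stand-in adapter; a real one would wrap the vendor SDK."""
    model: str = "claude-example"

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] response to: {prompt}"


@dataclass
class StubOpenAIClient:
    model: str = "gpt-example"

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] response to: {prompt}"


def get_client(provider: str) -> LLMClient:
    """Single switch point: no other module imports a vendor SDK."""
    registry: dict[str, type] = {
        "anthropic": StubAnthropicClient,
        "openai": StubOpenAIClient,
    }
    return registry[provider]()
```

Swapping providers then means changing one registry entry or a config value, not touching every call site.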

The Escalation Matrix — start at the bottom, move up only when necessary:

| Question | Yes → | No → |
| --- | --- | --- |
| Does a well-crafted prompt suffice? | Stay at Level 1 | Continue |
| Does it need proprietary data in context? | RAG (Level 3) | API suffices (Level 2) |
| Must the output format be exactly consistent? | Evaluate Fine-Tuning (Level 4) | RAG suffices |
| Are there regulatory reasons requiring full control? | Evaluate Custom Model (Level 5) | Fine-Tuning suffices |

Before every level-up, ask: Have I truly exhausted the current level — or am I hoping that more complexity will solve my actual problem?
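The escalation matrix can be sketched as a top-down decision function. Level numbers and the question order come from the matrix; the function itself is an illustrative encoding, not a prescription:

```python
def recommend_level(prompt_suffices: bool,
                    needs_proprietary_context: bool,
                    needs_consistent_format: bool,
                    regulatory_full_control: bool) -> int:
    """Walk the escalation matrix top-down and stop at the
    lowest level that answers the need."""
    if prompt_suffices:
        return 1  # Prompt Engineering
    if not needs_proprietary_context:
        return 2  # API integration suffices
    if not needs_consistent_format:
        return 3  # RAG suffices
    if not regulatory_full_control:
        return 4  # Evaluate fine-tuning
    return 5      # Evaluate custom model
```

Note that every branch defaults downward: you only reach Level 4 or 5 after three explicit "no, the cheaper level is not enough" answers.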

You’re a PM at a B2B SaaS for contract management. Feature request: AI should summarize contracts and flag risk clauses. 2,000 customers, 50,000 contracts/month.

Three options on the table:

  • Option A — API (Level 2): Claude API + prompting. $3K/month API costs. Live in 2 weeks.
  • Option B — RAG (Level 3): API + proprietary contract database as context. $5K/month total. Live in 6 weeks.
  • Option C — Fine-Tuning (Level 4): Custom model on 100,000 annotated contracts. $80K setup + $8K/month. Live in 4 months.
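Putting the three options on a common 12-month basis makes the gap concrete (figures taken from the options above; infra and team costs excluded):

```python
def twelve_month_cost(setup: float, monthly: float) -> float:
    """Total spend over the first year for one option."""
    return setup + 12 * monthly

options = {
    "A (API)":       twelve_month_cost(0, 3_000),      # $36,000
    "B (RAG)":       twelve_month_cost(0, 5_000),      # $60,000
    "C (Fine-Tune)": twelve_month_cost(80_000, 8_000), # $176,000
}
for name, cost in options.items():
    print(f"Option {name}: ${cost:,.0f} in year one")
```

Option C costs roughly three times Option B in year one, before you know whether fine-tuning is even needed.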

Additional context:

  • 42% of companies scrapped AI initiatives in 2024 — most common reason: building too complex too soon
  • Klarna automated 2/3 of chats via API, achieved $40M in profit — then had to reverse course (“focused too much on efficiency”) and shifted to a hybrid model
  • GitHub Copilot (API-based): 15M users, 55% faster tasks, but 41% higher code churn rate

How would you decide?

The best decision: Start with Option B (RAG) — but build Option A first to learn quickly.

Why:

  • Option A first: live in 2 weeks, real user feedback. Prompt engineering on contracts is surprisingly effective
  • Then Option B: your own contract database as RAG context significantly improves detection of customer-specific clause patterns
  • Option C is premature: without 6+ months of usage data, you don’t know what to fine-tune for. The 42% of failed AI initiatives made exactly this mistake
  • Build an abstraction layer — if Claude goes down tomorrow or gets more expensive, you need to be able to switch providers

What many get wrong: Planning fine-tuning as step one before exhausting prompt engineering and RAG. Or baking variable LLM costs into a fixed pricing model without modeling usage volume scenarios.
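That second mistake (fixed pricing on top of variable LLM costs) is cheap to avoid with a back-of-the-envelope scenario model. All numbers below are illustrative assumptions, not real vendor pricing:

```python
def monthly_llm_cost(contracts: int, tokens_per_contract: int,
                     usd_per_million_tokens: float) -> float:
    """Variable LLM cost for one monthly usage scenario."""
    return contracts * tokens_per_contract * usd_per_million_tokens / 1_000_000


# Model low/expected/high volume BEFORE fixing your own price point.
# At 50,000 contracts/month the expected scenario works out to
# $3,000/month, in line with Option A's API bill in the case study.
for label, volume in [("low", 25_000), ("expected", 50_000), ("high", 150_000)]:
    cost = monthly_llm_cost(volume, tokens_per_contract=12_000,
                            usd_per_million_tokens=5.0)
    print(f"{label}: ${cost:,.0f}/month")
```

If the "high" scenario breaks your margin, the pricing model, not the technology level, is the first thing to fix.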

  • Build vs. Buy is a spectrum, not a binary choice. The five levels require radically different budgets, timelines, and team skills. Start low, escalate with data.
  • RAG before Fine-Tuning. RAG is 10–50x cheaper, provides transparent sources, and doesn’t require an ML team. Fine-tuning only pays off when RAG demonstrably falls short.
  • Vendor lock-in is a product risk, not an infrastructure detail. An AI Gateway and multi-provider capability belong in the architecture from day one — not as a “later optimization.”
  • Cursor shows the sweet spot: API wrapper + proprietary UX innovation = multi-billion valuation. You don’t need to build your own model to create massive value.

Sources: Gartner “Innovation Insight: AI Gateways” (2024), Klarna “AI Assistant Report” (2024/2025), GitHub Blog “Research: Quantifying Copilot’s Impact” (2024), Cursor Financials (2025), OpenAI Outage Report (June 2025), IDC “AI Platform Migration Survey” (2024)

Part of AI Learning — free courses from prompt to production.