Foundation Models
Context
Your CTO says: “We’re going with GPT-5.2.” Your ML engineer pushes back: “Llama 4 is good enough and costs a tenth.” A consultant recommends training a custom model. Everyone has solid arguments.
You’re the Product Manager. You don’t need to know which architecture is better. But you need to understand what foundation models are, what options exist, and how to make the right call for your product.
Concept
Foundation models are “models trained on broad data at scale and adaptable to a wide range of downstream tasks” (Stanford HAI, 2021). Before foundation models, you needed a separate model for every task. Now you take one base model and adapt it.
Why they changed everything
Emergent capabilities: Beyond a certain scale, foundation models develop abilities they were never explicitly trained for. A language model suddenly writes code or reasons through logic problems. Caveat: recent research (arXiv:2503.05788, 2025) suggests some of these “sudden” abilities may be measurement artifacts rather than true phase transitions.
Homogenization: The same model powers many different applications. That’s efficient — but also a concentration risk. If the base model has a systematic bias, every downstream app inherits it.
Democratization: Building an AI product used to require an ML team and months of training. Today: write a prompt, add RAG, optionally fine-tune. The barrier to entry for AI products has dropped dramatically.
The current landscape (March 2026)
Closed-source: OpenAI (GPT-5.2, o3), Anthropic (Claude Opus 4.6, Sonnet 4.6), Google (Gemini 3 Pro/Flash). Highest quality, easiest integration, but vendor lock-in.
Open-source: Meta (Llama 4), DeepSeek (V3.2), Alibaba (Qwen 3), Mistral. Full control, no dependency, but you need your own infrastructure. DeepSeek proved that open-source can match frontier quality — shipping under an MIT license with benchmark performance at IMO and IOI level.
All models are increasingly multimodal: text, image, audio, and video in a single model.
Adaptation techniques — cheapest to deepest
| Technique | What happens | Cost | When to use |
|---|---|---|---|
| Prompt engineering | Zero-shot, few-shot, chain-of-thought | Cheapest | Quick start, prototypes |
| RAG | Retrieve relevant data + inject into prompt | Medium | Current data, fewer hallucinations |
| Fine-tuning | Adapt model weights for a domain | High | Specialized domain, consistent style |
| Train from scratch | Build your own model | Very high | Unique data, strategic advantage |
The rule of thumb: go as deep as necessary, stay as shallow as possible.
Framework
Foundation Model Selection — three steps to the right model choice:
Step 1 — Define requirements: task type, latency, volume, data sensitivity, accuracy, budget.
Step 2 — Map to model category:
| Requirement | Approach |
|---|---|
| Fast, cheap, good enough | Open-source or Gemini Flash |
| Maximum quality | Claude Opus, GPT-5.2, Gemini Pro |
| Data must stay local | Open-source, self-hosted |
| Multimodal | Gemini (native), GPT-5 (vision), Claude (vision) |
| Long context | Gemini (1M+), Llama 4 Scout (10M), Claude (200K) |
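The mapping above has an ordering hidden in it: data residency is a hard constraint that rules out closed APIs before quality or cost even enter the picture. A minimal sketch of that decision order, with illustrative thresholds and labels taken from the table:

```python
def choose_category(
    data_must_stay_local: bool,
    needs_max_quality: bool,
    long_context_tokens: int = 0,
) -> str:
    """Map requirements to a model category, hard constraints first."""
    # Data residency is non-negotiable: it eliminates closed APIs outright.
    if data_must_stay_local:
        return "open-source, self-hosted"
    # Context length beyond ~200K rules out most models.
    if long_context_tokens > 200_000:
        return "Gemini (1M+) or Llama 4 Scout (10M)"
    if needs_max_quality:
        return "Claude Opus, GPT-5.2, or Gemini Pro"
    return "open-source or Gemini Flash"
```

Note that "fast, cheap, good enough" is the fall-through case, not the starting point: you only land there once no harder constraint applies.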
Step 3 — Validate with evals. Never choose based on benchmarks alone. Test with your real data, your real use cases.
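What "validate with evals" means in practice can be as small as this sketch: a set of (prompt, expected) cases run against any candidate model. The stub model and the substring pass criterion are deliberate simplifications; real evals need graders matched to the task.

```python
def run_evals(model, cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose expected answer appears in the model output.
    `model` is any callable str -> str (an API wrapper, a local model, ...)."""
    passed = sum(
        1 for prompt, expected in cases
        if expected.lower() in model(prompt).lower()
    )
    return passed / len(cases)

def stub_model(prompt: str) -> str:
    """Stand-in for a real model call, for illustration only."""
    return "Employees get 30 days of paid leave."

cases = [
    ("How many vacation days do I get?", "30 days"),
    ("What is the meal reimbursement limit?", "50 EUR"),
]
score = run_evals(stub_model, cases)
```

The same `cases` list run against two candidate models gives you a like-for-like comparison on your data, which is exactly what public benchmarks cannot.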
Build vs Buy vs Blend:
| Strategy | When | Example |
|---|---|---|
| Buy | Commoditized use case, speed, compliance | Duolingo: GPT-4 API for language exercises |
| Build | Competitive advantage, sensitive data | Harvey: custom legal LLM on proprietary case data |
| Blend | The default in 2026 | Shopify Sidekick: API + proprietary commerce data |
The dominant pattern in 2026: Blend. Buy the platform, build the last mile. Hybrid routing — 80% cheap open-source, 20% frontier closed for the hard cases.
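The routing half of the blend pattern can be sketched in a few lines. The complexity heuristic and the tier names below are illustrative assumptions; production routers typically use a small classifier model rather than keywords.

```python
def looks_complex(query: str) -> bool:
    """Crude proxy for difficulty: long queries or reasoning keywords."""
    keywords = ("why", "compare", "trade-off", "analyze", "multi-step")
    return len(query.split()) > 40 or any(k in query.lower() for k in keywords)

def route(query: str) -> str:
    """Send easy traffic to the cheap open model, escalate the rest."""
    return "frontier-closed" if looks_complex(query) else "cheap-open"
```

Even a crude router like this captures the economics: if the heuristic sends the easy 80% to the cheap tier, the frontier model's price applies to only a fifth of your traffic.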
Scenario
You’re building an internal knowledge assistant for a company with 2,000 employees. It should search internal documents and answer questions. The documents contain confidential HR and financial data.
The options:
| Option | Setup cost | Running cost/month | Data privacy | Time-to-market |
|---|---|---|---|---|
| A: Claude API + RAG | $15,000 | $3,200 | Data leaves the company | 6 weeks |
| B: Llama 4 self-hosted + RAG | $80,000 | $1,800 | Full control | 14 weeks |
| C: Blend — Llama 4 for standard, Claude for complex | $55,000 | $2,100 | Only non-sensitive data external | 10 weeks |
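The table's numbers are worth running as a total-cost-of-ownership calculation. This is a pure cost comparison using only the setup and monthly figures above; it deliberately ignores data privacy and time-to-market, which is exactly why cost alone must not decide this scenario.

```python
# Setup and monthly running costs from the options table (USD).
OPTIONS = {
    "A: Claude API + RAG": {"setup": 15_000, "monthly": 3_200},
    "B: Llama 4 self-hosted + RAG": {"setup": 80_000, "monthly": 1_800},
    "C: Blend": {"setup": 55_000, "monthly": 2_100},
}

def tco(option: str, months: int) -> int:
    """Total cost of ownership: one-time setup plus running cost."""
    o = OPTIONS[option]
    return o["setup"] + o["monthly"] * months

year_one = {name: tco(name, 12) for name in OPTIONS}
```

On cost alone, Option A stays cheapest for roughly three years: the blend's extra $40,000 setup is repaid at $1,100 per month, a break-even of about 37 months. The case for C in this scenario rests on privacy, not on price.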
Closed-source models cost roughly 87% more to run than open-source alternatives (MIT Sloan), yet they’re chosen 80% of the time — because of easier integration and faster time-to-market.
Decide
How would you decide?
The best decision in this scenario: Option C — Blend with hybrid routing.
Why:
- Confidential HR and financial data stays on your own Llama 4 server. Only non-sensitive queries go to the Claude API for higher quality on complex reasoning tasks.
- Running costs are 34% lower than the pure API solution.
- 10 weeks time-to-market is a fair compromise — not as fast as pure API, but with significantly better data privacy.
- You avoid full vendor lock-in and can replace the closed-source component at any time.
What many get wrong: Either going all-in on an API (“fast and simple”) without considering the data privacy implications, or insisting on self-hosting everything and failing under the complexity.
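The privacy rule at the heart of Option C is simple enough to state as code: anything touching confidential HR or financial documents never leaves the self-hosted model. The tag set and model labels below are assumptions for illustration; a real system would classify documents at ingestion time.

```python
# Document categories that must never reach an external API.
SENSITIVE_TAGS = {"hr", "finance", "salary", "payroll"}

def route_by_sensitivity(doc_tags: set[str]) -> str:
    """Keep sensitive traffic local; allow the external API only otherwise."""
    if doc_tags & SENSITIVE_TAGS:
        return "llama4-self-hosted"
    return "claude-api"
```

Note this is a different axis than complexity routing: a query can be both simple and sensitive, and sensitivity wins. In a blend, the sensitivity check runs first, the complexity check second.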
Reflect
- Foundation models are the platform layer of AI. Treat them as an infrastructure decision, not a feature decision — they determine what you can build and what you can’t.
- Open vs closed is not either/or. The dominant pattern in 2026 is hybrid routing: cheap open-source for the bulk, frontier closed for the hard cases.
- The adaptation technique matters more than the model. RAG with a good model beats fine-tuning with the best model — when the use case fits.
- Evals beat benchmarks. No benchmark replaces testing with your real data and your real users.
Sources: Bommasani et al. “On the Opportunities and Risks of Foundation Models” (Stanford HAI, 2021), MIT Sloan “Open vs Closed AI Models” (2025), arXiv:2503.05788 (2025), DeepSeek Technical Reports (2025-2026)