How LLMs think
Context
Your CEO comes back from a meeting and says: “We need to put AI in our product.” Your engineering team talks about tokens, temperature, and hallucinations. Your designer asks whether the chatbot needs a personality.
You’re the Product Manager. You don’t need to know how to train an LLM. But you need to understand what it does — well enough to ask the right questions and spot the wrong promises.
Concept
A Large Language Model is a probability machine for text. Given an input (prompt), it calculates the most likely continuation — token by token.
What a token is
A token is not a word. It’s a chunk of text that the model processes as a unit. “Product management” might be 2–3 tokens. “AI” is one. Tokenization determines how the model “sees” language.
Why this matters for you:
- Tokens determine cost (you pay per token)
- Tokens determine limits (context window = max tokens per request)
- Tokens determine speed (more tokens = longer response time)
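Those three bullets can be made concrete with a back-of-envelope estimator. This is a sketch, not a real tokenizer: it assumes the common rule of thumb of roughly 4 characters per token for English text, and the prices are whatever you plug in.

```python
# Back-of-envelope token and cost estimation.
# Assumption: ~4 characters per token (a rough heuristic for English,
# not the model's actual tokenization).

def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 chars/token heuristic."""
    return max(1, round(len(text) / 4))

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost for one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

prompt = "What is your return policy for items bought on sale?"
print(estimate_tokens(prompt))  # rough count -- the real tokenizer will differ
```

For anything beyond napkin math, use the provider’s actual tokenizer to count tokens; the heuristic only gets you in the right order of magnitude.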
How an LLM “responds”
The model sees your prompt and calculates: which token is most likely next? Then it takes that token, appends it, and calculates the next one. And so on.
This means:
- The model doesn’t plan. It generates sequentially.
- The model doesn’t know things. It has learned statistical patterns.
- The model doesn’t decide. It samples from probability distributions (at temperature 0, it deterministically picks the most likely token).
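The loop described above can be sketched in a few lines. This toy uses a hand-made bigram table instead of a neural network — the table and its probabilities are invented for illustration — but the mechanism (pick or sample the next token, append, repeat) is the same one a real model runs over a vocabulary of ~100k tokens.

```python
import random

# Toy next-token loop over an invented bigram table.
NEXT = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.5, "ran": 0.5},
    "sat": {}, "ran": {},  # no continuation: generation stops
}

def generate(prompt: str, temperature: float = 0.0, seed: int = 0) -> str:
    rng = random.Random(seed)
    tokens = prompt.split()
    while True:
        dist = NEXT.get(tokens[-1], {})
        if not dist:
            break
        if temperature == 0:
            # Greedy: always the most likely token -- fully deterministic.
            token = max(dist, key=dist.get)
        else:
            # Sample: higher temperature flattens the distribution.
            words, probs = zip(*dist.items())
            weights = [p ** (1 / temperature) for p in probs]
            token = rng.choices(words, weights=weights)[0]
        tokens.append(token)
    return " ".join(tokens)

print(generate("the"))  # → "the cat sat" (deterministic at temperature 0)
```

Note that the model never “plans” the sentence: each token is chosen knowing only what has been generated so far.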
Temperature
Temperature controls how “creatively” the model responds:
| Temperature | Behavior | Use case |
|---|---|---|
| 0.0 | Always the most likely token | Fact extraction, classification |
| 0.3–0.7 | Slight variation | Most product applications |
| 1.0+ | High randomness | Creative writing, brainstorming |
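Under the hood, temperature divides the model’s raw scores (logits) before they are turned into probabilities. A minimal sketch, with made-up logits for three made-up candidate tokens:

```python
import math

# How temperature reshapes an output distribution.
# The logits are illustrative; a real model produces one per vocabulary token.

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # e.g. "refund", "return", "exchange" (invented)
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 2) for p in probs])
```

Low temperature concentrates almost all probability mass on the top token; high temperature spreads it across the alternatives — which is exactly the “creative vs. deterministic” dial in the table above.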
Hallucinations
When an LLM “invents” something, it’s not a malfunction — it’s the model doing what it always does: generating the most likely continuation. Sometimes the most likely continuation is factually wrong.
“Hallucinations are not a bug. They are an inherent property of systems based on probability — reducible, but not fully eliminable.”
Framework
The Token-Cost-Quality Triangle — three variables you need to balance with every AI product decision:
| Variable | Lever | Tradeoff |
|---|---|---|
| Quality | Larger model, more context | Higher cost, slower response |
| Cost | Smaller model, fewer tokens | Lower quality |
| Speed | Streaming, smaller model | Potentially lower quality |
You can’t maximize all three simultaneously. Your job as a PM: decide which variable matters most for your product.
Scenario
You’re building a customer support tool. The bot should answer common questions — returns, delivery status, product info. Your engineering team suggests GPT-4. Your CFO wants to keep costs low.
The situation:
- 50,000 customer inquiries per month
- Average 200 tokens input + 300 tokens output per inquiry
- GPT-4o: $2.50/1M input tokens, $10/1M output tokens
- GPT-4o-mini: $0.15/1M input tokens, $0.60/1M output tokens
GPT-4o calculation: ~$175/month
GPT-4o-mini calculation: ~$10/month
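The math behind those two figures, using the volumes and per-million-token prices from the scenario:

```python
# Monthly cost for the support-bot scenario.
INQUIRIES = 50_000
IN_TOK, OUT_TOK = 200, 300  # tokens per inquiry (input, output)

def monthly_cost(in_price_per_m: float, out_price_per_m: float) -> float:
    input_tokens = INQUIRIES * IN_TOK    # 10M input tokens/month
    output_tokens = INQUIRIES * OUT_TOK  # 15M output tokens/month
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

print(monthly_cost(2.50, 10.00))  # GPT-4o:      175.0
print(monthly_cost(0.15, 0.60))   # GPT-4o-mini:  10.5
```

Note that output tokens dominate the GPT-4o bill ($150 of the $175) — a reason to keep responses short, independent of which model you pick.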
The quality difference for structured support responses? Often minimal — because the context (FAQ database) does the heavy lifting, not the model.
Decide
How would you decide?
The best decision in this scenario: Start with GPT-4o-mini + a solid FAQ database (RAG). Measure response quality. Only escalate cases to a larger model where quality isn’t sufficient.
Why:
- ~17x cheaper with often comparable quality for structured responses
- You can upgrade later — downgrading is harder
- The FAQ database (context) has more impact on quality than model size
- Routing (simple questions → small model, complex → large) is an established pattern
What many get wrong: Choosing the biggest model “to be safe” and then being surprised when costs explode at scale.
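The routing pattern from the list above can be sketched in a few lines. Everything here is a placeholder: `call_model`, the model names, and the complexity heuristic are invented to show the shape of the pattern, not a real API.

```python
# Minimal model-routing sketch: cheap model for simple questions,
# strong model for complex ones. All names and logic are hypothetical.

CHEAP, STRONG = "small-model", "large-model"

def call_model(model: str, question: str) -> str:
    # Placeholder for a real provider API call.
    return f"[{model}] answer to: {question}"

def looks_complex(question: str) -> bool:
    # Naive stand-in heuristic: long or multi-part questions escalate.
    # Production routers use classifiers or the cheap model's own
    # confidence signal instead.
    return len(question.split()) > 30 or "?" in question[:-1]

def route(question: str) -> str:
    model = STRONG if looks_complex(question) else CHEAP
    return call_model(model, question)

print(route("What is your return policy?"))  # handled by the cheap model
```

The point is architectural: the escalation decision lives in your product code, so you can tune it with data instead of re-platforming onto a bigger model.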
Reflect
The key insight from this lesson: LLMs are probability machines, not knowledge machines. This distinction determines how you design AI features:
- Don’t expect perfect accuracy — design for inaccuracy
- Give the model good context instead of hoping for a better model
- Think in tokens, not in words
Sources: Anthropic Documentation (2025), OpenAI Pricing (2025), Chip Huyen “Designing Machine Learning Systems” (O’Reilly, 2022)