Evolving Agents

Why do some AI agent systems get better over time — while others just get more complex?

Martin Nowak answered this question for biology 20 years ago. The math already exists. We explored how it maps to AI agents.

Try It: The Phase Transition

Nowak's Originator Equation describes when a system transitions from random diversity (Prelife) to directed evolution (Life).

[Interactive chart: a slider sets the replication rate r. At r = 0.00 the system is in Prelife: no replication, random diversity.]

ℹ How to read this chart

80 bars = 80 molecular sequences competing in a chemical soup
Bar height = frequency (share of the population)
Bar color: ■ amber = Prelife dominates · ■ green = Selection dominates · brighter = fitter
Slider r = replication rate — how much template-directed copying happens

$$\dot{x}_i = a_i\,x_{i'} - (d + a_{i0} + a_{i1})\,x_i + r\,x_i\,(f_i - \phi)$$

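The widget reports each term for the champion, the fittest sequence:
Prelife term aᵢ · xᵢ′: chemistry creates sequence i by extending its precursor i′ (i with its last monomer removed).
Decay term (d + aᵢ₀ + aᵢ₁) · xᵢ: the champion degrades at rate d or is extended into the longer sequences i0 and i1.
Selection term r · xᵢ · (fᵢ − φ): the champion replicates in proportion to how far its fitness fᵢ exceeds the average fitness φ.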
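The slider's effect can be reproduced with a few lines of numerical integration. The sketch below is a toy under stated assumptions, not the model behind the chart: binary sequences up to length 3, one uniform extension rate a, a constant pool of activated monomers (so monomers use 1.0 as their precursor value), a hypothetical fitness landscape in which "101" is the champion (fitness 1.0 vs. 0.2 for every other sequence), and φ taken as the frequency-weighted average fitness, which keeps the selection term from changing the total concentration. The function name `champion_share` and all parameter values are illustrative; integration is plain forward Euler.

```python
import itertools


def all_sequences(max_len):
    """Binary sequences of length 1..max_len, shortest first."""
    return ["".join(bits)
            for n in range(1, max_len + 1)
            for bits in itertools.product("01", repeat=n)]


def champion_share(r, max_len=3, a=1.0, d=1.0, champion="101",
                   steps=5000, dt=0.01):
    """Integrate dx_i/dt = a*x_i' - (d + a_i0 + a_i1)*x_i + r*x_i*(f_i - phi)
    with forward Euler and return the champion's share of the population.

    All parameter values and the fitness landscape are hypothetical.
    """
    seqs = all_sequences(max_len)
    fitness = {s: (1.0 if s == champion else 0.2) for s in seqs}
    x = {s: 0.0 for s in seqs}
    for _ in range(steps):
        total = sum(x.values())
        # phi: frequency-weighted average fitness (assumption), so the
        # selection term redistributes but never creates concentration
        phi = sum(fitness[s] * x[s] for s in seqs) / total if total > 0 else 0.0
        dx = {}
        for s in seqs:
            # i' = s without its last monomer; monomers draw on a constant pool of 1.0
            precursor = x[s[:-1]] if len(s) > 1 else 1.0
            # outflow: decay d plus extension to s+"0" and s+"1" (none at max length)
            outflow = d + (0.0 if len(s) == max_len else 2 * a)
            dx[s] = a * precursor - outflow * x[s] + r * x[s] * (fitness[s] - phi)
        for s in seqs:
            x[s] += dt * dx[s]
    return x[champion] / sum(x.values())


for r in (0.0, 1.0, 5.0):
    print(f"r = {r}: champion share = {champion_share(r):.3f}")
```

With these toy parameters the champion's share grows with r. It only becomes dominant once its replication advantage r · (1.0 − 0.2) clearly exceeds its decay rate d = 1, which mirrors the threshold-like switch from Prelife to Selection that the chart visualizes.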

Full mathematical synthesis →

🔬 Why Current Agent Systems Are in the Prelife Phase

Following Nowak’s terminology, most agent systems today — including AgentField — are in the “Prelife” phase:

✅ Diversity exists (~30 skills)
✅ Selection exists (Quality Gate)
❌ Replication is missing — no automatic inheritance of successful patterns

This is not a problem: Prelife is already generative. What is missing is the step to “Life” mode, a closed evolutionary loop.

AS-IS:  Task → Skill Selection → Execution → [Quality Gate] → Output
                                                    ↓
                                             (Feedback dies here)

TO-BE:  Task → Skill Selection → Execution → [Quality Gate] → Output
                  ↑                                ↓
                  └──── Skill Mutation ←── Evaluation + Cost ──┘

What would close the loop:

Cost tracking as a second optimization axis
Skill performance history — which skill → which task → which score
Automated prompt mutation — small variations, A/B testing
Niching — multiple skill variants per task type, preserving diversity
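A minimal sketch of how these ingredients close the loop. Everything here is hypothetical and deliberately toy-sized: `SkillVariant`, `execute_and_gate`, `evolve`, the snippet list, the fake per-character cost, and the rule that "math" tasks reward step-by-step prompts are illustrative stand-ins, not AgentField APIs. What it shows is the data path: each execution writes (quality, cost) into a history keyed by variant and task type, and selection and mutation read from that history, so the feedback no longer dies at the Quality Gate.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical instruction snippets the toy mutation operator can append.
SNIPPETS = ["Be concise.", "Think step by step.", "Cite your sources."]


@dataclass(frozen=True)
class SkillVariant:
    """One configuration of a skill (hypothetical; real configs also carry tools and memory)."""
    skill: str
    prompt: str


def execute_and_gate(variant, task_type):
    """Toy stand-in for Execution + Quality Gate; returns (quality, cost).

    Hypothetical rules: 'math' tasks reward step-by-step prompts, and every
    prompt character costs 0.0001 (a fake token price).
    """
    quality = 0.6
    if task_type == "math" and "step by step" in variant.prompt:
        quality += 0.3
    cost = 0.0001 * len(variant.prompt)
    return quality, cost


def fitness(records, cost_weight=1.0):
    """Mean quality minus weighted mean cost (cost as a second optimization axis)."""
    return sum(q - cost_weight * c for q, c in records) / len(records)


def mutate(variant, generation):
    """Toy prompt mutation: append one snippet, cycling through SNIPPETS."""
    snippet = SNIPPETS[generation % len(SNIPPETS)]
    return SkillVariant(variant.skill, f"{variant.prompt} {snippet}")


def evolve(seed, task_type, generations=3):
    """Closed loop: execute -> gate -> record -> select -> mutate."""
    # Skill performance history: (variant, task_type) -> [(quality, cost), ...]
    history = defaultdict(list)
    population = [seed]
    best = seed
    for gen in range(generations):
        for variant in population:
            history[(variant, task_type)].append(execute_and_gate(variant, task_type))
        # Selection reads the accumulated history instead of discarding feedback.
        best = max(population, key=lambda v: fitness(history[(v, task_type)]))
        population = [best, mutate(best, gen)]  # keep the winner, add one mutant
    return best, history


seed = SkillVariant("solver", "Solve the task.")
best, history = evolve(seed, "math")
print(best.prompt)  # Solve the task. Think step by step.
```

Because cost is part of the fitness, the "Be concise." mutant, which under these toy rules only adds characters, loses to its parent; the step-by-step mutant survives because its quality gain outweighs its extra cost.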

EvoFlow shows this empirically: the step from Phase 0 (manual workflows) to Phase 3 (an evolved population) yields improvements of 1.23–29.86%, and EvoFlow reaches results comparable to o1-preview at 12.4% of its cost. This is not a hypothetical gain.

Identified 2026-03-19, during the EvoFlow/MCE/AgentFactory deep dive.

The Bridge: Biology → AI Agents

The same dynamics that govern biological evolution map structurally onto AI agent systems:

Biology (Nowak) ↔ AI Agent System
Sequence / Replicator: DNA that gets copied ↔ Agent Config: prompt + tools + memory
Fitness fᵢ: reproductive success ↔ Performance Metric: quality score + token cost
Mutation: random copying errors ↔ Prompt Variation: A/B testing, TextGrad
Selection φ: survive or die ↔ Quality Gate: keep or discard
Error Threshold: too many mutations → meltdown ↔ Context Limit: too many changes → quality collapse
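The Error Threshold row refers to the classic quasispecies result: in the simplest model (back mutations neglected), a master sequence with L sites, per-site copying fidelity q, and replication advantage σ over its mutants is maintained only if σ · q^L > 1. Solving for the per-site error rate u = 1 − q gives u < 1 − σ^(−1/L) ≈ ln(σ)/L. The sketch below only evaluates that bound; the function names are illustrative, and reading L as "how much of an agent configuration changes per generation" is the analogy's assumption, not an established result for agents.

```python
import math


def max_error_rate(sigma, length):
    """Largest per-site error rate u for which a master sequence with
    `length` sites and replication advantage `sigma` (> 1) is maintained,
    from the condition sigma * (1 - u) ** length > 1 (back mutations neglected)."""
    if sigma <= 1:
        raise ValueError("sigma must be > 1 for the master sequence to be favored")
    return 1 - sigma ** (-1 / length)


def approx_max_error_rate(sigma, length):
    """Small-u approximation: u_max ≈ ln(sigma) / length."""
    return math.log(sigma) / length


for length in (50, 100, 200):
    exact = max_error_rate(10, length)
    approx = approx_max_error_rate(10, length)
    print(length, round(exact, 5), round(approx, 5))
```

Doubling the length roughly halves the tolerable error rate: the longer the sequence, the fewer errors each copy can afford before the master sequence is lost.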

This is a structural analogy, not a formal proof. Agent evolution is Lamarckian, not Darwinian. We document 9 counter-arguments — 3 rated STRONG.

The Upgrade Path

Where is your agent system on the evolutionary ladder?

Prelife ← Most systems are here

Manual curation. You write prompts, pick tools, evaluate by hand. There's diversity, but no heredity — good patterns don't automatically persist.

Feedback Loop ← Spec ready

Measure quality + cost per skill execution. SQL schema, Pareto views, alert triggers. You can see what works — but changes are still manual.
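As an illustration of "quality + cost per skill execution" with a Pareto view, here is a sketch using Python's built-in `sqlite3` module. The table and view names (`skill_runs`, `skill_stats`, `skill_pareto`), the columns, and the sample rows are hypothetical and are not the Phase 1 spec's schema; alert triggers are left out. A skill is on the Pareto front for a task type if no other skill for that task type has at least its average quality at no higher average cost while being strictly better on one of the two.

```python
import sqlite3

SCHEMA = """
CREATE TABLE skill_runs (
    id        INTEGER PRIMARY KEY,
    skill     TEXT NOT NULL,
    task_type TEXT NOT NULL,
    quality   REAL NOT NULL,   -- Quality Gate score, 0..1
    cost_usd  REAL NOT NULL    -- token cost of this execution
);

-- Average quality and cost per (skill, task_type).
CREATE VIEW skill_stats AS
SELECT skill, task_type,
       AVG(quality)  AS avg_quality,
       AVG(cost_usd) AS avg_cost,
       COUNT(*)      AS runs
FROM skill_runs
GROUP BY skill, task_type;

-- Skills not dominated by another skill on the same task type.
CREATE VIEW skill_pareto AS
SELECT a.* FROM skill_stats AS a
WHERE NOT EXISTS (
    SELECT 1 FROM skill_stats AS b
    WHERE b.task_type = a.task_type
      AND b.skill <> a.skill
      AND b.avg_quality >= a.avg_quality
      AND b.avg_cost <= a.avg_cost
      AND (b.avg_quality > a.avg_quality OR b.avg_cost < a.avg_cost)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO skill_runs (skill, task_type, quality, cost_usd) VALUES (?, ?, ?, ?)",
    [
        ("summarize-v1", "summary", 0.80, 0.020),
        ("summarize-v2", "summary", 0.90, 0.050),
        ("summarize-v3", "summary", 0.75, 0.060),  # worse and pricier than v1
    ],
)
front = conn.execute(
    "SELECT skill FROM skill_pareto WHERE task_type = 'summary' ORDER BY skill"
).fetchall()
print(front)  # [('summarize-v1',), ('summarize-v2',)]
```

v1 and v2 both stay on the front because neither beats the other on both axes; v3 drops out because v1 is better and cheaper.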

Mutation

A/B test prompt variations automatically. Keep winners, discard losers. The system starts improving itself.
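"Keep winners, discard losers" needs a rule for when a difference is real rather than noise. A minimal sketch, assuming each execution ends in a binary pass/fail from the Quality Gate: a one-sided two-proportion z-test that keeps challenger B over incumbent A only if B passes significantly more often. The function name `keep_challenger` and the 1.645 cutoff (roughly a 5% one-sided level) are illustrative choices; when many variants are tested at once, a multiple-comparison correction or a bandit method would be more appropriate.

```python
import math


def keep_challenger(pass_a, n_a, pass_b, n_b, z_crit=1.645):
    """One-sided two-proportion z-test: is variant B's pass rate higher than A's?

    Returns (keep_b, z). z_crit=1.645 corresponds to roughly a 5% one-sided level.
    """
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both variants all-pass or all-fail: no evidence either way
        return False, 0.0
    z = (p_b - p_a) / se
    return z > z_crit, z


print(keep_challenger(70, 100, 85, 100)[0])  # True
print(keep_challenger(70, 100, 75, 100)[0])  # False
```

A 15-point gain over 100 runs each clears the bar; a 5-point gain does not yet, so the incumbent stays until more runs accumulate.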

Population

Maintain diverse skill variants via niching. Quality-Diversity optimization instead of single-best-solution thinking.
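Niching can be as simple as keeping the best variant per niche instead of one global best, in the spirit of MAP-Elites from the Quality-Diversity literature. In the sketch below, the niche key (task type plus a coarse cost bucket), the `Archive` class, and the bucket width are hypothetical choices: a cheap variant and an expensive variant for the same task type are kept side by side, so diversity survives even though one of them scores higher.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """MAP-Elites-style archive: keeps one elite skill variant per niche."""
    cost_bucket_width: float = 0.01  # hypothetical bucket size in USD per execution
    elites: dict = field(default_factory=dict)  # niche -> (quality, variant name)

    def niche(self, task_type, cost):
        # Assumed behavior descriptor: task type plus a coarse cost bucket.
        return (task_type, int(cost // self.cost_bucket_width))

    def add(self, name, task_type, quality, cost):
        """Store the variant if its niche is empty or it beats the current elite."""
        key = self.niche(task_type, cost)
        current = self.elites.get(key)
        if current is None or quality > current[0]:
            self.elites[key] = (quality, name)
            return True
        return False


archive = Archive()
archive.add("summarize-cheap", "summary", 0.78, 0.004)
archive.add("summarize-rich", "summary", 0.91, 0.035)
archive.add("summarize-cheap-v2", "summary", 0.80, 0.006)  # replaces the cheap elite
archive.add("summarize-cheap-v3", "summary", 0.70, 0.002)  # rejected: worse than the elite
for niche, (quality, name) in sorted(archive.elites.items()):
    print(niche, name, quality)
```

The cheap niche ends up holding summarize-cheap-v2 and the expensive niche keeps summarize-rich; a single-best search would have discarded the cheap line entirely.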

Full Evolution

The phase transition: autonomous improvement that outpaces manual curation. EvoFlow shows this is already possible, at 12.4% of the cost of o1-preview.

60+ papers · 7 principles · 9 counter-arguments

Dive Deeper

📐 Nowak Synthesis

The Originator equation, phase transitions, error threshold — the full mathematical bridge.

📄 60+ Papers

9 categories, 19 must-reads. Prioritized and linked to arXiv.

🔬 EvoFlow Deep Dive

How 3 papers bridge evolutionary theory to agent practice.

🧭 7 Principles

Actionable design rules from evolutionary theory.

⚔️ Counter-Arguments

9 critiques. 3 rated STRONG. Where the thesis breaks.

🔧 Phase 1 Spec

SQL schema, Pareto views. Ready to implement today.

FAQ

Is this a formal proof?

No. It's a structural analogy: a design heuristic, not a mathematical proof. See the 9 counter-arguments.

Were these papers actually read?

Core papers were read in full text. The broader corpus is based on abstracts and summaries, cross-checked against at least two sources. See Limitations.

What can I DO with this?

Use the 7 principles as a design checklist. Implement the Phase 1 feedback loop. Read the 60+ papers.

This is a research synthesis, not a systematic review. The analysis is based on abstracts, summaries, and cross-verified data from 60+ papers; counter-arguments are documented. Full limitations →
