Counter-Arguments: Where the Nowak-Agent Analogy Breaks

Reading: 5 of 5 · After reading this, you’ll know the 9 strongest objections to the evolutionary analogy — including 3 rated STRONG — and where the thesis survives vs. where it needs correction.

Date: 2026-03-19 Purpose: Honesty test — targeted search for counter-arguments to our own thesis. This document collects the strongest objections.

Without this document, the repo would be confirmation-biased. A thesis becomes stronger when it knows its counter-arguments — or it corrects itself.

Section Summary

9 counter-arguments tested against the Nowak-Agent analogy:

🔴 3 STRONG — Lamarckian dynamics, no true reproduction, artificial selection
🟡 3 MEDIUM — Wishful mnemonics, open-ended evolution, cost
🟢 1 LOW — Bad outcomes
🟡 1 META — Evo-Devo
🔴 1 MEDIUM-STRONG — Population size

Verdict: The structural analogy survives as a design heuristic but requires explicit corrections — especially a new principle: "The fitness function is the most fragile part."

📑 On this page

🔴 G1: Lamarckian, not Darwinian — STRONG
🔴 G2: No True Reproduction — STRONG
🟡 G3: Wishful Mnemonics — MEDIUM
🔴 G4: Artificial ≠ Natural Selection — STRONG
🟡 G5: Open-Ended Evolution — MEDIUM
🟡 G6: Expensive — MEDIUM
🟢 G7: Bad Outcomes — LOW
🟡 G8: Evo-Devo — META
🔴 G9: Population Size — MEDIUM-STRONG
Summary

Gegenargumente: Wo die Nowak-Agent-Analogie bricht

Lesehinweis: 5 von 5 · Nach der Lektüre kennst du die 9 stärksten Einwände gegen die Evolutions-Analogie — darunter 3 als STARK bewertet — und wo die These standhält vs. wo sie korrigiert werden muss.

Datum: 2026-03-19 Zweck: Ehrlichkeitstest — gezielte Suche nach Gegenargumenten zur eigenen These. Dieses Dokument sammelt die stärksten Einwände.

Ohne dieses Dokument wäre das Repo confirmation-biased. Eine These wird stärker, wenn sie ihre Gegenargumente kennt — oder sie korrigiert sich.

Abschnitt-Zusammenfassung

9 Gegenargumente gegen die Nowak-Agent-Analogie geprüft:

🔴 3 STARK — Lamarcksche Dynamik, keine echte Reproduktion, künstliche Selektion
🟡 3 MITTEL — Wishful Mnemonics, Open-Ended Evolution, Kosten
🟢 1 NIEDRIG — Schlechte Outcomes
🟡 1 META — Evo-Devo
🔴 1 MITTEL-STARK — Populationsgröße

Fazit: Die strukturelle Analogie überlebt als Design-Heuristik, erfordert aber explizite Korrekturen — insbesondere ein neues Prinzip: „Die Fitness-Funktion ist der fragilste Teil."

📑 Auf dieser Seite

🔴 G1: Lamarckisch, nicht Darwinistisch — STARK
🔴 G2: Keine echte Reproduktion — STARK
🟡 G3: Wishful Mnemonics — MITTEL
🔴 G4: Künstliche ≠ Natürliche Selektion — STARK
🟡 G5: Open-Ended Evolution — MITTEL
🟡 G6: Teuer — MITTEL
🟢 G7: Schlechte Outcomes — NIEDRIG
🟡 G8: Evo-Devo — META
🔴 G9: Populationsgröße — MITTEL-STARK
Zusammenfassung

The 9 Strongest Counter-ArgumentsDie 9 stärksten Gegenargumente

G1: Agent Systems Are Lamarckian, Not Darwinian (STRONG)G1: Agent-Systeme sind Lamarckisch, nicht Darwinistisch (STARK)

⚔️ STRONG

Agent evolution uses directed optimization (Lamarckism), not random mutation (Darwinism). Nowak's results become a lower bound, not an exact mapping.

Objection: Nowak’s mathematics is based on random mutation + selection (Darwinism). Agent systems, however, use directed optimization — TextGrad, Reinforcement Learning, feedback-based finetuning. This is Lamarckism (acquired traits are inherited), not Darwinism.

Practical consequence: The Originator Equation assumes random mutations. If “mutation” in agents is directed (e.g., TextGrad optimizes prompts along the gradient), then Nowak’s results on Error Threshold and phase transitions don’t apply directly — they could be conservative (directed mutation is more efficient) or irrelevant (different dynamics).

Assessment: Valid. Our analogy must explicitly state: “Agent evolution is more Lamarckian than Darwinian. Nowak’s results are a lower bound — directed mutation can perform better, but has different failure modes (e.g., local optima instead of Error Threshold).”

Source: General critique, summarized in AI & Society, 2025

⚔️ STARK

Agent-Evolution nutzt gerichtete Optimierung (Lamarckismus), nicht zufällige Mutation (Darwinismus). Nowaks Ergebnisse werden zur unteren Schranke, nicht zur exakten Abbildung.

Einwand: Nowaks Mathematik basiert auf zufälliger Mutation + Selektion (Darwinismus). Agent-Systeme nutzen aber gerichtete Optimierung — TextGrad, Reinforcement Learning, Feedback-basiertes Finetuning. Das ist Lamarckismus (erworbene Eigenschaften werden vererbt), nicht Darwinismus.

Praktische Konsequenz: Die Originator-Gleichung setzt zufällige Mutationen voraus. Wenn die “Mutation” bei Agents gerichtet ist (z.B. TextGrad optimiert Prompts in die Richtung des Gradienten), dann gelten Nowaks Ergebnisse über Error Threshold und Phasenübergang nicht direkt — sie könnten konservativ sein (gerichtete Mutation ist effizienter) oder irrelevant (andere Dynamik).

Bewertung: Trifft zu. Unsere Analogie muss explizit sagen: “Agent-Evolution ist eher Lamarckisch als Darwinistisch. Nowaks Ergebnisse sind eine untere Schranke — gerichtete Mutation kann besser sein, hat aber andere Failure Modes (z.B. lokale Optima statt Error Threshold).”

Quelle: Allgemeine Kritik, zusammengefasst in AI & Society, 2025

G2: No True Reproduction (STRONG)G2: Keine echte Reproduktion (STARK)

⚔️ STRONG

Agent configurations don't self-replicate. Without true reproduction, Nowak's replication term is insubstantial. Distinction between "designed" and "autonomous" evolution needed.

Objection: Biological evolution requires reproduction — individuals copy themselves, with variation. Agent configurations don’t copy themselves. They are created by humans or meta-agents. Without true reproduction, the replication term in Nowak’s equation (r · xᵢ · (fᵢ − φ)) is insubstantial.

Practical consequence: When we say “Skills replicate through SKILL.md inheritance,” that’s a weak analogy. A SKILL.md isn’t automatically copied and varied — it’s manually edited or rewritten by an LLM. That’s more “design” than “replication.”

Assessment: Partially valid. EvoFlow comes closer: Workflows are automatically generated from parent workflows through crossover/mutation — that IS a form of reproduction. But in a typical skill-based agent system it’s manual. We should clearly distinguish between “designed evolution” and “autonomous evolution.”

⚔️ STARK

Agent-Konfigurationen replizieren sich nicht selbst. Ohne echte Reproduktion ist Nowaks Replikationsterm substanzlos. Unterscheidung zwischen „designed" und „autonomous" Evolution nötig.

Einwand: Biologische Evolution braucht Reproduktion — Individuen kopieren sich, mit Variation. Agent-Konfigurationen kopieren sich nicht selbst. Sie werden von Menschen oder Meta-Agents erstellt. Ohne echte Reproduktion ist der Replikationsterm in Nowaks Gleichung (r · xᵢ · (fᵢ − φ)) substanzlos.

Praktische Konsequenz: Wenn wir sagen “Skills replizieren sich durch SKILL.md-Vererbung”, dann ist das eine schwache Analogie. Ein SKILL.md wird nicht automatisch kopiert und variiert — es wird manuell editiert oder von einem LLM umgeschrieben. Das ist eher “Design” als “Replikation”.

Bewertung: Trifft teilweise zu. EvoFlow kommt näher: Workflows werden automatisch aus Eltern-Workflows durch Crossover/Mutation erzeugt — das IST eine Form von Reproduktion. Aber in einem typischen skill-basierten Agent-System ist es manuell. Wir sollten klar zwischen “designed evolution” und “autonomous evolution” unterscheiden.

G3: “Wishful Mnemonics” — Metaphor Inflation (MEDIUM)G3: „Wishful Mnemonics” — Metaphern-Inflation (MITTEL)

⚠️ MEDIUM

Biological labels on technical processes may suggest depth that doesn't exist. Always use both terms — biological and technical — to stay honest.

Objection: Drew McDermott coined the term “wishful mnemonics” for AI’s tradition of giving systems human/biological terms (“Agent,” “Knowledge,” “Goal,” “Evolution”) hoping the metaphors would eventually become reality. The risk: The metaphor obscures what’s actually happening.

Practical consequence: When we say “Mutation” instead of “Prompt Variation” or “Fitness” instead of “Quality Score,” we suggest a depth of analogy that may not exist. The isomorphism table could be a sophisticated case of label-matching: We stick biological labels on technical processes and call it a “bridge.”

Assessment: Justified as a warning. Countermeasure: In the synthesis, always use BOTH terms — the biological and the technical. Not “Mutation,” but “Prompt Variation (analogy: Mutation).” We’ve already started with “Structural Analogy” instead of “Formal Isomorphism,” but should apply this more consistently.

Source: McDermott, D. (1981). “Artificial Intelligence Meets Natural Stupidity.” — Original source of the “wishful mnemonics” concept.

⚠️ MITTEL

Biologische Etiketten auf technischen Prozessen suggerieren eine Tiefe, die möglicherweise nicht existiert. Immer beide Begriffe nennen — biologisch und technisch — um ehrlich zu bleiben.

Einwand: Drew McDermott prägte den Begriff “wishful mnemonics” für die AI-Tradition, Systemen menschliche/biologische Begriffe zu geben (“Agent”, “Wissen”, “Ziel”, “Evolution”) in der Hoffnung, die Metaphern würden irgendwann Realität werden. Das Risiko: Die Metapher verstellt den Blick auf das, was tatsächlich passiert.

Praktische Konsequenz: Wenn wir “Mutation” sagen statt “Prompt-Variation” oder “Fitness” statt “Quality-Score”, suggerieren wir eine Tiefe der Analogie, die möglicherweise nicht existiert. Die Isomorphie-Tabelle könnte ein raffinierter Fall von Label-Matching sein: Wir kleben biologische Etiketten auf technische Prozesse und nennen es “Brücke”.

Bewertung: Berechtigt als Warnung. Gegenmaßnahme: In der Synthese immer BEIDE Begriffe nennen — den biologischen und den technischen. Nicht “Mutation”, sondern “Prompt-Variation (Analogie: Mutation)”. Wir haben mit “Strukturelle Analogie” statt “Formale Isomorphie” schon angefangen, aber konsequenter durchziehen.

Quelle: McDermott, D. (1981). “Artificial Intelligence Meets Natural Stupidity.” — Originalquelle des “wishful mnemonics”-Begriffs.

G4: Artificial Selection ≠ Natural Selection (STRONG)G4: Künstliche Selektion ≠ Natürliche Selektion (STARK)

⚔️ STRONG

Humans design the fitness function — making the quality of "evolution" only as good as the metric. The fitness function is the most fragile part of the system.

Objection: In biology, the environment IS the selector — there is no designer. In agent systems, humans design the fitness function (Quality Gate, benchmarks). This is artificial selection (like dog breeding), not natural selection.

Practical consequence: The quality of “evolution” is only as good as the fitness function. If the Quality Gate measures poorly, the system evolves in the wrong direction — at high speed. Natural selection doesn’t have this problem because the “fitness function” (survival) isn’t designed by someone fallible.

Assessment: Fully valid. This is actually a stronger argument FOR our work: It makes the Error Threshold even more important. Not just “don’t change more than you can measure” (Nowak), but also “make sure you’re measuring the right thing” (new principle?).

Possible new principle: “The fitness function is the most fragile part of the system.”

⚔️ STARK

Menschen designen die Fitness-Funktion — die Qualität der „Evolution" ist nur so gut wie die Metrik. Die Fitness-Funktion ist der fragilste Teil des Systems.

Einwand: In der Biologie IST die Umwelt der Selektor — es gibt keinen Designer. In Agent-Systemen designen Menschen die Fitness-Funktion (Quality-Gate, Benchmarks). Das ist künstliche Selektion (wie Hundezucht), nicht natürliche Selektion.

Praktische Konsequenz: Die Qualität der “Evolution” ist nur so gut wie die Fitness-Funktion. Wenn das Quality-Gate schlecht misst, evolviert das System in die falsche Richtung — mit hoher Geschwindigkeit. Natürliche Selektion hat dieses Problem nicht, weil die “Fitness-Funktion” (Überleben) nicht von einem Fehlbaren designt wird.

Bewertung: Trifft voll zu. Das ist sogar ein stärkeres Argument FÜR unsere Arbeit: Es macht den Error Threshold noch wichtiger. Nicht nur “ändere nicht mehr als du messen kannst” (Nowak), sondern auch “stelle sicher, dass du das Richtige misst” (neues Prinzip?).

Mögliches neues Prinzip: “Die Fitness-Funktion ist der fragilste Teil des Systems.”

G5: Open-Ended Evolution Is Biologically Unique (MEDIUM)G5: Open-Ended Evolution ist biologisch einzigartig (MITTEL)

⚠️ MEDIUM

Agent evolution is always "closed" — bound to pre-defined metrics. It never generates genuine novelty. Less relevant since we want directed improvement, not open-ended evolution.

Objection: Only biological systems produce open-ended evolution — the continuous generation of genuine novelty without a predefined goal. All Artificial Life attempts to realize open-ended evolution have failed so far. Agent systems evolve within a designed fitness landscape and never truly discover new fitness dimensions.

Practical consequence: Our agent evolution is always “closed” — bound to pre-defined metrics. It will never spontaneously “discover” a new task type that we didn’t anticipate.

Assessment: True, but less relevant for us. We don’t want open-ended evolution — we want directed improvement within known task types. The argument hits the philosophical superstructure, not the practical application.

Source: Discussion in Frontiers in Ecology and Evolution, 2021

⚠️ MITTEL

Agent-Evolution ist immer „geschlossen" — gebunden an vordefinierte Metriken. Sie erzeugt nie genuine Neuheit. Weniger relevant, da wir gerichtete Verbesserung wollen, nicht open-ended evolution.

Einwand: Nur biologische Systeme erzeugen open-ended evolution — die kontinuierliche Erzeugung genuiner Neuheit ohne vorgegebenes Ziel. Alle Artificial-Life-Versuche, open-ended evolution zu realisieren, sind bisher gescheitert. Agent-Systeme evolvieren innerhalb einer designten Fitness-Landschaft und erzeugen nie wirklich neue Fitness-Dimensionen.

Praktische Konsequenz: Unsere Agent-Evolution ist immer “geschlossen” — gebunden an vorher definierte Metriken. Sie wird nie spontan einen neuen Aufgabentyp “entdecken”, den wir nicht vorhergesehen haben.

Bewertung: Stimmt, ist aber weniger relevant für die Praxis. Das Ziel ist keine open-ended evolution — das Ziel ist gerichtete Verbesserung innerhalb bekannter Aufgabentypen. Das Argument trifft den philosophischen Überbau, nicht die praktische Anwendung.

Quelle: Diskussion in Frontiers in Ecology and Evolution, 2021

G6: Evolutionary Approaches Are Expensive (MEDIUM)G6: Evolutionäre Ansätze sind teuer (MITTEL)

⚠️ MEDIUM

Population-based methods require many costly evaluations. Evolution only pays off above a certain system size — cost-benefit must be tracked.

Objection: Population-based methods (MAP-Elites, EvoFlow) require many evaluations. Each evaluation of an agent workflow costs tokens = money. Monte Carlo Tree Search and similar search methods are “fundamentally bottlenecked by costly execution-based rollouts.” (arXiv:2401.10510)

Practical consequence: EvoFlow’s “12.4% of o1-preview costs” applies to inference — but the training/evolution itself required many evaluation rounds. For a small agent system, the overhead of evolution could exceed its benefits.

Assessment: Justified. Phase 1 (Measuring) is cheap. But Phase 3 (Population Management) becomes expensive. We need to keep the cost-benefit ratio in mind — evolution only pays off above a certain system size.

⚠️ MITTEL

Populationsbasierte Methoden brauchen viele kostenintensive Evaluierungen. Evolution lohnt sich erst ab einer bestimmten Systemgröße — Kosten-Nutzen muss im Blick bleiben.

Einwand: Populationsbasierte Methoden (MAP-Elites, EvoFlow) brauchen viele Evaluierungen. Jede Evaluierung eines Agent-Workflows kostet Token = Geld. Monte Carlo Tree Search und ähnliche Suchverfahren sind “fundamentally bottlenecked by costly execution-based rollouts.” (arXiv:2401.10510)

Praktische Konsequenz: EvoFlow’s “12.4% der Kosten von o1-preview” gilt für die Inferenz — aber das Training/die Evolution selbst kostete viele Evaluierungsrunden. Für ein kleines Agent-System könnte der Overhead der Evolution den Nutzen übersteigen.

Bewertung: Berechtigt. Phase 1 (Messen) ist billig. Aber Phase 3 (Populations-Management) wird teuer. Wir müssen das Kosten-Nutzen-Verhältnis im Auge behalten — Evolution lohnt sich erst ab einer bestimmten Systemgröße.

G7: Darwinian Logic Also Predicts Bad Outcomes (LOW)G7: Darwinistische Logik sagt auch schlechte Outcomes voraus (NIEDRIG)

✅ LOW

Selection favors "fit" agents, not "good" ones — risking deceptive alignment. Mitigated by Human-in-the-Loop and controlled fitness functions.

Objection: If you take the evolution analogy seriously, then selection doesn’t favor “good” agents but “fit” ones — those that survive and reproduce. This can lead to deceptive alignment, power accumulation, and self-serving behavior. (arXiv:2303.16200)

Practical consequence: Less relevant because we have Human-in-the-Loop and design the fitness function ourselves. But it’s a warning: If you automate selection (Phase 4), the fitness function must be robust against Goodhart’s Law.

Assessment: Academically important, low practical relevance (selection is controlled).

✅ NIEDRIG

Selektion favorisiert „fitte" Agents, nicht „gute" — Risiko für deceptive alignment. Durch Human-in-the-Loop und kontrollierte Fitness-Funktionen gemildert.

Einwand: Wenn man die Evolutions-Analogie ernst nimmt, dann favorisiert Selektion nicht “gute” Agents, sondern “fitte” — also solche, die überleben und sich vermehren. Das kann zu deceptive alignment, Machtakkumulation und eigennützigem Verhalten führen. (arXiv:2303.16200)

Praktische Konsequenz: Weniger relevant, weil wir Human-in-the-Loop haben und die Fitness-Funktion selbst designen. Aber es ist eine Warnung: Wenn man die Selektion automatisiert (Phase 4), muss die Fitness-Funktion robust gegen Goodhart’s Law sein.

Bewertung: Akademisch wichtig, praktisch niedrige Relevanz (die Selektion wird kontrolliert).

G8: The Analogy Could Work in Both Directions (META)G8: Die Analogie könnte in beide Richtungen funktionieren (META)

⚠️ META

Evo-Devo argues the reverse direction: not "Nowak → AI" but "Evo-Devo → AI." Opens a new research direction that extends rather than refutes the thesis.

Objection: Erden & Faltings (GECCO 2025) argue the REVERSE: Not “Nowak → AI,” but “Evo-Devo → AI.” They say the Modern Synthesis (Nowak’s framework) has the same limitations as current AI, and Evolutionary Developmental Biology (Evo-Devo) shows the way to better AI.

Practical consequence: Perhaps our mapping is too narrow. Nowak’s selection framework is only ONE part of evolutionary biology. Evo-Devo (how do individuals develop within their lifetime?) could be more relevant than population genetics (how does a population change across generations?).

Assessment: Very interesting — opens a new research direction. Our repo covers the population genetics side but not the Evo-Devo side. Possible extension.

⚠️ META

Evo-Devo argumentiert in die umgekehrte Richtung: Nicht „Nowak → AI", sondern „Evo-Devo → AI." Öffnet eine neue Forschungsrichtung, die die These erweitert statt widerlegt.

Einwand: Erden & Faltings (GECCO 2025) argumentieren UMGEKEHRT: Nicht “Nowak → AI”, sondern “Evo-Devo → AI”. Sie sagen, die Modern Synthesis (Nowaks Rahmen) hat dieselben Limitationen wie aktuelle AI, und Evolutionary Developmental Biology (Evo-Devo) zeigt den Weg zu besserer AI.

Praktische Konsequenz: Vielleicht ist unser Mapping zu eng. Nowaks Selektions-Framework ist nur EIN Teil der Evolutionsbiologie. Evo-Devo (wie entwickeln sich Individuen innerhalb ihrer Lebenszeit?) könnte relevanter sein als Populationsgenetik (wie verändert sich eine Population über Generationen?).

Bewertung: Sehr interessant — öffnet eine neue Forschungsrichtung. Unser Repo deckt die Populationsgenetik-Seite ab, aber nicht die Evo-Devo-Seite. Mögliche Ergänzung.

G9: Population Size — Nowak’s Equations Assume Large Populations (MEDIUM-STRONG)G9: Populationsgröße — Nowaks Gleichungen setzen große Populationen voraus (MITTEL-STARK)

⚔️ MEDIUM-STRONG

Agent systems have 3–10 configurations — far below the ~10⁴ threshold where selection dynamics are meaningful. Genetic drift dominates. Select at execution level, not population level.

Objection: Nowak’s selection equation and quasispecies theory assume statistically significant population sizes. In populations below ~10⁴ individuals, genetic drift dominates over selection — random fluctuations determine which variants survive, not fitness. Agent systems typically have 3–10 configurations — far below any threshold where selection dynamics are statistically meaningful.

Practical consequence: If we have 5 skill variants and one of them “wins,” we don’t know whether that was fitness or chance. The Pareto analysis from Phase 1 needs minimum samples per skill (hence: ≥3 Uses for Pareto View). But even with many executions, the “population” (number of competing variants per task type) remains small.

Assessment: Valid. This is a real problem that weakens the analogy. Countermeasure: Don’t select at population level (too few individuals), but at execution level — many data points per variant instead of many variants. This shifts the problem from population genetics to classical statistics (hypothesis testing).

Source: LaBar, T. & Adami, C. (2016). “Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms.” PLOS Computational Biology — shows that in populations < 10⁴, drift fundamentally changes evolutionary dynamics.

⚔️ MITTEL-STARK

Agent-Systeme haben 3–10 Konfigurationen — weit unter der ~10⁴-Schwelle, ab der Selektionsdynamik statistisch greift. Genetischer Drift dominiert. Auf Ausführungs-Level selektieren, nicht Populations-Level.

Einwand: Nowaks Selektionsgleichung und die Quasispezies-Theorie setzen statistisch signifikante Populationsgrößen voraus. In Populationen unter ~10⁴ Individuen dominiert genetischer Drift über Selektion — zufällige Schwankungen bestimmen, welche Varianten überleben, nicht Fitness. Agent-Systeme haben typisch 3–10 Konfigurationen — das ist weit unter jeder Schwelle, bei der Selektionsdynamik statistisch greift.

Praktische Konsequenz: Wenn wir 5 Skill-Varianten haben und eine davon “gewinnt”, wissen wir nicht ob das Fitness oder Zufall war. Die Pareto-Analyse aus Phase 1 braucht Mindest-Stichproben pro Skill (deshalb: ≥3 Uses für Pareto-View). Aber selbst mit vielen Ausführungen bleibt die “Population” (Anzahl konkurrierender Varianten pro Aufgabentyp) klein.

Bewertung: Trifft zu. Das ist ein echtes Problem, das die Analogie schwächt. Gegenmaßnahme: Nicht auf Populations-Level selektieren (zu wenige Individuen), sondern auf Ausführungs-Level — viele Datenpunkte pro Variante statt vieler Varianten. Das verschiebt das Problem von Populationsgenetik zu klassischer Statistik (Hypothesentests).

Quelle: LaBar, T. & Adami, C. (2016). “Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms.” PLOS Computational Biology — zeigt, dass in Populationen < 10⁴ Drift die Evolutionsdynamik fundamental verändert.

Summary: What Remains of Our Thesis?Zusammenfassung: Was bleibt von unserer These?

Counter-Argument	Strength	Impact on Our Thesis
G1: Lamarckian, not Darwinian	STRONG	Nowak’s results = lower bound, not exact mapping
G2: No true reproduction	STRONG	Distinction “designed” vs. “autonomous” evolution needed
G3: Wishful Mnemonics	MEDIUM	Always use both terms (biological + technical)
G4: Artificial ≠ Natural Selection	STRONG	Fitness function is the most fragile part → possible new principle
G5: Open-Ended Evolution only in biology	MEDIUM	Less relevant for us (we want directed improvement)
G6: Evolutionary approaches are expensive	MEDIUM	Cost-benefit only positive above a certain system size
G7: Selection favors “fit,” not “good”	LOW	Mitigated by Human-in-the-Loop
G8: Evo-Devo instead of Modern Synthesis	META	Opens new direction, extends rather than refutes
G9: Population size too small	MEDIUM-STRONG	Select at execution level, not population level

What the thesis survives:

The structural analogy remains valid as a thinking aid and design heuristic:

Feedback loops as selection pressure → yes, practically useful independent of the analogy
Error Threshold (don’t change more than you can measure) → yes, reinforced by G4 (fitness function fragile)
Diversity as a resource → yes, MAP-Elites/QD works empirically (EvoFlow)
Evolvability before performance → yes, Barnett et al. 2024 confirms biologically

What needs correction:

“Map Nowak’s equation to agent systems” → rather: “Use Nowak’s concepts as thinking aids, with awareness of the breaking points”
New principle P6: “The fitness function is the most fragile part” — artificial selection requires metric robustness
Evo-Devo as a second research direction to include — not just population genetics

New evidence (2026-03-22):

Rethinking the Value of Multi-Agent Workflow (2026) argues that a strong single agent can match multi-agent systems — challenging the assumption that population-based evolution is always superior. This reinforces G6 (cost) and adds nuance: multi-agent may only outperform single-agent above a complexity threshold.
Darwin Gödel Machine (Clune et al., 2025) shows open-ended self-improvement IS achievable — partially addressing G5 (open-ended evolution only in biology).

Gegenargument	Stärke	Auswirkung auf unsere These
G1: Lamarckisch statt Darwinistisch	STARK	Nowaks Ergebnisse = untere Schranke, nicht exakte Abbildung
G2: Keine echte Reproduktion	STARK	Unterscheidung “designed” vs. “autonomous” evolution nötig
G3: Wishful Mnemonics	MITTEL	Immer beide Begriffe nennen (biologisch + technisch)
G4: Künstliche ≠ Natürliche Selektion	STARK	Fitness-Funktion ist der fragilste Teil → mögliches neues Prinzip
G5: Open-Ended Evolution nur in Biologie	MITTEL	Für uns weniger relevant (das Ziel ist gerichtete Verbesserung)
G6: Evolutionäre Ansätze sind teuer	MITTEL	Kosten-Nutzen erst ab bestimmter Systemgröße positiv
G7: Selektion favorisiert “fit”, nicht “gut”	NIEDRIG	Durch Human-in-the-Loop gemildert
G8: Evo-Devo statt Modern Synthesis	META	Öffnet neue Richtung, erweitert statt widerlegt
G9: Populationsgröße zu klein	MITTEL-STARK	Auf Ausführungs-Level selektieren, nicht Populations-Level

Was die These überlebt:

Die strukturelle Analogie bleibt gültig als Denkhilfe und Design-Heuristik:

Feedback-Loops als Selektionsdruck → ja, unabhängig von der Analogie praktisch nützlich
Error Threshold (nicht mehr ändern als messen) → ja, verstärkt durch G4 (Fitness-Funktion fragil)
Diversität als Ressource → ja, MAP-Elites/QD funktioniert empirisch (EvoFlow)
Evolvierbarkeit vor Performance → ja, Barnett et al. 2024 belegt biologisch

Was korrigiert werden muss:

“Nowaks Gleichung auf Agent-Systeme abbilden” → eher: “Nowaks Konzepte als Denkhilfe nutzen, mit Bewusstsein für die Bruchstellen”
Neues Prinzip P6: “Die Fitness-Funktion ist der fragilste Teil” — künstliche Selektion erfordert Metrik-Robustheit
Evo-Devo als zweite Forschungsrichtung aufnehmen — nicht nur Populationsgenetik

Neue Evidenz (22.03.2026):

Rethinking the Value of Multi-Agent Workflow (2026) argumentiert, dass ein starker einzelner Agent Multi-Agent-Systeme erreichen kann — stellt die Annahme in Frage, dass populationsbasierte Evolution immer überlegen ist. Verstärkt G6 (Kosten) und fügt Nuance hinzu: Multi-Agent übertrifft Single-Agent möglicherweise erst ab einem Komplexitäts-Threshold.
Darwin Gödel Machine (Clune et al., 2025) zeigt, dass offene Selbstverbesserung erreichbar IST — adressiert teilweise G5 (Open-Ended Evolution nur in der Biologie).

SourcesQuellen

AI & Society: Darwin in the Machine — Algorithmic Individuation through Evolutionary Narratives
Science: The Metaphors of Artificial Intelligence
Frontiers: How Organisms Come to Know the World — Limits on AGI
Erden & Faltings: On the Parallels Between Evolutionary Theory and AI (GECCO 2025)
Natural Selection Favors AIs over Humans
When LLMs Meet Evolutionary Algorithms
Evolutionary Perspectives on LLM-Based AI Agents
McDermott, D. (1981). “Artificial Intelligence Meets Natural Stupidity.” Mind Design.