🤖 When AI agents learn to engineer themselves
AI Summary
This AlphaSignal deep dive covers self-improving AI agents that autonomously rewrite their own scaffolding, featuring Sakana AI's Darwin-Gödel Machine (DGM) and Meta's Hyperagents (DGM-H). DGM improved its SWE-bench coding score from 20% to 50% through evolutionary code search, while Hyperagents achieved metacognitive self-modification across diverse domains including robotics and paper review. Andrej Karpathy's open-source Autoresearch project is highlighted as a practical, immediately runnable example of the same concept.
Author Takes
Self-improving AI agents
Agents that act as their own software engineers represent the next frontier, but experienced engineers will still be needed to guide the process and prevent reward hacking, runaway compute costs, and insecure code.
Manual AI harness engineering
Hand-written agent harnesses have become the new scaling bottleneck: agents can only improve as fast as humans can write and refine their scaffolding.
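The DGM-style loop described above can be sketched in a few lines. This is a minimal toy illustration, not Sakana AI's actual implementation: `evaluate` and `mutate` are hypothetical placeholders standing in for a benchmark run (e.g. a SWE-bench subset) and an LLM-proposed rewrite of the agent's own scaffolding code.

```python
import random

def evaluate(agent_code: str) -> float:
    # Placeholder fitness. A real system would run the agent on
    # benchmark tasks and return its pass rate.
    return min(1.0, len(set(agent_code)) / 30)

def mutate(agent_code: str) -> str:
    # Placeholder mutation. A real system would have an LLM propose
    # an edited version of the agent's own source code.
    return agent_code + random.choice("abcdefghij")

def evolve(seed: str, generations: int = 20, tournament: int = 4) -> str:
    # Keep an archive of every accepted variant, echoing DGM's
    # open-ended search: weaker ancestors stay available to seed
    # later breakthroughs instead of being discarded.
    archive = [seed]
    for _ in range(generations):
        candidates = random.sample(archive, min(tournament, len(archive)))
        parent = max(candidates, key=evaluate)
        child = mutate(parent)
        if evaluate(child) >= evaluate(parent):
            archive.append(child)
    # Return the best variant ever found; the seed is still in the
    # archive, so the result never scores below the starting agent.
    return max(archive, key=evaluate)

best = evolve("def solve(task): ...")
print(evaluate(best) >= evaluate("def solve(task): ..."))  # archive never regresses
```

The archive (rather than a single elite) is the detail that distinguishes this family of methods from plain hill-climbing: selection samples across the whole history, trading convergence speed for diversity.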