AI
🧠 Every AI Model Scores Under 1% on New Intelligence Benchmark
The Rundown: ARC-AGI-3 launched as a new benchmark testing genuine AI adaptability, where every frontier model scored under 1% while all human testers scored 100%.
The details:
- Grok 4.2 scored 0% on the benchmark, with other frontier models performing similarly poorly
- 100% of human testers solved all environments on their first try, highlighting the gap between human and AI reasoning
- The benchmark tests real-time learning across 135 environments, measuring genuine adaptability rather than pattern matching
- OpenAI simultaneously shut down Sora, blindsiding Disney and other enterprise partners
Why it matters: The benchmark suggests that current AI models excel at pattern matching but lack genuine adaptive reasoning. For founders building AI products, that argues for focusing on narrow, well-defined use cases rather than expecting human-level problem solving, and the dramatic performance gap indicates we're still years away from artificial general intelligence despite impressive demos.
📰 Source: The Neuron