
A research team in Singapore just demonstrated something that shouldn’t be possible according to the dominant paradigm in artificial intelligence.
Their model uses 27 million parameters. That's roughly a quarter the size of GPT-1, OpenAI's original 2018 model. It trained on just 1,000 examples without any pre-training. And it outperformed models with hundreds of billions of parameters on reasoning tasks that have stumped the industry's most advanced systems.
The model is called the Hierarchical Reasoning Model (HRM), developed by Sapient Intelligence. On the Abstraction and Reasoning Corpus—widely considered the gold standard for measuring artificial general intelligence capabilities—HRM scores 5%. That sounds modest in absolute terms, but it beats OpenAI's o3-mini-high, DeepSeek's R1, and Claude 3.5 8K on the same benchmark.
This isn’t just an incremental improvement. It’s a fundamental challenge to the assumption that has driven AI development for the past five years: that progress requires ever-larger models consuming ever-more computational resources.
The Verbal Reasoning Trap
Current language models approach reasoning through a method called Chain-of-Thought prompting. The model articulates every step of its reasoning process in explicit language, creating a verbal chain from problem to solution.
This approach has become so dominant that it’s rarely questioned. But it carries hidden costs.
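For readers who haven't seen it spelled out, Chain-of-Thought prompting is nothing more exotic than asking the model to show its work in text. A minimal, hypothetical illustration (the question and wording are placeholders, not drawn from any benchmark discussed here):

```python
# Illustrative only: a direct prompt vs. a Chain-of-Thought prompt.
# The question and phrasing are hypothetical, not from the HRM paper.

direct_prompt = (
    "Q: A train leaves at 9:40 and arrives at 11:05. "
    "How long is the trip?\nA:"
)

cot_prompt = (
    "Q: A train leaves at 9:40 and arrives at 11:05. "
    "How long is the trip?\n"
    "A: Let's think step by step."  # the model now spells out every step in words
)
```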
Guan Wang, Sapient’s CEO, points to research showing that “CoT does not genuinely reflect a model’s internal reasoning.” Models can produce correct answers with incorrect reasoning steps. They can also produce incorrect answers with seemingly logical reasoning chains.
A 2025 Wharton study quantifies the cost. Chain-of-Thought prompting increases processing time by 20-80% while delivering only marginal accuracy gains for reasoning-optimized models. Non-reasoning models show modest average gains, but with more variability in their answers.
The fundamental issue is this: forcing every reasoning step into language creates inefficiencies similar to requiring a chess grandmaster to verbalize every calculation before making a move.
Your brain doesn’t work that way. When you recognize a face, catch a ball, or solve a familiar problem, you don’t articulate each sub-process. Lower-level operations run automatically while higher-level processes handle strategic decisions.
The Architecture of Thought
HRM mimics this biological structure through a two-tier system.
A high-level module performs slow, abstract planning—the strategic layer that decides what needs to happen. A low-level module executes rapid, detailed computations—the operational layer that handles execution.
This maps directly to Daniel Kahneman’s dual-process theory. System 1 thinking is fast, automatic, and intuitive. System 2 thinking is slow, deliberate, and analytical. HRM alternates dynamically between these modes in a single forward pass.
The human brain processes information hierarchically across cortical regions operating at different timescales. Sensory inputs process at lower levels. Abstract concepts form at higher levels. Recurrent feedback loops iteratively refine internal representations, allowing slower, higher-level areas to guide faster, lower-level circuits while preserving global coherence.
HRM translates this biological blueprint into computational architecture.
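A rough sketch helps make the loop concrete. This is not Sapient's published implementation; the module sizes, update schedule, and use of GRU cells are assumptions for illustration. The point is the shape of the computation: a fast low-level state updates several times per cycle, a slow high-level state updates once per cycle and feeds back down, and the whole alternation happens inside a single forward pass.

```python
import torch
import torch.nn as nn

class HierarchicalReasoner(nn.Module):
    """Toy two-timescale recurrent model in the spirit of HRM.

    All sizes and the update schedule are illustrative assumptions,
    not the published architecture.
    """

    def __init__(self, input_dim=64, hidden_dim=128, low_steps=4, cycles=3):
        super().__init__()
        self.low_steps = low_steps    # fast, detailed computation steps per cycle
        self.cycles = cycles          # slow, abstract planning updates
        self.encode = nn.Linear(input_dim, hidden_dim)
        # Low-level module: updates every step, conditioned on the high-level state.
        self.low = nn.GRUCell(hidden_dim * 2, hidden_dim)
        # High-level module: updates once per cycle, from the low-level result.
        self.high = nn.GRUCell(hidden_dim, hidden_dim)
        self.decode = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        h_high = torch.zeros(x.size(0), self.encode.out_features)
        h_low = torch.zeros_like(h_high)
        x_emb = self.encode(x)
        for _ in range(self.cycles):                      # slow timescale
            for _ in range(self.low_steps):               # fast timescale
                h_low = self.low(torch.cat([x_emb, h_high], dim=-1), h_low)
            h_high = self.high(h_low, h_high)             # feedback to the planner
        return self.decode(h_high)

# Usage: one forward pass interleaves both timescales.
model = HierarchicalReasoner()
out = model(torch.randn(2, 64))   # -> shape (2, 64)
```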
The efficiency numbers are striking. Wang estimates potential 100x speedups versus Chain-of-Thought approaches for suitable tasks. The model can be trained to solve professional-level Sudoku in just two GPU-hours. It runs on standard CPUs with under 200MB of RAM.
That’s a fraction of what today’s large language models require.
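The memory figure is easy to sanity-check with back-of-the-envelope arithmetic. A quick sketch, assuming 32-bit weights (the precision actually used isn't stated above):

```python
# Back-of-the-envelope check on the memory claim.
# Assumes float32 weights; the precision HRM actually uses is an assumption here.
params = 27_000_000                        # reported parameter count
bytes_per_param = 4                        # float32
weights_mb = params * bytes_per_param / 1e6
print(f"~{weights_mb:.0f} MB of weights")  # ~108 MB, comfortably under 200 MB
```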
The Data Efficiency Paradigm
The broader AI industry has been moving toward this conclusion from a different direction.
By 2024, small models around 3.8 billion parameters reached scores above 60% on MMLU—a benchmark that previously required models with 540 billion parameters. Studies show student models can achieve 90-95% of teacher model performance while being significantly smaller.
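The student-teacher figure refers to knowledge distillation, in which a small model is trained to imitate a large model's output distribution rather than only the ground-truth labels. A minimal sketch of the standard distillation loss, with illustrative temperature and weighting values:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target imitation and ordinary cross-entropy.

    T (temperature) softens both distributions; alpha balances the two terms.
    Both values here are illustrative, not tuned.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example: batch of 8 examples, 10 classes.
loss = distillation_loss(
    torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,))
)
```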
Nvidia concluded that small-language models could perform 70-80% of enterprise tasks, leaving the most complex reasoning to large-scale systems. A two-tier structure is emerging: small models for volume, large models for complexity.
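In deployment, that two-tier structure usually amounts to a router: send routine requests to the small model, escalate the rest. A deliberately simplified sketch, in which the complexity heuristic and the model callables are placeholders rather than any vendor's actual API:

```python
# Hypothetical routing sketch: `small_model` and `large_model` stand in for
# whatever inference callables a real deployment would expose.

def estimate_complexity(task: str) -> float:
    """Toy heuristic: longer, multi-step requests count as more complex."""
    steps = task.count(" then ") + task.count("?")
    return min(1.0, 0.2 * steps + len(task) / 2000)

def route(task: str, small_model, large_model, threshold: float = 0.7):
    """Send routine work to the cheap model, escalate complex work."""
    model = large_model if estimate_complexity(task) > threshold else small_model
    return model(task)

# Usage with stand-in models:
answer = route(
    "Summarize this paragraph.",
    small_model=lambda t: "[small model handles it]",
    large_model=lambda t: "[large model handles it]",
)
```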
HRM takes this principle to its logical extreme. It learns from just 1,000 examples what other systems require millions or billions of data points to approximate.
This has immediate economic implications. For problems where the approach applies, the computational requirements for advanced AI drop by orders of magnitude, and infrastructure costs fall with them while performance holds.
That democratizes access to sophisticated AI capabilities. It potentially disrupts the current oligopoly of companies with enough capital to train frontier models.
The Critical Nuance
HRM isn’t a universal solution. The architecture excels at a specific class of problems: structured, grid-based puzzles with clear rules and limited scope.
The ARC Prize team found that HRM’s hierarchical brain-inspired architecture doesn’t consistently beat similarly sized transformer models. Standard transformers achieved nearly identical performance with no special optimization.
This matters because it clarifies what HRM actually proves.
The value lies not in the hierarchical architecture being universally superior. The value lies in demonstrating that architectural innovation can match parameter scale for specific reasoning domains.
That challenges the “bigger is always better” orthodoxy without claiming universal superiority.
HRM works exceptionally well for tasks with clear structure and definable rules. It struggles with open-ended problems requiring broad world knowledge or nuanced language understanding. Large language models excel at those tasks precisely because they’ve absorbed patterns from vast text corpora.
The question isn’t which approach is better. The question is which approach fits which problem.
The Implications Beyond AI
The principle HRM demonstrates extends beyond artificial intelligence.
Hierarchical processing offers a universal problem-solving architecture. Strategic planning happens at one level. Operational execution happens at another. Feedback loops connect them.
You see this in organizational design. Effective companies separate strategic decision-making from operational execution while maintaining clear communication channels between levels.
You see this in decision-making frameworks. High-level goals guide low-level actions. Results feed back to refine strategy.
The human brain evolved this structure because it works. It balances flexibility with efficiency. It enables both rapid response and thoughtful deliberation.
Current AI architectures largely ignore this biological blueprint in favor of scaling a single approach. HRM suggests that’s leaving significant capability on the table.
The Path Forward
HRM’s success points toward a future where AI development fragments into specialized architectures optimized for different problem classes.
Large language models will continue advancing for tasks requiring broad knowledge and language understanding. Hierarchical models like HRM will handle structured reasoning problems with dramatically better efficiency. Hybrid systems will combine both approaches.
The computational efficiency HRM demonstrates enables applications currently constrained by cloud limitations. Edge computing becomes viable for sophisticated reasoning tasks. Robotics and autonomous systems gain capabilities they couldn’t access before.
This shift also raises questions about interpretability. Chain-of-Thought prompting provides a verbal trace of reasoning steps. HRM’s internal processes remain largely opaque, even as its outputs prove reliable.
Does true understanding require language? Or is transparency theater more dangerous than acknowledged opacity?
The debate matters because it shapes how we evaluate and trust AI systems.
What This Means Now
The immediate takeaway is practical: the assumption that AI progress requires ever-larger models is demonstrably false for certain problem domains.
Organizations evaluating AI implementations should consider whether their use cases involve structured reasoning problems. If so, smaller specialized models may outperform larger general-purpose systems while consuming a fraction of the resources.
Researchers should explore architectural innovations rather than focusing exclusively on scaling existing approaches. The low-hanging fruit in AI may not be in the next order of magnitude of parameters.
The broader implication touches on how we think about intelligence itself. Human cognition doesn’t rely on verbalizing every step. It integrates multiple processing modes operating at different speeds and levels of abstraction.
AI systems that mirror this structure may unlock capabilities that pure scaling cannot reach.
HRM represents one data point in a larger pattern. The pattern suggests that the future of AI looks less like a single massive model and more like an ecosystem of specialized architectures, each optimized for different aspects of intelligence.
That future is more accessible, more efficient, and potentially more aligned with how intelligence actually works in biological systems.
The cosmos of artificial intelligence is vaster than a single scaling law can describe. We’re just beginning to explore its structure.
