By Dr. Reza Olfati-Saber
Abstract
The Prime Number Theorem took 104 years to prove—not because mathematicians lacked intelligence or computational power, but because the solution required complex analysis, a mathematical framework that didn’t exist when Gauss first observed the pattern. This historical case study reveals three critical risks in current AGI research strategy: scaling potentially insufficient architectures, systematically underinvesting in orthogonal research domains, and creating infrastructure lock-in that inhibits paradigm shifts. Drawing on three decades of research experience from multi-agent systems to the current AI boom, I argue that the field faces a strategic choice: continue betting everything on transformer scaling, or adopt a diversified portfolio approach that acknowledges fundamental uncertainty about paths to general intelligence. Recent evidence—including findings that 95% of enterprise AI deployments fail despite impressive benchmarks—suggests we may be optimizing for the wrong metrics. This article proposes concrete strategies for research diversification and critiques the “singular genius” attribution model that distorts incentives and obscures the collaborative nature of transformative breakthroughs.
Introduction: Two Journeys in Pattern Recognition
In 1792, fifteen-year-old Carl Friedrich Gauss began obsessively counting prime numbers. Through meticulous tabulation, he discovered a remarkable pattern: prime density decreases predictably, proportional to the reciprocal of the natural logarithm. In his 1849 letter to astronomer Johann Encke, Gauss recalled that his attention to the problem dated back to 1792 or 1793.
He had found the pattern. He couldn’t prove why it was true.
The proof would take 104 years and require the invention of entirely new mathematical frameworks. When Jacques Hadamard and Charles-Jean de la Vallée Poussin independently proved the Prime Number Theorem in 1896, they used complex analysis—mathematics of functions on the complex plane that didn’t exist when Gauss made his conjecture.
Fast forward to 2025. We’ve built systems that generate human-like text, recognize images with superhuman accuracy, and demonstrate remarkable few-shot learning capabilities. We see impressive patterns in scaling laws. We measure emergent abilities at increasing model sizes. We’re investing billions in infrastructure to scale transformers larger.
But like Gauss, we see patterns without understanding mechanisms. And like 19th-century mathematicians, we face a critical strategic choice: Should we invest everything in scaling our current approach, or might we need conceptual frameworks that don’t yet exist?
By AGI, I mean the long-sought goal of general machine intelligence: the future of machine intelligence, whatever form it ultimately takes, rather than any particular current architecture.
This article examines what the Prime Number Theorem’s century-long journey teaches us about strategic risks in contemporary AGI research—and why the answers matter urgently for resource allocation decisions being made today.
The Historical Case: 104 Years from Pattern to Proof
The Observation Phase (1792-1859)
Gauss’s work was pure empirical pattern recognition—systematic data collection revealing statistical regularities without theoretical foundation. His conjecture could be stated formally as π(x) ~ x/ln(x), where π(x) counts the primes less than or equal to x.
In 1798, Adrien-Marie Legendre refined this with π(x) ≈ x/(ln(x) + B) where B ≈ -1.08366. Legendre’s formula fit available data beautifully—for every range anyone could compute, it worked impressively well.
There was just one problem: it was asymptotically wrong. The constant −1.08366 was an artifact of finite data—the correct asymptotic constant is −1—so Legendre’s formula drifts from the true count at second order as x grows. Yet without the mathematical tools to prove this, it stood as the best available model for decades.
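A minimal sketch makes the seduction concrete. The following Python snippet (illustrative only; the constant B is the one quoted above) sieves the primes and compares both approximations over the ranges early mathematicians could actually compute:

```python
import math

def sieve_pi(n):
    """Return a table of pi(x) for all x <= n via the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    counts, running = [0] * (n + 1), 0
    for x in range(n + 1):
        running += is_prime[x]
        counts[x] = running
    return counts

N = 1_000_000
pi = sieve_pi(N)
B = -1.08366  # Legendre's empirically fitted constant, quoted above

print(f"{'x':>9} {'pi(x)':>7} {'x/ln x':>9} {'Legendre':>9}")
for x in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{x:>9} {pi[x]:>7} {x / math.log(x):>9.0f} "
          f"{x / (math.log(x) + B):>9.0f}")
```

At x = 10^6, Legendre’s formula lands within a few dozen of the true count of 78,498 while x/ln x is off by thousands: precisely the kind of finite-range evidence that breeds false confidence.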
This should sound familiar to anyone following contemporary AI research.
The Paradigm Shift (1859)
Bernhard Riemann changed everything with an eight-page paper that connected prime distribution to zeros of the zeta function ζ(s) = Σ(1/n^s) through complex analysis. He didn’t improve Gauss’s counting methods or refine Legendre’s approximation. He completely reframed the problem using mathematics from an entirely different domain.
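The bridge rests on Euler’s product formula, the identity that writes the primes directly into ζ(s). In standard notation:

```latex
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}
         = \prod_{p \ \mathrm{prime}} \frac{1}{1 - p^{-s}},
\qquad \operatorname{Re}(s) > 1 .
```

Because the product runs over exactly the primes, analytic facts about where ζ vanishes become statements about how the primes are distributed.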
Who in 1792 would have predicted that understanding how integers are distributed requires studying functions of imaginary numbers? The connection was far from obvious. But that’s precisely the point: transformative solutions often come from orthogonal domains.
Pafnuty Chebyshev in the 1850s proved partial results showing that if the limit of π(x)·ln(x)/x exists as x → ∞, it must equal 1—narrowing the solution space without completing the proof. This represents another important pattern: “failed” or incomplete approaches that nonetheless advance understanding by eliminating possibilities.
The Completion Phase (1896)
Hadamard and de la Vallée Poussin independently proved the theorem in 1896 by establishing that ζ(s) ≠ 0 for Re(s) = 1. The fact that two mathematicians completed the proof independently in the same year is significant: it suggests the mathematical infrastructure had reached critical mass for the breakthrough. The discovery was, in a sense, inevitable—the question was who would complete it first, not whether someone’s unique insight would enable it.
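In modern notation, the step they completed can be stated compactly: non-vanishing of ζ on the boundary line implies the asymptotic law.

```latex
\zeta(1 + it) \neq 0 \ \text{for all real } t
\quad \Longrightarrow \quad
\lim_{x \to \infty} \frac{\pi(x)}{x / \ln x} = 1 .
```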
Key timeline: 104 years from observation to rigorous proof.
Three Strategic Risks in Current AGI Research
Risk 1: Scaling the Wrong Formulation
Legendre’s formula demonstrates a critical failure mode: impressive empirical performance within tested ranges does not guarantee asymptotic correctness. His approximation justified confidence—it worked everywhere it could be checked—yet remained fundamentally insufficient.
Contemporary transformers may present an analogous situation. They demonstrate remarkable capabilities through autoregressive next-token prediction with attention mechanisms: few-shot learning, chain-of-thought reasoning, emergent abilities at scale. Benchmark performance continues improving with model size and data.
But impressive benchmark performance doesn’t guarantee architectural sufficiency for general intelligence. Multiple concerns warrant serious consideration:
The Deployment Gap. MIT’s comprehensive 2025 study of enterprise AI implementations analyzed 300 deployments, drawing on 150 executive interviews and 350 employee surveys. The finding: 95% of generative AI pilots fail to deliver measurable financial impact. Critically, the failure mode wasn’t insufficient model capability—it was a fundamental mismatch between what systems do and what real workflows require (Challapally et al., 2025).
Systems performing beautifully on benchmarks consistently fail in production environments. This echoes Legendre: excellent performance on test cases, fundamental limitations in actual deployment.
Theoretical Questions. The autoregressive next-token prediction paradigm faces potential fundamental limits:
- Can backpropagation through transformer architectures adequately handle credit assignment for complex, long-range reasoning?
- Do pattern completion mechanisms suffice for systematic compositional understanding?
- Are current data scales compatible with human-like generalization from limited examples?
These remain open questions, not established limitations. But the Legendre precedent urges caution in assuming benchmark performance guarantees asymptotic sufficiency.
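To keep the object of these questions concrete, here is a minimal NumPy sketch of the mechanism at issue: single-head scaled dot-product attention with a causal mask, the core operation of autoregressive next-token prediction. It is a pedagogical toy under simplified assumptions (one head, no positional encoding, no training loop), not any production implementation:

```python
import numpy as np

def causal_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask.

    X: (T, d) token embeddings. Each position may attend only to itself
    and earlier positions, which is what makes the model autoregressive.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # queries, keys, values
    T, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                 # (T, T) similarities
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # hide future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the past
    return weights @ V                              # context-mixed values

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
print(causal_attention(X, Wq, Wk, Wv).shape)        # (5, 8)
```

Whether gradients flowing backward through deep stacks of such layers can assign credit across very long ranges is precisely the first open question above.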
The Infrastructure Bet. We’re committing billions to transformer-specific infrastructure: training clusters optimized for attention operations, ASICs designed for specific architectural patterns, power generation scaled for particular computational requirements. These investments create powerful path dependencies—institutional pressures to continue current approaches even when evidence suggests limitations.
If transformers prove insufficient for AGI, we’ll have locked enormous capital in depreciating, architecture-specific assets while crowding out alternative research. It’s analogous to building the world’s most sophisticated abacus manufacturing infrastructure in 1950 rather than investigating electronic computation.
Risk 2: Missing Orthogonal Breakthroughs
Complex analysis seemed completely irrelevant to counting integers. Yet it provided the framework necessary for proof. This orthogonality is the crucial lesson: transformative frameworks often emerge from domains considered tangential to the original problem.
Who today can confidently identify which domains will provide the conceptual frameworks necessary for AGI? Several candidates warrant serious consideration:
Neuroscience and Biological Systems. Mechanisms by which biological neural networks implement learning, memory, and generalization may differ fundamentally from backpropagation through static architectures. Understanding these biological principles could reveal alternative computational approaches.
Physics and Thermodynamics. Physical constraints on computation, energy efficiency requirements, and potential quantum effects in biological cognition may inform fundamental requirements for intelligent systems.
Dynamical Systems Theory. Conceptualizing intelligence as emergent behavior from complex adaptive systems suggests frameworks based on stability, convergence, and phase transitions rather than optimization of fixed objectives. My own work on consensus in multi-agent systems (Olfati-Saber & Murray, 2004; Olfati-Saber, 2006) exemplifies this: collective intelligence emerging from interaction protocols rather than individual agent sophistication—a principle now reflected in Graph Neural Network architectures.
Statistical Physics. Understanding emergence, collective behavior, and phase transitions in many-particle systems provides natural frameworks for how macroscopic intelligence emerges from microscopic interactions.
Category Theory. Abstract mathematical frameworks for composition, transformation, and invariance might formalize properties of “understanding” currently captured only informally.
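To make the dynamical-systems candidate concrete, here is a minimal sketch of the linear consensus protocol analyzed in Olfati-Saber & Murray (2004): each agent repeatedly moves toward its neighbors’ values, and the network settles on the global average. The ring graph, step size, and initial states are illustrative choices, not taken from the paper:

```python
import numpy as np

# Undirected ring of 6 agents: each node communicates with two neighbors only.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

L = np.diag(A.sum(axis=1)) - A  # graph Laplacian; lambda_2 > 0 iff connected
                                # (Fiedler's algebraic connectivity)

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0])  # initial local values
eps = 0.2  # step size; stable for eps < 2 / lambda_max(L) (= 0.5 for this ring)

for _ in range(200):
    x = x - eps * (L @ x)       # discretized consensus dynamics x' = -L x

print(np.round(x, 4))           # every agent reaches the average, 11/6 = 1.8333
```

Agreement emerges from the interaction topology alone; no individual agent is sophisticated. That is exactly the kind of framework orthogonal exploration would surface.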
Current resource allocation heavily favors transformer scaling over orthogonal exploration. While precise figures are difficult to obtain, publication patterns and funding announcements suggest >70% of resources target scaling existing architectures, with <10% allocated to fundamentally alternative paradigms.
This allocation reflects rational response to near-term incentives but may prove strategically suboptimal if AGI requires frameworks not yet developed. We risk being so focused on counting primes faster that we miss the development of complex analysis happening elsewhere.
Risk 3: Infrastructure Lock-In and Path Dependencies
Multi-billion dollar infrastructure commitments create powerful forces resisting paradigm shifts:
- Training clusters optimized for current architectures
- ASICs designed for specific operations
- Power infrastructure sized for particular requirements
- Workforce trained on current paradigms
- Organizational structures built around existing approaches
These aren’t just sunk costs—they’re active pressures maintaining current trajectories even when evidence suggests fundamental limitations. Infrastructure investment becomes an obstacle to necessary change.
Historical precedent is instructive. Imagine if 1850s mathematicians had invested massively in mechanical calculators, hired thousands of human computers, built enormous tabulation operations—all optimized for computing Legendre’s formula more precisely. The infrastructure investment itself would have created resistance to Riemann’s paradigm shift toward complex analysis.
Today’s infrastructure commitments should scale with confidence about architectural sufficiency. Given fundamental uncertainty about whether transformers represent viable paths to general intelligence, massive investment before establishing theoretical foundations represents strategic risk.
The Attribution Problem: Why “Godfathers” Harm Science
Who is the “godfather” of the Prime Number Theorem? The question is nonsensical. The proof required:
- Gauss’s observation (1792)
- Legendre’s refinement (1798)
- Riemann’s reframing (1859)
- Chebyshev’s partial results (1850s)
- Hadamard and de la Vallée Poussin’s completion (1896)
Each contribution was essential; none alone sufficed. “Godfather” attribution fundamentally misunderstands how knowledge accumulates.
Yet AI discourse persistently seeks singular originators—individual “godfathers” credited as creators. This framework creates concrete harms:
Perverse Incentives
Credit hoarding: Why share preliminary results if recognition accrues only to “first” discoverers?
Reduced collaboration: Why work jointly if credit dilutes through cooperation?
Publication timing games: Racing to establish priority encourages premature publication over thorough development.
Reproducibility degradation: Reduced incentives to provide implementation details that enable replication.
The bitter Erdős-Selberg priority dispute over the 1948 elementary proof demonstrates how attribution battles damage both research and relationships (Goldfeld, n.d.).
Obscured Dependencies
Hadamard and de la Vallée Poussin couldn’t have proved the theorem without Riemann’s 1859 framework. Yet “godfather” narratives might credit Gauss (first conjecture) or Hadamard (first proof), erasing Riemann’s enabling contribution.
In AI, celebrating individual “godfathers” obscures researchers who developed backpropagation, normalization techniques, regularization methods, attention mechanisms, optimization algorithms, datasets, evaluation frameworks, and hundreds of incremental improvements. Every celebrated breakthrough builds on infrastructure created by others.
Systematic Undervaluation
“Godfather” frameworks celebrate only “correct” final contributions, erasing:
- Negative results eliminating dead ends
- Failed approaches clarifying constraints
- Incremental improvements enabling breakthroughs
- Infrastructure work making research possible
- Cross-disciplinary bridge-building
Legendre’s asymptotically wrong formula still narrowed the solution space. AI’s “winters” taught crucial lessons. Failed architectures eliminated possibilities. But singular genius narratives make these essential contributions invisible.
Case Study: The Systematic Under-Citation of Shun-Ichi Amari
The prevalent failure to cite Shun-Ichi Amari’s foundational contributions exemplifies how attribution failures harm cumulative science. Consider Amari’s track record:
- 1967: Proposed first deep learning approach using stochastic gradient descent for training multilayer perceptrons—decades before the “deep learning revolution”
- 1972: Learning in self-organizing nets of threshold elements, anticipating by a decade the model later named the Hopfield network
- 1977: Theory of lateral inhibition in neural fields
- 1998: Natural gradient method accounting for parameter space geometry
- 2000: Information geometry framework (with Nagaoka) creating an entirely new field
These contributions underpin modern machine learning. Yet they remain systematically under-cited in Western literature—often rediscovered without attribution. This erasure, whether from linguistic barriers, geographic bias, or ignorance, perpetuates selective credit assignment that undermines cumulative science.
When I cite Miroslav Fiedler’s 1973 work on algebraic connectivity of graphs in my consensus papers, I acknowledge that my contributions stand on his shoulders. This doesn’t diminish my work—it accurately represents how science progresses.
Proper attribution isn’t academic courtesy. It’s infrastructure for cumulative knowledge building.
A Portfolio Strategy for AGI Research
Given fundamental uncertainty about architectural sufficiency, I propose a portfolio approach balancing competing objectives:
Empirical Scaling (30-40% of resources)
Continue scaling current architectures to understand capabilities and limits. This generates valuable data, delivers near-term applications, and clarifies what current paradigms can and cannot achieve. Like Gauss’s counting: necessary groundwork without guaranteed sufficiency.
Rationale: We need to understand transformer limits empirically. Stopping scaling prematurely would leave critical questions unanswered. But this shouldn’t consume all resources.
Fundamental Research (30-40% of resources)
Invest in theoretical understanding of current systems: Why do transformers work? What are their information-theoretic limits? What can they provably not learn? And invest in alternative conceptual frameworks, including work appearing “irrelevant” to current approaches.
Rationale: Complex analysis seemed irrelevant to counting primes. The frameworks enabling AGI may currently appear unrelated to machine learning. Support curiosity-driven research without demanding immediate application.
Alternative Architectures (20-30% of resources)
Seriously investigate neuromorphic computing, quantum approaches, hybrid symbolic-neural systems, biologically inspired architectures, and paradigms not yet conceived. Maintain optionality rather than committing entirely to one approach.
Rationale: If transformers prove insufficient, having viable alternatives prevents starting from scratch. Portfolio diversification is basic risk management.
The key insight: We don’t know whether we’re in an engineering phase (correct approach requiring scale) or a science phase (missing fundamental conceptual frameworks). The 104-year gap from conjecture to proof suggests betting entirely on either alone is strategically unsound.
Beyond AGI: The Case for GI and SI
Rather than exclusively pursuing AGI, I recommend increased investment in:
General Intelligence (GI): Systems operating effectively across multiple related domains, without requiring human-level generality across all domains. Medical diagnostic systems transferring across radiology, pathology, and genomics. Materials discovery spanning chemistry, physics, and engineering. Multi-domain robots adapting to related tasks.
Specialized Intelligence (SI): Systems achieving superhuman performance in well-defined domains. AlphaFold for protein structure. Specialized forecasting systems. Mathematical theorem proving assistants. Domain-specific code generation.
These aren’t consolation prizes awaiting “real” AGI—they’re valuable endpoints that build foundational understanding. They generate empirical data about learning and generalization while delivering immediate utility.
This parallels mathematics using Legendre’s approximation productively while continuing theoretical work toward rigorous proof, rather than refusing partial solutions while awaiting perfect understanding.
Practical Recommendations for Research Leaders
1. Diversify Research Portfolios
Don’t concentrate all resources on scaling current architectures. Allocate significant effort to theoretical understanding, alternative paradigms, and cross-disciplinary exploration.
When I moved from consensus problems to flocking algorithms in my multi-agent systems research, it wasn’t because consensus was “solved”—exploring adjacent problem spaces revealed insights applicable to both domains.
2. Document Failures Comprehensively
Your “failed” architecture might eliminate dead ends for others or contain insights valuable under different framings. Gauss’s counting tables remained useful for 100 years. Comprehensive failure documentation serves future researchers.
3. Invest in Precise Problem Formulation
Gauss’s conjecture was itself a breakthrough. Spend serious effort clarifying objectives. “Build AGI” is insufficiently precise. “Create systems learning task X from Y data with Z sample efficiency under constraint W” enables rigorous evaluation and cumulative progress.
4. Build Disciplinary Bridges
The breakthrough came from complex analysis, not number theory. AGI may require insights from neuroscience, physics, mathematics, or fields currently considered irrelevant. The future of AI needs polymaths and collaborative specialists who can bridge disciplines, not narrow experts working in isolation behind walls of competitive secrecy.
5. Design Collaborative Incentive Structures
Science is collaborative across generations. Create systems rewarding proper attribution, infrastructure contributions, negative result publication, and collaborative work—not only “first” discoverers. The infrastructure you enable might be used by someone building on your work decades hence.
6. Maintain Epistemic Humility in Planning
We don’t know if current approaches suffice, what breakthroughs are needed, or when they’ll arrive. Strategy should reflect uncertainty through diversification and optionality. The 104-year gap teaches humility about timeline estimation.
7. Scale Infrastructure Commitments to Certainty
Before massive architecture-specific investment, seriously evaluate whether we’re in an engineering phase (scale the right approach) or a science phase (missing fundamental concepts). Premature lock-in creates path dependencies inhibiting necessary shifts.
Personal Reflection: When the Audience Laughs
When I first presented my consensus work at a premier IEEE control conference in the early 2000s, the audience laughed. The session chair joked: “Reza, I really have no idea why you bother to write a paper on linear systems?!”
The work looked deceptively simple: how to get agents—robots, sensors, computational nodes—to agree on something when they can only communicate with neighbors. No central controller. No global view. Just local interactions producing global coordination.
A few years later, that session chair apologized—five of his PhD students were working on consensus theory. Those papers have now been cited tens of thousands of times because they built stepping stones for today’s progress in networked dynamic systems: power grids, the internet, distributed computing, flocks, swarms, social networks, biological systems.
The most important problems often look trivial until you solve them. Birds achieve coordinated flocking with tiny brains and no calculus. We needed Lyapunov functions and carefully designed potential fields. That doesn’t mean birds need our mathematics—just as it doesn’t mean the brain is a transformer. It means we needed mathematical tools to understand, prove, and reliably engineer what nature achieves through other means.
What struck me then—and matters profoundly for AGI now—is that intelligence emerged from interaction protocols, not from sophisticated individual agents. Simple agents with appropriate interaction rules produced complex, adaptive, intelligent-seeming behavior. This is very similar to how Graph Neural Networks operate today: intelligence emerges from message passing and aggregation across network structure, not just from sophisticated individual node computations. The architecture and protocol—how information flows, how agents influence each other, what gets aggregated and when—matter in the emergence of collective intelligence.
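As a hedged illustration of that analogy, here is one round of generic mean-aggregation message passing (a pedagogical sketch, not any specific published GNN layer): each node’s update is determined by the graph structure and the aggregation protocol, not by node-level sophistication:

```python
import numpy as np

def message_passing_round(H, A, W_self, W_neigh):
    """One round of mean-aggregation message passing.

    H: (n, d) node features; A: (n, n) adjacency matrix. Each node's new
    state mixes its own features with the mean of its neighbors' features:
    the protocol does the work, not the individual node.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)  # avoid divide-by-zero
    neighbor_mean = (A @ H) / deg                     # aggregate over neighbors
    return np.tanh(H @ W_self + neighbor_mean @ W_neigh)

rng = np.random.default_rng(1)
n, d = 6, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # random undirected graph
H = rng.normal(size=(n, d))
W_self, W_neigh = rng.normal(size=(d, d)), rng.normal(size=(d, d))

for _ in range(3):                             # information spreads hop by hop
    H = message_passing_round(H, A, W_self, W_neigh)
print(H.shape)                                 # (6, 4)
```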
This is the opposite of the current AGI strategy: scaling individual model sophistication rather than exploring interaction architectures.
Conclusion: Science as Multigenerational Collaboration
The Prime Number Theorem required both Gauss’s empirical work and Riemann’s theoretical reframing. Neither alone sufficed. Mathematicians in 2025 build on Newton, Euler, Gauss, Riemann, and countless others.
The researchers achieving genuine machine intelligence—whenever that occurs—will similarly build on generations of work, with essential contributions from:
- Today’s “failures” that eliminate dead ends and clarify constraints
- Currently undervalued scientific infrastructure builders (datasets, benchmarks, evaluation frameworks, theoretical tools)
- Disciplines we don’t yet recognize as relevant
- Future researchers we’re currently training
Our role isn’t to be singular “godfathers” who single-handedly create AGI. Our role is to:
- Contribute meaningfully to a multigenerational research program
- Document work thoroughly for future researchers
- Maintain intellectual honesty about uncertainty
- Build infrastructure enabling breakthroughs we can’t yet imagine
The gap between laboratory benchmark success and real-world deployment failure—95% of enterprise implementations delivering no measurable value—echoes Legendre’s formula: impressive performance on test cases doesn’t guarantee broader sufficiency. When billions in investment consistently fail to translate benchmark performance into production value, we should question whether we’re solving the right problem.
The question isn’t “when will we achieve AGI?” or “who will be the godfather of AGI?”
The question is: Do we have the intellectual humility to acknowledge uncertainty, the strategic wisdom to diversify approaches, the collaborative capacity to build on others’ work across generations, and the patience to invest in fundamental understanding even when scaling offers quicker publications?
History suggests transformative breakthroughs come not from doubling down on single approaches, but from maintaining diverse portfolios, building theoretical foundations, enabling contributions from orthogonal fields, and recognizing that science is fundamentally collaborative across time.
That’s what the Prime Number Theorem teaches us. That’s what my three decades in research have taught me. That’s what AGI research demands.
Sidebar: Key Takeaways for Research Strategy
Historical Lesson: The Prime Number Theorem took 104 years because the necessary framework (complex analysis) didn’t exist when the problem was posed—not because mathematicians lacked intelligence or computational power.
Three Strategic Risks:
- Scaling potentially insufficient formulations (Legendre’s asymptotically wrong but empirically impressive formula)
- Missing orthogonal breakthroughs (complex analysis seemed irrelevant to counting integers)
- Infrastructure lock-in creating path dependencies (multi-billion dollar bets on specific architectures)
Portfolio Allocation:
- 30-40%: Empirical scaling of current approaches
- 30-40%: Fundamental theoretical research
- 20-30%: Alternative architectures and paradigms
Evidence for Caution: 95% of enterprise AI implementations fail to deliver value despite benchmark performance—suggesting current architectures may be local optima rather than paths to general intelligence.
Collaborative Framework: Science progresses through multigenerational collaboration, not singular genius. Proper attribution, infrastructure contributions, and documented failures all serve future breakthroughs.
References
Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299-307.
Amari, S. (1972). Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, C-21(11), 1197-1206.
Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27(2), 77-87.
Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2), 251-276.
Amari, S., & Nagaoka, H. (2000). Methods of information geometry. American Mathematical Society.
Challapally, A., et al. (2025). The GenAI Divide: State of AI in Business 2025. MIT NANDA Initiative.
Clay Mathematics Institute (2005). Riemann’s 1859 manuscript. https://www.claymath.org/collections/riemanns-1859-manuscript/
de la Vallée Poussin, C. J. (1896). Recherches analytiques sur la théorie des nombres premiers. Annales de la Société Scientifique de Bruxelles, 20, 183-256.
Fiedler, M. (1973). Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98), 298-305.
Goldfeld, D. (n.d.). The elementary proof of the prime number theorem: An historical perspective. Columbia University.
Goldstein, L. J. (1973). A history of the prime number theorem. The American Mathematical Monthly, 80(6), 599-615.
Hadamard, J. (1896). Sur la distribution des zéros de la fonction ζ(s) et ses conséquences arithmétiques. Bulletin de la Société Mathématique de France, 24, 199-220.
Olfati-Saber, R., & Murray, R. M. (2004). Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9), 1520-1533.
Olfati-Saber, R. (2006). Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control, 51(3), 401-420.
About the Author
Reza Olfati-Saber has conducted research in control theory, multi-agent systems, and distributed intelligence for three decades. His two seminal papers on consensus protocols and flocking algorithms established foundational frameworks for understanding collective intelligence in networked systems. His background spans electrical engineering, computer science, systems theory, applied mathematics, biology, and statistical physics. His current interests include distributed AI, human-AI collaboration, and AI governance in regulated industries.