Research · March 20, 2026 · 8 min read

Mathematics, AI, and Formalization: The State of Play

From autoregressive token generators to formally verified proofs: a brief history of computer-aided mathematics

For those following the progress of AI in mathematics, the last few months have been exciting. LLMs long ago surpassed the capabilities of a talented college student, but these systems have recently turned a corner: from scoring top marks in the world's hardest mathematics contests to cracking previously unsolved conjectures [12–18], all with minimal human oversight.

But how did we get from autoregressive token generators to systems that can autonomously expand the mathematical frontier? This article traces a brief arc through the ages of computer-aided mathematics, putting the current wave of progress into context.


Natural Numbers to Regularity Structures

For most of its history, mathematics was grounded in the concrete. From the ancient study of geometry in Greece and arithmetic in Mesopotamia, to Newton's Principia - demonstrating how a single mathematical framework could unify terrestrial and celestial mechanics - the discipline evolved primarily as a means of describing and predicting the physical world [1].

During the nineteenth century, this changed. Mathematics expanded further into abstraction: non-Euclidean geometry, rigorous analysis, abstract algebra, and set theory all emerged in quick succession [2]. This expansion made mathematics extraordinarily powerful, but also increasingly difficult to survey as a whole.

As the frontier grew and proofs came to rely on ever-longer chains of abstraction, verifying correctness became a challenge in its own right [3]. Modern mathematical research is now long-horizon work performed and checked by teams of specialists, where often no individual is able to reliably verify every step.


A Formal Perspective

Formal verification addresses this challenge directly. When proofs are written in formal languages using tools known as Interactive Theorem Provers, their correctness can be verified mechanically by a small, trusted kernel of code [4].
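As a minimal illustration (my example, not taken from the article), here is a complete Lean 4 proof that addition on the natural numbers is commutative. Once the file compiles, the trusted kernel has certified every step; the lemma names assume Lean 4 core and are a sketch rather than the only way to write this proof:

```lean
-- A tiny self-contained Lean 4 theorem: addition on ℕ is commutative.
-- Proof by induction on n, using the defining equations of Nat.add.
theorem my_add_comm (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => simp
  | succ k ih => simp [Nat.add_succ, Nat.succ_add, ih]
```

If any step were wrong, the kernel would reject the proof; there is no notion of a "mostly correct" formal proof.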

Foremost among these tools is Lean, a system around which contemporary mathematical formalization has converged [5]. Lean was created by Leonardo de Moura at Microsoft Research in 2013, and is best described in his own words:

"Lean is both a programming language and a theorem prover. It lets you write software and mathematically prove that it's correct - all in one system. This combination is what makes it uniquely powerful for AI, mathematics, and software verification."

— Leonardo de Moura
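A toy sketch of this dual nature (my example, not de Moura's): an ordinary function definition and a machine-checked theorem about its behaviour, side by side in one Lean 4 file. The function and theorem names are illustrative:

```lean
-- An ordinary program: double a natural number.
def double (n : Nat) : Nat := n + n

-- A specification of its behaviour, proved in the same system,
-- so "the code is correct" becomes a checkable mathematical claim.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The same kernel that checks research mathematics checks this correctness proof, which is the combination the quote describes.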

A major landmark came with the Liquid Tensor Experiment: in December 2020, Fields Medalist Peter Scholze challenged the community to verify a foundational result in condensed mathematics. A team led by Johan Commelin completed the formalization in Lean by July 2022, demonstrating that cutting-edge research could be mechanically verified [6].

Mathematicians including Scholze, Kevin Buzzard, and Terence Tao subsequently advocated for broader adoption [8], accelerating the growth of Lean's Mathlib - a collaborative library of formally verified mathematics. As of early 2026, Mathlib contains over two million lines of verified code from more than 600 contributors [7, 9].


Informal proof: Theorem 3.1. Let G be a finite group and H a subgroup of G. Then |H| divides |G|. Proof. Consider the left cosets of H in G... QED

↓ formalize

Lean 4 code:

```lean
theorem lagrange (G : Type*) [Group G] [Fintype G]
    (H : Subgroup G) [Fintype H] :
    card H ∣ card G := by
  exact Subgroup.card_dvd_card H
```

↓ verify

Compiler output: Proof Verified. No axiom errors. Kernel check passed.

From blackboard to proof certificate — how informal mathematics becomes machine-verified.

The Autonomous Age

The ability to verify mathematical proofs mechanically is, on its own, a powerful tool for researchers. But its significance has grown enormously with the advent of AI-powered code generation.

AI systems can now be trained to write proofs directly in Lean. Orchestrated within wider agentic systems - with access to Mathlib tactics and compiler feedback - they can carry out multi-round proof attempts, iteratively learning from successes and failures until a proof is verified as correct.

Performance has advanced rapidly: from DeepMind's AlphaProof achieving IMO silver in 2024 [12, 13], to Harmonic's Aristotle achieving formally verified gold in 2025 [14]. More recently the frontier has shifted to the Putnam - widely considered the hardest mathematics competition in the world - where multiple labs, including Harmonic [14], Seed-Prover [15], Numina [16], and Cajal [17], have saturated the challenge with formally verified solutions.

This acceleration has been catalysed by an emphasis on open-sourcing. A culture of open contributions to Mathlib, public discussion on platforms such as Zulip, and initiatives like XTX's $10M AI Mathematical Olympiad Prize (AIMO) have all played a role [10, 11].


From Autoformalization to Discovery

Progress did not stop at competition benchmarks. Since October 2025, AI tools have helped resolve approximately 100 open Erdős problems, primarily through enhanced literature search and, in several cases, by constructing original proofs - with the first fully autonomous solution to an Erdős problem (#728) formally verified in Lean [18].

In February 2026, the First Proof project released ten previously unpublished research-level problems as a test of AI capability beyond well-known benchmarks. The best reported result was five out of ten, achieved by OpenAI's most advanced internal system [19, 20]. This confirmed that a meaningful gap still remains between competition mathematics and frontier research - though that gap continues to narrow.

Patrick Shafto, DARPA program manager for the Exponentiating Mathematics program, put this trajectory in useful terms, describing a ladder of ascending difficulty [21]:

Machine Learning → IMO / Putnam → Erdős → First Proof → Fields Medal → Millennium Prize → Science?

From left to right: saturated, current frontier, projected.

The trajectory of AI-powered mathematics is clearly real. But what is especially striking is how little of the mathematical literature it currently rests on. A quick estimate makes the point:

Let $M$ be the corpus of modern mathematical literature, proxied by zbMATH Open's ~5 million indexed entries [22]. Let $\mu$ be page measure on $M$, and let $F_{\mathrm{Lean}}$ be the mathematical content currently represented in Lean's Mathlib and related libraries. Using recent document-scale formalization data to map Lean declarations to informal source pages [24], the current Lean corpus corresponds to about $4.5 \times 10^4$ informal pages. If $M$ contains about $5 \times 10^7$ to $10^8$ pages, then:

$$\mu(F_{\mathrm{Lean}}) \,/\, \mu(M) \approx 4.5 \times 10^{-4} \text{ to } 9 \times 10^{-4}$$

That is, only about 0.05% of mathematics has been formalized - with uncertainty of at least a factor of two. Everything described in this article has been achieved on that base alone [23].
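The bounds are a one-line computation; a quick check using the article's own figures:

```python
# Back-of-envelope check of the formalization-coverage estimate,
# using the page counts quoted in the article.
lean_pages = 4.5e4                    # informal pages represented in Lean [24]
corpus_low, corpus_high = 5e7, 1e8    # estimated size of the literature, in pages

ratio_high = lean_pages / corpus_low   # most optimistic coverage
ratio_low = lean_pages / corpus_high   # most pessimistic coverage

print(f"{ratio_low:.1e} to {ratio_high:.1e}")                 # 4.5e-04 to 9.0e-04
print(f"{100 * ratio_low:.3f}% to {100 * ratio_high:.3f}%")   # 0.045% to 0.090%
```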

At Cajal, our goal is to bring that 0.05% to full coverage. Every theorem formalized in Lean will become a building block for automated proof search, and eventually a component of shared infrastructure on which the entire scientific literature can build.

It's an exciting time to be alive.


References

[1] G. Galilei, Il Saggiatore (Rome, 1623).

[2] J. Stillwell, Mathematics and Its History, 3rd ed. (Springer, 2010).

[3] R. A. De Millo, R. J. Lipton, and A. J. Perlis, "Social Processes and Proofs of Theorems and Programs," Communications of the ACM 22, no. 5 (1979): 271–280.

[4] T. C. Hales, "A Proof of the Kepler Conjecture," Annals of Mathematics 162, no. 3 (2005): 1065–1185. See also: T. C. Hales et al., "A Formal Proof of the Kepler Conjecture," Forum of Mathematics, Pi 5 (2017): e2.

[5] L. de Moura et al., "The Lean Theorem Prover (System Description)," in Automated Deduction - CADE-25, LNCS vol. 9195 (Springer, 2015). lean-lang.org.

[6] J. Commelin et al., "Completion of the Liquid Tensor Experiment," Lean Community Blog, July 15, 2022. Challenge posed by P. Scholze, December 2020.

[7] A. Baanen et al., "Growing Mathlib," arXiv:2508.21593 (2025). See also: Mathlib Initiative Roadmap, mathlib-initiative.org.

[8] T. Tao, "A Lean Companion to Analysis I," blog post, May 31, 2025. terrytao.wordpress.com. Formalization completed July 2025.

[9] Lean Focused Research Organization. lean-fro.org.

[10] AIMO Prize. aimoprize.com. $10M challenge fund launched by XTX Markets, November 2023.

[11] AIMO Prize, "Progress Prize: July 2024 Results." Team Numina won $131,072.

[12] Google DeepMind, "AI Achieves Silver-Medal Standard Solving International Mathematical Olympiad Problems," July 25, 2024. deepmind.google.

[13] T. Hubert et al., "Olympiad-Level Formal Mathematical Reasoning with Reinforcement Learning," Nature (2025). doi.org/10.1038/s41586-025-09833-y.

[14] Harmonic, "Harmonic Announces IMO Gold Medal-Level Performance," Business Wire, July 28, 2025. Formally verified in Lean 4.

[15] J. Chen et al., "Seed-Prover 1.5," arXiv:2512.17260 (2025). 11/12 Putnam 2025 within 9 hours.

[16] Project Numina, "Numina-Lean-Agent," arXiv:2601.14027 (2026). 12/12 Putnam 2025 using Claude Opus 4.5.

[17] Cajal Technologies, "Putnam 2025 - Formally Verified Solutions in Lean 4," github.com/cajal-technologies/putnam2025.

[18] "AI Uncovers Solutions to Erdős Problems," Scientific American, February 2026.

[19] M. Abouzaid et al., "First Proof," arXiv:2602.05192 (2026). See also: 1stproof.org.

[20] "First Proof Is AI's Toughest Math Test Yet," Scientific American, February 2026. See also: "When You Do the Math, Humans Still Rule," Harvard Gazette, February 7, 2026.

[21] P. Shafto, "AI + Formal Math," presentation, DARPA expMath program. youtube.com/live/ZMO3xy1bayE.

[22] E. Schlitzer, "zbMATH Open: New Features and Achievements in 2024," European Mathematical Society, January 2025. ~5 million bibliographic entries. zbmath.org/about/.

[23] Lean Community, "Mathlib Statistics," leanprover-community.github.io/mathlib_stats.html.

[24] Z. Wang et al., "M2F: Automated Formalization of Mathematical Literature at Scale," arXiv:2602.17016 (2026). 4,116 declarations from 479 pages.
