
Researchers David Roe ’06 and Andrew Sutherland ’90, PhD ’07 are among the inaugural recipients of the AI for Math grants from Renaissance Philanthropy and XTX Markets.
Four additional MIT alumni (Anshula Gandhi ’19; Viktor Kunčak SM ’01, PhD ’07; Gireeja Ranade ’07; and Damiano Testa ’05) were also recognized for separate projects.
The first 29 winning projects will support mathematicians and researchers at universities and organizations working to develop artificial intelligence systems that help advance mathematical discovery and research across several key tasks.
Roe and Sutherland, together with Chris Birkbeck of the University of East Anglia, will use their grant to boost automated theorem proving by building a connection between the L-functions and Modular Forms Database (LMFDB) and the Lean 4 mathematical library (Mathlib).
“Automated theorem provers are quite technically involved, but their development is under-resourced,” Sutherland says. With AI technologies such as large language models (LLMs), the barrier to entry for these formal tools is dropping rapidly, making formal verification frameworks accessible to working mathematicians.
Mathlib is a large, community-driven mathematical library for the Lean theorem prover, a formal system that verifies the correctness of every step in a proof. Mathlib currently contains on the order of 10⁵ mathematical results (lemmas, propositions, theorems, etc.). The LMFDB, a massive, collaborative online resource that serves as a kind of “encyclopedia” of modern number theory, contains more than 10⁹ concrete statements. Sutherland and Roe are managing editors of the LMFDB.
Roe and Sutherland’s grant will fund a project that aims to augment both systems: the LMFDB’s results will be made available within Mathlib as assertions that have not yet been formally proved, together with precise formal definitions of the numerical data stored in the LMFDB. This bridge will benefit both human mathematicians and AI agents, and it will provide a framework for connecting other mathematical databases to formal theorem-proving systems.
The main obstacles to automating mathematical discovery and proof are the limited amount of formalized mathematical knowledge, the high cost of formalizing complex results, and the gap between what is computationally accessible and what is feasible to formalize.
To address these obstacles, the researchers will use the funding to build tools for accessing the LMFDB from Mathlib, making a large database of unformalized mathematical knowledge accessible to a formal proof system. This approach enables proof assistants to identify specific targets for formalization without the need to formalize the entire LMFDB corpus in advance.
“Making a large database of unformalized number-theoretic facts available within Mathlib will provide a powerful technique for mathematical discovery, because the set of facts an agent might wish to consider while searching for a theorem or proof is exponentially larger than the set of facts that ultimately need to be formalized to actually prove the theorem,” says Roe.
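The idea of using database facts before they are proved can be sketched in a few lines of Python. This is a hypothetical illustration, not the project’s design: the names Status, Fact, and FactStore are invented here, and statements are placeholder strings rather than Lean terms or LMFDB labels. Each fact records whether it was formally proved or merely asserted by a database; an agent may search all of them, and only the asserted facts that a finished proof actually relied on become formalization targets.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical provenance categories for a stored fact."""
    PROVED = "proved"      # checked by the proof assistant
    ASSERTED = "asserted"  # imported from a database, not yet formally proved


@dataclass(frozen=True)
class Fact:
    statement: str  # placeholder text, not a real Lean or LMFDB identifier
    status: Status
    source: str     # e.g. "Mathlib" or "LMFDB"


class FactStore:
    """Hypothetical store mixing formally proved facts with database assertions."""

    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def add(self, fact: Fact) -> None:
        self._facts[fact.statement] = fact

    def search(self, keyword: str) -> list[Fact]:
        # During search, an agent may consult every fact, proved or merely asserted.
        return [f for f in self._facts.values() if keyword in f.statement]

    def formalization_targets(self, used: list[str]) -> list[str]:
        # Only the asserted facts that a finished proof actually used still
        # need formal proofs; the rest of the corpus does not.
        return [s for s in used if self._facts[s].status is Status.ASSERTED]
```

For instance, after adding one proved fact and two asserted ones, calling formalization_targets on a proof that used the proved fact and one asserted fact returns just that asserted statement, however many other assertions the store holds.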
The researchers note that proving new theorems at the frontier of mathematical knowledge often involves steps that rely on nontrivial computation. For example, Andrew Wiles’ proof of Fermat’s Last Theorem uses what is known as the “3-5 trick” at a crucial point in the proof.
“This trick depends on the fact that the modular curve X_0(15) has only finitely many rational points, and none of them correspond to a semistable elliptic curve,” Sutherland says. “This fact was known well before Wiles’ work and is easy to verify using computational tools available in modern computer algebra systems, but it is not something one can realistically prove using pencil and paper, nor can it be easily formalized.”
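Semistability is itself a concrete, checkable property. The Python sketch below is only an illustration of that notion; it does not reproduce the X_0(15) computation, which requires determining all rational points on the curve. Given integer Weierstrass coefficients (a1, a2, a3, a4, a6), it uses the standard criterion that, on a minimal model, reduction at a prime p is multiplicative exactly when p divides the discriminant but not the invariant c4, and additive when p divides both. The function names are hypothetical, and the code assumes the model passed in is already minimal.

```python
def invariants(a):
    """Return (c4, discriminant) of y^2 + a1xy + a3y = x^3 + a2x^2 + a4x + a6."""
    a1, a2, a3, a4, a6 = a
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    c4 = b2 * b2 - 24 * b4
    disc = -b2 * b2 * b8 - 8 * b4 ** 3 - 27 * b6 * b6 + 9 * b2 * b4 * b6
    return c4, disc


def prime_factors(n):
    """Distinct prime factors of |n| by trial division (fine for small examples)."""
    n = abs(n)
    primes = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def semistable_conductor(a):
    """Return the conductor if the curve is semistable, else None.

    Assumes the model is minimal at every prime. A semistable curve's
    conductor is the product of its primes of bad (multiplicative) reduction.
    """
    c4, disc = invariants(a)
    if disc == 0:
        raise ValueError("singular cubic: not an elliptic curve")
    conductor = 1
    for p in prime_factors(disc):
        if c4 % p == 0:
            return None  # additive reduction at p, so not semistable
        conductor *= p
    return conductor
```

For example, semistable_conductor((0, -1, 1, -10, -20)) returns 11 for y² + y = x³ − x² − 10x − 20, whose discriminant is −11⁵, while the model (0, 0, 0, 0, 1) for y² = x³ + 1 returns None because c4 = 0 gives additive reduction at 2 and 3.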
For more efficient verification, formal theorem provers are being connected to computer algebra systems, but leveraging the computational output of existing mathematical databases offers several further advantages.
Using stored results avoids the cost of redoing these computations, taking advantage of the thousands of CPU-years of computation time already spent creating the LMFDB. Making precomputed information available also makes it possible to search for examples and counterexamples without knowing in advance how broad the search will need to be. Furthermore, mathematical databases are curated repositories, not simply random collections of facts.
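Both advantages, reuse of stored results and open-ended search, can be illustrated with a small Python sketch. It is not LMFDB code and rests on stated assumptions: the cached quantity is the trace of Frobenius a_p = p + 1 - #E(F_p) for one example curve, computed by deliberately slow brute-force point counting, and functools.lru_cache stands in for a store of precomputed results. The names trace_of_frobenius and search are hypothetical.

```python
from functools import lru_cache
from itertools import count
from math import isqrt

# Example curve y^2 + y = x^3 - x^2 - 10x - 20, as coefficients (a1, a2, a3, a4, a6).
A = (0, -1, 1, -10, -20)


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


@lru_cache(maxsize=None)
def trace_of_frobenius(a: tuple, p: int) -> int:
    """a_p = p + 1 - #E(F_p) by brute-force point counting, O(p^2) work.

    Meaningful at primes of good reduction. The cache plays the role of a
    database of stored results: each value is computed once, then reused.
    """
    a1, a2, a3, a4, a6 = a
    affine = 0
    for x in range(p):
        rhs = (x ** 3 + a2 * x * x + a4 * x + a6) % p
        for y in range(p):
            if (y * y + a1 * x * y + a3 * y) % p == rhs:
                affine += 1
    return p + 1 - (affine + 1)  # +1 counts the point at infinity


def search(a: tuple, predicate, skip=()):
    """Lazily yield primes p (not in skip) whose a_p satisfies predicate.

    No search bound is fixed in advance; the caller decides how far to look.
    """
    for p in count(2):
        if is_prime(p) and p not in skip and predicate(trace_of_frobenius(a, p)):
            yield p
```

Because search is a generator, next(search(A, lambda ap: ap == 1, skip=(11,))) stops at the first hit, p = 5 for this curve, without any pre-set bound; 11 is skipped because the curve has bad reduction there. Repeated queries hit the cache instead of recounting points.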
“The fact that number theorists emphasized the role of the conductor in databases of elliptic curves has already proved crucial to one notable mathematical discovery made using machine learning tools: murmurations,” says Sutherland.
“Our next steps are to build a team, engage with both the LMFDB and Mathlib communities, formalize the definitions that underpin the elliptic curve, number field, and modular form sections of the LMFDB, and make it possible to run LMFDB searches from within Mathlib. If you’re an MIT student who’s interested in getting involved, feel free to reach out!”
