Products.

End-to-end tools for training and evaluating AI systems on mathematical reasoning. Datasets, benchmarks, and RL environments built for production use.

01

Dataset Curation

Training data that compiles.

Autoformalization translates informal mathematical statements (natural language, LaTeX) into formally verifiable code. Our datasets provide high-quality (natural language, formal statement) pairs across multiple proof assistants. Each entry is verified through a rigorous pipeline: LLM-generated translations are type-checked by the proof assistant's kernel, then reviewed by domain experts to ensure semantic correctness—that the formal statement accurately captures the intended mathematical meaning.
Multi-Prover Support
Lean 4 (with Mathlib), Coq, and Isabelle/HOL
Verified by Compilation
Every statement type-checks—no syntax errors, no type mismatches
Expert Semantic Review
PhD mathematicians verify that formalizations capture the correct meaning
Diverse Mathematics
Algebra, analysis, topology, number theory, combinatorics
Synthetic Augmentation
Procedurally generated theorems with controlled difficulty
Autoformalization Example
Natural Language
"A family of sets forms a delta-system if there exists a common kernel set K such that for any two distinct sets A, B in the family, their intersection equals K."
Lean 4
def IsDeltaSystem {α : Type*} [DecidableEq α]
  (F : Finset (Finset α)) : Prop :=
    ∃ K : Finset α, ∀ A ∈ F, ∀ B ∈ F, A ≠ B → A ∩ B = K
Type-checked and verified

Ready to build with verified data?

Partner with Cajal to access rigorous formal reasoning infrastructure for your AI systems.