Products.

End-to-end tools for training and evaluating AI systems on mathematical reasoning. Datasets, benchmarks, and RL environments built for production use.

Dataset Curation

Training data that compiles.

Autoformalization translates informal mathematical statements (natural language, LaTeX) into formally verifiable code. Our datasets provide high-quality (natural language, formal statement) pairs across multiple proof assistants. Each entry is verified through a rigorous pipeline: LLM-generated translations are type-checked by the proof assistant's kernel, then reviewed by domain experts to ensure semantic correctness—that the formal statement accurately captures the intended mathematical meaning.

→

Multi-Prover Support

Lean 4 (with Mathlib), Coq, and Isabelle/HOL

→

Verified by Compilation

Every statement type-checks—no syntax errors, no type mismatches

→

Expert Semantic Review

PhD mathematicians verify that formalizations capture the correct meaning

→

Diverse Mathematics

Algebra, analysis, topology, number theory, combinatorics

→

Synthetic Augmentation

Procedurally generated theorems with controlled difficulty

Autoformalization Example

Natural Language

"A family of sets forms a delta-system if there exists a common kernel set K such that for any two distinct sets A, B in the family, their intersection equals K."

Lean 4

def IsDeltaSystem {α : Type*} [DecidableEq α]
  (F : Finset (Finset α)) : Prop :=
    ∃ K : Finset α, ∀ A ∈ F, ∀ B ∈ F, A ≠ B → A ∩ B = K

Type-checked and verified

Ready to build with verified data?

Partner with Cajal to access rigorous formal reasoning infrastructure for your AI systems.

Book a Demo Contact Us