Services

In line with our mission to accelerate scientific discovery and ground models in truth, we work with AI labs, government agencies, and research institutes to improve their models at formal verification and mathematical reasoning.

Dataset Curation

Training data that compiles.

Autoformalization translates informal mathematical statements (natural language, LaTeX) into formally verifiable code. Our datasets provide high-quality (natural language, formal statement) pairs across multiple proof assistants. Each entry is verified through a rigorous pipeline: LLM-generated translations are type-checked by the proof assistant's kernel, then reviewed by domain experts to ensure semantic correctness -that the formal statement accurately captures the intended mathematical meaning.
Multi-Prover Support
Lean 4 (with Mathlib), Coq, and Isabelle/HOL
Verified by Compilation
Every statement type-checks -no syntax errors, no type mismatches
Expert Semantic Review
PhD mathematicians verify that formalizations capture the correct meaning
Diverse Mathematics
Algebra, analysis, topology, number theory, combinatorics
Synthetic Augmentation
Procedurally generated theorems with controlled difficulty
Autoformalization Example
Natural Language
"A family of sets forms a delta-system if there exists a common kernel set K such that for any two distinct sets A, B in the family, their intersection equals K."
Lean 4
def IsDeltaSystem {α : Type*} [DecidableEq α]
  (F : Finset (Finset α)) : Prop :=
    ∃ K : Finset α, ∀ A ∈ F, ∀ B ∈ F, A ≠ B → A ∩ B = K
Type-checked and verified

Ready to build with verified data?

Partner with Cajal to access rigorous formal reasoning infrastructure for your AI systems.