A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems
Project Overview
The document explores the application of generative AI, particularly through the Lean proof assistant, to enhance mathematics education by automating the writing of formal proofs for hard problems, such as those from the International Math Olympiad (IMO). The authors have compiled a specialized dataset containing formal proofs for 17 IMO problems and derived 1,329 lemmas from them to assess the performance of AI models. The findings underscore both the potential and the constraints of advanced language models in mathematical theorem proving, showing that the effectiveness of these AI systems depends heavily on the quality of their training data. This work illustrates a promising intersection of AI and education, highlighting how generative AI can support and transform the learning and teaching of formal mathematics while also identifying areas for further improvement and research.
Key Applications
Evaluation and generation of formal proofs
Context: The educational contexts include mathematics education for advanced high school students preparing for the International Mathematical Olympiad (IMO), and the assessment of large language models (LLMs) on the task of generating formal proofs for mathematical lemmas.
Implementation: Formal proofs were generated using the Lean programming language and evaluated using various LLMs (e.g., o3-mini, Goedel-Prover). The LLMs were prompted to generate proofs for mathematical lemmas, and their accuracy was assessed against established benchmarks, with a dataset of lemmas constructed from existing IMO problems.
Outcomes: The combined dataset enhances understanding of formal proofs in mathematics and serves as a benchmark for evaluating AI models in theorem proving. Findings indicate that while some models can generate proofs, they struggle with longer and more complex proofs, which highlights areas for improvement.
Challenges: Generating formal proofs remains a challenging task for AI models due to the complexity and novelty of the problems. LLMs often produce incorrect or incomplete proofs, and their performance varies significantly based on proof length and complexity, necessitating high-quality training data.
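To make the task concrete, the following is a minimal illustrative sketch of the kind of lemma an LLM might be prompted to prove in Lean; the statement and tactic are hypothetical examples in Lean 4 / Mathlib style, not items from the paper's dataset.

```lean
import Mathlib.Tactic

-- A small lemma of the kind an LLM could be asked to prove formally:
-- the sum of two real squares is nonnegative.
theorem sq_add_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  -- `positivity` is a Mathlib tactic for goals of this shape.
  positivity
```

A model's output is then checked by the Lean compiler: the proof counts as correct only if it type-checks, which is what makes Lean a reliable automated grader for generated proofs.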
Implementation Barriers
Data Quality
The lack of high-quality training data for automated theorem proving limits the effectiveness of AI models.
Proposed Solutions: Creating and sharing datasets of formal proofs and lemmas, and drawing on existing libraries such as Mathlib, to improve model training.
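Reusing a library like Mathlib means a proof can close a goal by citing an existing lemma instead of rebuilding it from first principles, which shortens proofs and gives models higher-quality patterns to learn from. A minimal illustrative sketch (the example is hypothetical, not from the paper):

```lean
import Mathlib.Tactic

-- Illustrative only: the goal is discharged by applying the existing
-- Mathlib lemma `Nat.add_comm` rather than proving commutativity anew.
example (m n : ℕ) : m + n = n + m := Nat.add_comm m n
```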
Model Limitations
Current AI models struggle to plan and execute complex proofs because they rely on patterns learned from short proofs.
Proposed Solutions: Developing models that can understand and generate longer proof chains through improved training strategies and feedback mechanisms.
Project Team
Roozbeh Yousefzadeh
Researcher
Xuenan Cao
Researcher
Azim Ospanov
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Roozbeh Yousefzadeh, Xuenan Cao, Azim Ospanov
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI