A Lean Dataset for International Math Olympiad: Small Steps towards Writing Math Proofs for Hard Problems
Project Overview
The document explores the application of generative AI, particularly through the Lean proof assistant, to enhance mathematics education by automating the writing of formal proofs for hard problems, such as those from the International Math Olympiad (IMO). The authors have compiled a specialized dataset containing formal proofs for 17 IMO problems and derived 1,329 lemmas from them to assess the performance of AI models. The findings underscore both the potential and the constraints of advanced language models in mathematical theorem proving, showing that the effectiveness of these AI systems depends heavily on the quality of their training data. This work illustrates a promising intersection of AI and education, highlighting how generative AI can support and transform the learning and teaching of formal mathematics while also identifying areas for further improvement and research.
Key Applications
Evaluation and generation of formal proofs
Context: The educational contexts include mathematics education for advanced high school students preparing for the International Mathematical Olympiad (IMO), and the assessment of large language models (LLMs) on the task of generating formal proofs for mathematical lemmas.
Implementation: Formal proofs were generated using the Lean programming language and evaluated using various LLMs (e.g., o3-mini, Goedel-Prover). The LLMs were prompted to generate proofs for mathematical lemmas, and their accuracy was assessed against established benchmarks, with a dataset of lemmas constructed from existing IMO problems.
Outcomes: The combined dataset enhances understanding of formal proofs in mathematics and serves as a benchmark for evaluating AI models in theorem proving. Findings indicate that while some models can generate proofs, they struggle with longer and more complex proofs, which highlights areas for improvement.
Challenges: Generating formal proofs remains a challenging task for AI models due to the complexity and novelty of the problems. LLMs often produce incorrect or incomplete proofs, and their performance varies significantly based on proof length and complexity, necessitating high-quality training data.
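To make the task concrete, the following is a minimal illustrative sketch of the kind of lemma an LLM might be prompted to prove in Lean; the statement and tactic are hypothetical examples in Lean 4 / Mathlib style, not items from the paper's dataset.

```lean
import Mathlib.Tactic

-- A small lemma of the kind an LLM could be asked to prove formally:
-- the sum of two real squares is nonnegative.
theorem sq_add_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  -- `positivity` is a Mathlib tactic for goals of this shape.
  positivity
```

A model's output is then checked by the Lean compiler: the proof counts as correct only if it type-checks, which is what makes Lean a reliable automated grader for generated proofs.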
Implementation Barriers
Data Quality
The lack of high-quality training data for automated theorem proving limits the effectiveness of AI models.
Proposed Solutions: Creating and sharing datasets of formal proofs and lemmas, and drawing on existing libraries such as Mathlib, to improve model training.
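Reusing a library like Mathlib means a proof can close a goal by citing an existing lemma instead of rebuilding it from first principles, which shortens proofs and gives models higher-quality patterns to learn from. A minimal illustrative sketch (the example is hypothetical, not from the paper):

```lean
import Mathlib.Tactic

-- Illustrative only: the goal is discharged by applying the existing
-- Mathlib lemma `Nat.add_comm` rather than proving commutativity anew.
example (m n : ℕ) : m + n = n + m := Nat.add_comm m n
```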
Model Limitations
Current AI models struggle to plan and execute complex proofs because they rely on patterns learned from short proofs.
Proposed Solutions: Developing models that can understand and generate longer proof chains through improved training strategies and feedback mechanisms.
Project Team
Roozbeh Yousefzadeh
Researcher
Xuenan Cao
Researcher
Azim Ospanov
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Roozbeh Yousefzadeh, Xuenan Cao, Azim Ospanov
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI