
AI-Assisted Generation of Difficult Math Questions

Project Overview

This document explores the application of generative AI, particularly large language models (LLMs), to improving the quality of mathematics assessments. It describes a framework that pairs AI-generated questions with human expert oversight to produce more challenging and varied problems, addressing the limitations of existing datasets. A key outcome is the MATH2 dataset, a collection of harder evaluation questions covering skills such as area calculation and algebraic manipulation. Advanced models, including Gemini 1.5 Pro and Llama-3-70B-Instruct, are used to generate complex questions that demand a deep understanding of mathematical concepts. The findings underscore the potential of generative AI to transform educational assessment by providing diverse, rigorous challenges that better evaluate students' mathematical competencies.

Key Applications

AI-Assisted Generation of Difficult Math Questions

Context: Educational context targeting high school and college students learning various mathematical concepts, including algebra, geometry, and advanced mathematics. The focus is on enhancing problem-solving abilities through exposure to complex question formats.

Implementation: A pipeline utilizing AI models to generate challenging math questions based on specified skills across different mathematical subdomains. The process integrates human experts for validation and refinement, ensuring clarity and precision in the generated questions.
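The generation step of such a pipeline can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, prompt wording, and the detail of combining a pair of skills per question are assumptions made for the example; `model` stands in for any LLM backend (e.g. a wrapper around Gemini 1.5 Pro or Llama-3-70B-Instruct), and its outputs would go to human reviewers downstream.

```python
from typing import Callable, List

def build_prompt(skill_a: str, skill_b: str, examples: List[str]) -> str:
    """Compose a prompt asking the model for one question that genuinely
    requires both named skills, following the given style exemplars."""
    exemplar_text = "\n".join(f"- {e}" for e in examples)
    return (
        "Write a challenging math question that requires BOTH of these skills:\n"
        f"1. {skill_a}\n"
        f"2. {skill_b}\n"
        f"Style exemplars:\n{exemplar_text}\n"
        "State the question clearly and include a final numeric answer."
    )

def generate_question(model: Callable[[str], str],
                      skill_a: str, skill_b: str,
                      examples: List[str]) -> str:
    """One pipeline step: prompt the model and return its raw question text.
    Human experts validate and refine this output in later stages."""
    return model(build_prompt(skill_a, skill_b, examples))
```

In use, `model` would be a thin function wrapping an API call; here it can be stubbed for testing, which also keeps the pipeline logic independent of any particular LLM provider.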

Outcomes: Creation of a new dataset (MATH2) that includes high-quality, difficult questions, leading to improved understanding of algebraic concepts, geometry applications, and overall problem-solving skills among students.

Challenges: Ensuring sufficient diversity and complexity in the generated questions, maintaining human oversight to avoid ambiguity, balancing question difficulty, and avoiding overfitting models to specific datasets.

Implementation Barriers

Technical

LLMs may generate questions that are too similar to existing ones, lack sufficient complexity, or may be ambiguous, leading to confusion among students.

Proposed Solutions: A human-in-the-loop approach in which experts validate and refine AI-generated questions, combined with a thorough review of each question for clarity and precision before use.
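Before candidates reach expert reviewers, an automated pass can flag questions that overlap too heavily with existing items, so humans only see sufficiently novel drafts. A minimal sketch using word-level Jaccard similarity follows; the metric and the threshold value are illustrative assumptions, not the paper's method.

```python
from typing import List

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two question strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def is_near_duplicate(candidate: str, corpus: List[str],
                      threshold: float = 0.6) -> bool:
    """Flag a generated question that overlaps heavily with any existing one."""
    return any(jaccard(candidate, q) >= threshold for q in corpus)
```

A production pipeline would likely use embedding-based similarity instead, but a cheap lexical filter like this already removes the most blatant near-copies before human review.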

Operational

High costs associated with extensive human verification and the use of advanced LLMs.

Proposed Solutions: Future work should focus on optimizing prompting strategies and using open-weight models to improve efficiency.

Quality Assurance

The potential for generated questions to be unsolvable, trivial, or of low quality.

Proposed Solutions: Adding validation checks at multiple stages of the question generation pipeline to filter out low-quality questions.
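Staged validation of this kind can be sketched as a sequence of named filters, each of which a question must pass to survive. The specific checks and limits below are illustrative assumptions (the paper's actual pipeline also involves model-based and human verification):

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]

def make_checks(min_words: int = 8, max_words: int = 200) -> List[Tuple[str, Check]]:
    """Named validation stages; each returns True when the question passes."""
    return [
        ("nonempty", lambda q: bool(q.strip())),
        ("length", lambda q: min_words <= len(q.split()) <= max_words),
        ("is_question", lambda q: "?" in q
            or q.lower().startswith(("find", "compute", "determine"))),
    ]

def filter_questions(questions: List[str]) -> List[str]:
    """Keep only questions that pass every stage of the validation pipeline."""
    checks = make_checks()
    return [q for q in questions if all(check(q) for _, check in checks)]
```

Keeping the stages as named entries makes it easy to log which filter rejected a question, which is useful when tuning the pipeline's precision against reviewer workload.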

Computational Tractability

Some questions may be too complex for students to solve without computational aid.

Proposed Solutions: Design questions to be solvable by hand within a limited time frame without requiring advanced tools.

Project Team

Researchers: Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal

Contact Information

For information about the paper, please contact the authors.

Authors: Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
