Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
Project Overview
This paper examines how generative AI, particularly large language models (LLMs), can improve educational outcomes, with a focus on mathematical problem solving. It introduces MathQuest, a specialized dataset built to fine-tune LLMs for complex mathematical problems. The findings suggest that LLMs can support personalized learning by generating educational content tailored to individual students, but that they still struggle to solve intricate mathematical tasks accurately. Generative AI therefore shows promise for personalized learning and content generation in education, while needing further improvement before it is reliable in more complex areas of study.
Key Applications
MathQuest dataset for fine-tuning LLMs
Context: High school mathematics education, targeting students and educators
Implementation: The dataset was curated from NCERT textbooks and used to fine-tune models like MAmmoTH, LLaMA-2, and WizardMath.
Outcomes: Fine-tuning on MathQuest improved the models' accuracy in solving mathematical problems, with MAmmoTH-13B performing best among the fine-tuned models.
Challenges: LLMs struggle with complex expressions and may require additional training data to enhance reasoning capabilities.
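The accuracy comparison described above can be illustrated with a minimal sketch of exact-match scoring. This is not the paper's evaluation code: the function names `normalize_answer` and `exact_match_accuracy` and the normalization rules are assumptions made for illustration, and the summary does not state which metric or answer format the authors used.

```python
from fractions import Fraction


def normalize_answer(text):
    """Reduce an answer string to a canonical form (hypothetical rules)."""
    cleaned = text.strip().lower().replace(",", "").rstrip(".")
    try:
        # Fraction parses integers, decimals, and "a/b" forms exactly,
        # so "0.5" and "1/2" normalize to the same string.
        return str(Fraction(cleaned))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers are compared as cleaned text.
        return cleaned


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose normalized form equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Made-up answers, used only to exercise the functions.
    preds = ["0.5", "4", "x=3"]
    refs = ["1/2", "5", "X=3"]
    print(exact_match_accuracy(preds, refs))
```

Exact matching is strict: it gives no credit for correct reasoning with a wrong final answer, so it is only one of several ways such models could be scored.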
Implementation Barriers
Technical & Data Limitations
LLMs face challenges with complex mathematical problems that involve intricate reasoning and multi-step calculations. Existing datasets do not adequately cover the complexity of real-world mathematical problems.
Proposed Solutions: The MathQuest dataset is intended to address this gap by providing a diverse range of mathematical challenges. Future work aims to enlarge the training dataset and incorporate recent prompting techniques to improve LLMs' reasoning abilities.
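The summary does not name the prompting techniques under consideration. As one possible illustration, the sketch below builds a step-by-step (chain-of-thought style) prompt with optional worked examples. The choice of technique, the function name `build_step_by_step_prompt`, and the prompt wording are all assumptions, not details taken from the paper.

```python
def build_step_by_step_prompt(question, worked_examples=()):
    """Assemble a few-shot prompt that asks for step-by-step reasoning.

    worked_examples is a sequence of (question, solution) pairs; the
    format used here is hypothetical, not the paper's.
    """
    parts = []
    for ex_question, ex_solution in worked_examples:
        parts.append(f"Question: {ex_question}\nSolution: {ex_solution}")
    # The final question is left open so the model continues the solution.
    parts.append(f"Question: {question}\nSolution: Let's think step by step.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    examples = [
        ("Solve 2x + 3 = 7.", "Subtract 3: 2x = 4. Divide by 2: x = 2."),
    ]
    print(build_step_by_step_prompt("Solve 3x - 5 = 10.", examples))
```

Showing worked solutions before the target question is meant to nudge the model toward writing out intermediate steps, which is where multi-step problems tend to go wrong.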
Project Team
Avinash Anand
Researcher
Mohit Gupta
Researcher
Kritarth Prasad
Researcher
Navya Singla
Researcher
Sanjana Sanjeev
Researcher
Jatin Kumar
Researcher
Adarsh Raj Shivam
Researcher
Rajiv Ratn Shah
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Avinash Anand, Mohit Gupta, Kritarth Prasad, Navya Singla, Sanjana Sanjeev, Jatin Kumar, Adarsh Raj Shivam, Rajiv Ratn Shah
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI