
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks

Project Overview

The paper examines how generative AI, particularly large language models (LLMs), can improve educational outcomes, with a specific focus on mathematical problem solving. It introduces MathQuest, a specialized dataset for fine-tuning LLMs on complex mathematical problems. The findings indicate that LLMs can support personalized learning by generating educational content tailored to individual students' needs. The paper also acknowledges that these models still struggle to solve intricate mathematical tasks accurately, so further improvement is needed before they can be relied on in more complex areas of study.

Key Applications

MathQuest dataset for fine-tuning LLMs

Context: High school mathematics education, targeting students and educators

Implementation: The dataset was curated from NCERT textbooks and used to fine-tune models like MAmmoTH, LLaMA-2, and WizardMath.

Outcomes: Fine-tuning improved the models' accuracy in solving mathematical problems, with MAmmoTH-13B performing best.

Challenges: LLMs struggle with complex expressions and may require additional training data to enhance reasoning capabilities.
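
The accuracy comparison mentioned under Outcomes can be illustrated with a minimal sketch. It assumes that a model's answer is scored by exact match on the last number in its output; this summary does not state the paper's actual metric or answer format, so the function names and the scoring rule below are hypothetical and shown only for illustration.

```python
import re
from fractions import Fraction

# Hypothetical scoring rule: the last number in the model output is its answer.
NUMBER = re.compile(r"-?\d+(?:/\d+|\.\d+)?")


def extract_final_answer(text):
    """Return the last number found in text as a Fraction, or None."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    try:
        return Fraction(matches[-1])
    except (ValueError, ZeroDivisionError):
        return None


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference.

    References are assumed to be well-formed numeric strings such as
    "0.75" or "3/4".
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for prediction, reference in zip(predictions, references):
        answer = extract_final_answer(prediction)
        if answer is not None and answer == Fraction(reference):
            correct += 1
    return correct / len(references)


if __name__ == "__main__":
    predictions = ["So x = 3/4", "The answer is 0.50.", "I am not sure"]
    references = ["0.75", "1/2", "7"]
    print(f"{exact_match_accuracy(predictions, references):.2f}")  # prints 0.67
```

Comparing answers as exact fractions means that equivalent forms such as 3/4 and 0.75 count as the same answer. Symbolic answers with variables or radicals would need a different comparison, which this sketch does not attempt.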

Implementation Barriers

Technical & Data Limitations

LLMs face challenges with complex mathematical problems that involve intricate reasoning and multi-step calculations. Existing datasets do not adequately cover the complexity of real-world mathematical problems.

Proposed Solutions: The MathQuest dataset is intended to address the gap in existing datasets by providing a diverse range of mathematical challenges. Future work aims to enlarge the training dataset and incorporate recent prompting techniques to improve LLMs' reasoning abilities.

Project Team

Avinash Anand

Researcher

Mohit Gupta

Researcher

Kritarth Prasad

Researcher

Navya Singla

Researcher

Sanjana Sanjeev

Researcher

Jatin Kumar

Researcher

Adarsh Raj Shivam

Researcher

Rajiv Ratn Shah

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Avinash Anand, Mohit Gupta, Kritarth Prasad, Navya Singla, Sanjana Sanjeev, Jatin Kumar, Adarsh Raj Shivam, Rajiv Ratn Shah

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
