Benchmarking Large Language Models for Calculus Problem-Solving: A Comparative Analysis
Project Overview
This study evaluates the capabilities of five prominent large language models (LLMs) on calculus differentiation problems and uncovers marked variation in their performance. All models perform well on basic differentiation tasks, but they falter on more complex problems that demand deeper conceptual understanding and algebraic manipulation. Among the models evaluated, ChatGPT 4o emerged as the most proficient, whereas Meta AI showed significant shortcomings, particularly on word problems. These findings carry important implications for integrating generative AI into education, particularly in mathematics, highlighting both the potential benefits and the necessity of human oversight. The results advocate a balanced approach: AI tools can enhance the learning of calculus, but they must be complemented by traditional educational methods to ensure comprehensive understanding and skill development.
Key Applications
Large Language Models for solving calculus problems
Context: Higher education, specifically university-level calculus courses
Implementation: The models were tested on a set of predefined calculus differentiation problems to evaluate their performance.
Outcomes: ChatGPT 4o achieved a success rate of 94.71%, while Meta AI struggled significantly with a 56.75% success rate. Overall, performance varied with the complexity of the problems.
Challenges: The main challenges included limited conceptual understanding, especially in interpreting derivatives and solving word problems, as well as issues with algebraic manipulation.
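The paper does not publish its grading harness, but the kind of evaluation described above can be sketched as follows: each benchmark item pairs a function with a model's claimed derivative, and the claim is checked numerically against a central finite difference at random sample points. The `check_derivative` helper and the three-item `benchmark` list are hypothetical illustrations, not the study's actual problem set or methodology.

```python
import math
import random

random.seed(0)  # make the sampling reproducible

def check_derivative(f, claimed_df, domain=(0.5, 2.0), trials=20, h=1e-5, tol=1e-4):
    """Check whether claimed_df matches the derivative of f by comparing it
    to a central finite difference at random points in the domain."""
    for _ in range(trials):
        x = random.uniform(*domain)
        approx = (f(x + h) - f(x - h)) / (2 * h)
        if not math.isclose(claimed_df(x), approx, rel_tol=tol, abs_tol=tol):
            return False
    return True

# Hypothetical benchmark: (function, model's claimed derivative) pairs.
benchmark = [
    (lambda x: x**3, lambda x: 3 * x**2),                     # correct
    (lambda x: math.sin(x), lambda x: math.cos(x)),           # correct
    (lambda x: math.exp(2 * x), lambda x: math.exp(2 * x)),   # wrong: chain-rule factor 2 missing
]

results = [check_derivative(f, df) for f, df in benchmark]
success_rate = 100 * sum(results) / len(results)
print(f"Success rate: {success_rate:.2f}%")  # prints "Success rate: 66.67%"
```

A numerical spot check like this cannot prove symbolic correctness, but it catches the common failure modes the study reports, such as dropped chain-rule factors.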
Implementation Barriers
Conceptual Understanding
Models struggle to connect symbolic derivatives with their geometric and functional meanings, particularly in complex problems.
Proposed Solutions: Emphasizing human oversight in teaching to bridge gaps in conceptual understanding, and encouraging iterative prompting to elicit clearer explanations.
Algebraic Manipulation
Frequent errors in algebraic simplification and manipulation impacted overall performance, particularly for Meta AI.
Proposed Solutions: Integrating systems that can verify algebraic manipulations and provide feedback to students on their process.
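One minimal form such a verification system could take is an equivalence checker that flags an invalid simplification step by evaluating both sides at random sample points. This is a heuristic sketch under assumed names (`equivalent` and the two example steps are illustrative, not from the paper); a production system would use a computer algebra library for a symbolic check.

```python
import math
import random

random.seed(0)  # make the sampling reproducible

def equivalent(expr_a, expr_b, domain=(1.5, 3.0), trials=50, tol=1e-9):
    """Heuristically test whether two algebraic expressions agree by
    evaluating both at random points in the domain."""
    for _ in range(trials):
        x = random.uniform(*domain)
        if not math.isclose(expr_a(x), expr_b(x), rel_tol=tol, abs_tol=tol):
            return False
    return True

# Valid step: (x^2 - 1)/(x - 1) simplifies to x + 1 (for x != 1).
ok = equivalent(lambda x: (x**2 - 1) / (x - 1), lambda x: x + 1)

# Invalid step: expanding (x + 1)^2 as x^2 + 1 drops the 2x cross term.
bad = equivalent(lambda x: (x + 1)**2, lambda x: x**2 + 1)
```

Wired into a tutoring loop, a checker like this could give students immediate feedback on which manipulation step broke the equivalence.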
Problem Generation Limitations
Models faced difficulties generating valid problems with specific properties, particularly advanced calculus problems.
Proposed Solutions: Developing targeted training for models focused on enhancing inverse reasoning capabilities for problem generation.
Project Team
In Hak Moon
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: In Hak Moon
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI