
Performance Comparison of Large Language Models on Advanced Calculus Problems

Project Overview

The document explores the application of generative AI, specifically Large Language Models (LLMs), in the field of education, with a focus on their effectiveness in solving advanced calculus problems. It evaluates seven models, including ChatGPT 4o and Mistral AI, assessing their performance in terms of accuracy and reliability. The research reveals that these models excel at simpler tasks, such as vector calculations and differentiation, but face significant challenges with more complex scenarios like integral evaluations and optimization problems. These findings underscore the necessity of iterative prompting to address initial inaccuracies, thereby highlighting the potential of LLMs as educational tools in mathematics. The study ultimately suggests that while there are limitations to be overcome, the integration of generative AI in educational settings can enhance learning experiences and provide valuable support for students grappling with challenging mathematical concepts.
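The iterative prompting described above can be sketched as a simple retry loop: when a model's first answer is wrong, the incorrect attempt is fed back with a request to re-check. This is a minimal illustration, not the study's actual code; the function names and the simulated model are assumptions, and the real study judged answers by human review rather than string comparison.

```python
# Minimal sketch of an iterative-prompting loop (illustrative only).
def solve_with_retries(ask_model, problem, expected, max_rounds=3):
    """Re-prompt with feedback until the answer matches the known
    solution or the round limit is reached.
    Returns (answer, rounds_used, correct)."""
    prompt = problem
    answer = None
    for round_no in range(1, max_rounds + 1):
        answer = ask_model(prompt)
        if answer == expected:
            return answer, round_no, True
        # Feed the failed attempt back so the model can correct itself.
        prompt = (problem + "\nYour previous answer '" + str(answer)
                  + "' was incorrect; please re-check your work.")
    return answer, max_rounds, False

# Simulated model: wrong on the first attempt, correct after the
# corrective re-prompt, mimicking the pattern the study describes.
attempts = iter(["2x", "2x + 3"])
fake_model = lambda prompt: next(attempts)

answer, rounds, ok = solve_with_retries(
    fake_model, "Differentiate x^2 + 3x with respect to x.", "2x + 3")
```

Here a single corrective round recovers the right answer; the study's point is that such follow-up prompts were often needed before a model produced a correct solution.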

Key Applications

Evaluation of LLMs in solving advanced calculus problems

Context: Educational context for mathematics, targeting educators and students

Implementation: Performance evaluation of seven LLMs on a set of calculus problems with scoring based on correctness and explanation

Outcomes: Insights into LLM capabilities and limitations; identified strengths in simpler problems and weaknesses in complex evaluations

Challenges: Models struggled with complex integral evaluations and optimization problems
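The per-model scoring described above can be illustrated as a small accuracy tally: each model answers every problem and earns credit for correct answers. This is a hypothetical sketch with invented placeholder data, not the study's results, and it scores correctness only, whereas the study also graded the quality of the explanation.

```python
# Illustrative scoring sketch: fraction of problems each model answers
# correctly. Model names and answers below are placeholders.
def score_models(results, problems):
    """results: {model: {problem_id: answer}};
    problems: {problem_id: known_solution}.
    Returns per-model accuracy as a fraction in [0, 1]."""
    scores = {}
    for model, answers in results.items():
        correct = sum(
            1 for pid, solution in problems.items()
            if answers.get(pid) == solution)
        scores[model] = correct / len(problems)
    return scores

problems = {"p1": "2x", "p2": "e^x", "p3": "diverges"}
results = {
    "model_a": {"p1": "2x", "p2": "e^x", "p3": "converges"},
    "model_b": {"p1": "2x", "p2": "x e^x", "p3": "converges"},
}
scores = score_models(results, problems)
# model_a answers 2 of 3 correctly, model_b 1 of 3.
```

A per-problem breakdown of the same tally would surface the pattern the study reports: high scores on differentiation-style items and persistent failures on integral and optimization items.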

Implementation Barriers

Technical

Models struggled with complex integral evaluations and optimization problems, failing persistently on specific problem types. This points to weaknesses in their underlying algorithms and limits their reliability in advanced mathematics.

Proposed Solutions: Refine model training for integral calculus and optimization techniques, with targeted improvements to the training and validation processes.

Project Team

In Hak Moon

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: In Hak Moon

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
