
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams

Project Overview

This document examines the use of generative AI, specifically the GPT-4o model, to grade handwritten solutions on college-level math exams. Automated grading could make educational assessment more efficient, but the study identifies notable limitations: the model struggles to read diverse handwriting styles accurately and to follow the reasoning underlying students' solutions. Supplying grading rubrics and example correct answers improves the model's grading accuracy, yet it still falls short of human graders. The findings suggest that while generative AI has real potential to automate grading tasks in education, substantial further development is needed before it can reliably support educators and students.

Key Applications

Grading handwritten solutions using GPT-4o

Context: College-level math exams, specifically in a probability theory course, targeting university students.

Implementation: The model was prompted with scanned images of student responses, along with correct answers and rubrics.

Outcomes: Providing context (rubrics and correct answers) improved alignment with human grades, but accuracy remained below that of human graders, leaving significant room for improvement.

Challenges: Difficulty in accurately grading due to issues with handwriting recognition, understanding correct solution steps, and interpreting reasoning.
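The prompting setup described above can be sketched as follows. This is a minimal illustration, assuming the OpenAI Python SDK's chat completions API with image input; the function name and prompt wording are hypothetical, not taken from the paper.

```python
import base64


def build_grading_messages(image_bytes: bytes, rubric: str, correct_answer: str) -> list:
    """Combine a scanned student response (image) with the rubric and
    correct answer into a single multimodal chat message.

    Illustrative only: the actual prompts used in the study may differ.
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    instructions = (
        "You are grading a handwritten solution from a probability theory exam.\n"
        f"Rubric:\n{rubric}\n\n"
        f"Correct answer:\n{correct_answer}\n\n"
        "Transcribe the student's work, then assign a score according to the rubric."
    )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instructions},
                # Scanned image passed inline as a base64 data URL
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ]


# The messages would then be sent to the model, e.g.:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o-mini-2024-07-18",
#     messages=build_grading_messages(scan_bytes, rubric_text, answer_text),
# )
```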

Implementation Barriers

Technical Barrier

The model struggles with accurately reading student handwriting and comprehending the reasoning behind solutions.

Proposed Solutions: Use more interpretable rubrics, provide full handwritten correct solutions as references, and assess model performance on sub-tasks.

Project Team

Adriana Caraeni

Researcher

Alexander Scarlatos

Researcher

Andrew Lan

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Adriana Caraeni, Alexander Scarlatos, Andrew Lan

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
