Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams
Project Overview
This document examines the use of generative AI, specifically the GPT-4o model, for grading handwritten solutions in college-level math examinations. Automating grading could make educational assessment more efficient, but the study identifies notable challenges: the model struggles to read diverse handwriting styles and to follow the reasoning in student work. Providing context such as grading rubrics and example correct answers improves grading accuracy, yet the model still falls short of human graders. The findings suggest that while generative AI has real potential to automate assessment tasks, substantial further research is needed before it can reliably support educators and students in grading.
Key Applications
Grading handwritten solutions using GPT-4o
Context: College-level math exams, specifically in a probability theory course, targeting university students.
Implementation: The model was prompted with scanned images of student responses, along with correct answers and rubrics.
Outcomes: Grading alignment improved when context was provided, but accuracy remained below that of human graders, leaving significant room for improvement.
Challenges: Difficulty in accurately grading due to issues with handwriting recognition, understanding correct solution steps, and interpreting reasoning.
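The implementation described above, prompting a vision-capable model with a scanned response plus the rubric and reference answer, can be sketched as follows. This is a minimal illustration, not the authors' actual code: the function name, prompt wording, and message structure are assumptions, though the base64 image-URL message format matches the OpenAI chat API.

```python
import base64

def build_grading_prompt(image_path, rubric, reference_solution):
    """Assemble a chat-style message list for grading one scanned response.

    Mirrors the setup described in the paper: the scanned student solution
    is sent as an image alongside the rubric and a correct reference answer.
    Prompt wording here is a hypothetical example.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    system_msg = (
        "You are a grader for a college probability theory exam. "
        "Score the handwritten solution shown in the image against the rubric."
    )
    user_text = (
        f"Rubric:\n{rubric}\n\n"
        f"Reference solution:\n{reference_solution}\n\n"
        "Return a score and a one-sentence justification."
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": [
            {"type": "text", "text": user_text},
            # Scanned handwriting is passed inline as a base64 data URL.
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]},
    ]
```

The returned message list would then be passed to a chat-completions call (e.g. with a GPT-4o model); that call is omitted here since it requires API credentials.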
Implementation Barriers
Technical Barrier
The model struggles with accurately reading student handwriting and comprehending the reasoning behind solutions.
Proposed Solutions: Use more interpretable rubrics, provide full handwritten correct solutions as references, and evaluate model performance on sub-tasks.
Project Team
Adriana Caraeni
Researcher
Alexander Scarlatos
Researcher
Andrew Lan
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Adriana Caraeni, Alexander Scarlatos, Andrew Lan
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI