
AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams

Project Overview

The document explores the use of generative AI, particularly GPT-4, for automating the grading of handwritten mathematical assessments at the university level. It identifies the difficulties of grading short-answer questions, which often admit a wide range of acceptable answers and require feedback that traditional automated grading methods struggle to provide. The study shows that GPT-4 can reliably perform initial grading of semi-open-ended responses, but underscores the critical need for human verification of the AI's assessments. It also discusses the importance of developing confidence measures to make the AI's grading results more trustworthy. Overall, the findings suggest that generative AI can streamline grading in educational settings, but it should be deployed with caution and supplemented by human oversight to ensure accuracy and effectiveness.

Key Applications

AI-assisted Automated Short Answer Grading (ASAG) of handwritten university-level mathematics exams using GPT-4.

Context: University-level mathematics education, targeting undergraduate students in high-enrollment courses.

Implementation: The study used a mock exam with handwritten responses, applying GPT-4 for initial grading, followed by human verification.

Outcomes: GPT-4 provided reliable and cost-effective initial grading, with grading accuracy reported to be comparable to that of human graders in many instances.

Challenges: Challenges included the variability of answers, the need for human verification due to false positives, and limitations in handwritten expression recognition.
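The workflow described above, an AI-produced initial grade followed by human verification, can be sketched as follows. This is an illustrative outline only: the class, function, and field names are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GradedResponse:
    """One student response with an AI-assigned initial score (hypothetical schema)."""
    student_id: str
    ai_score: float                     # initial score from the model, in [0.0, 1.0]
    verified: bool = False              # set once a human grader has reviewed it
    final_score: Optional[float] = None # the score of record after verification

def human_verify(resp: GradedResponse, human_score: float) -> GradedResponse:
    # A human grader reviews the AI's initial score; the human decision
    # always takes precedence over the model's output.
    resp.final_score = human_score
    resp.verified = True
    return resp

# Illustrative usage: the model grades first, then a human confirms or corrects.
resp = GradedResponse(student_id="s001", ai_score=0.8)
resp = human_verify(resp, human_score=1.0)
```

The point of the sketch is the ordering: the AI's score is never the score of record until a human has signed off on it.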

Implementation Barriers

Technical Limitations

Recognizing handwritten mathematical expressions is challenging due to ambiguities in handwriting and the dependency on contextual information.

Proposed Solutions: Future research should focus on improving Optical Character Recognition (OCR) and refining grading rules based on student responses.

Trust and Reliability

There is a significant false-positive rate in grading, where the AI may incorrectly confirm a grade, potentially leading to detrimental outcomes.

Proposed Solutions: Develop reliable confidence measures for AI grading results, so that high-confidence grades can be accepted automatically and only low-confidence results are referred to human graders.
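One way to realize such a triage step, routing only low-confidence AI grades to human graders, is sketched below. The threshold value, the tuple layout, and the assumption of a scalar confidence score per response are all illustrative; the paper does not prescribe this interface.

```python
def route_for_review(grades, confidence_threshold=0.9):
    """Split AI grading results into auto-accepted and human-review queues.

    Each item is a (response_id, ai_score, confidence) triple; results whose
    confidence falls below the threshold are referred to a human grader.
    """
    auto_accepted, needs_review = [], []
    for response_id, ai_score, confidence in grades:
        if confidence >= confidence_threshold:
            auto_accepted.append((response_id, ai_score))
        else:
            needs_review.append((response_id, ai_score))
    return auto_accepted, needs_review

# Illustrative usage with made-up confidence values:
auto, review = route_for_review([
    ("r1", 1.0, 0.97),  # high confidence -> accepted automatically
    ("r2", 0.5, 0.60),  # low confidence  -> human grader
])
```

The threshold trades off grading cost against the false-positive risk discussed above: raising it sends more responses to humans but reduces the chance of an incorrect AI grade being accepted unreviewed.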

Project Team

Tianyi Liu (Researcher)

Julia Chatain (Researcher)

Laura Kobel-Keller (Researcher)

Gerd Kortemeyer (Researcher)

Thomas Willwacher (Researcher)

Mrinmaya Sachan (Researcher)

Contact Information

For information about the paper, please contact the authors.

Authors: Tianyi Liu, Julia Chatain, Laura Kobel-Keller, Gerd Kortemeyer, Thomas Willwacher, Mrinmaya Sachan

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
