AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5?
Project Overview
This study investigates the role of generative AI, specifically GPT-3.5, in enhancing programming education through personalized feedback. It addresses the challenge educators face in delivering timely, effective feedback to students in large classes and assesses how well AI can evaluate student code submissions. GPT-3.5 classified submissions as correct or incorrect with 73% accuracy and produced effective feedback in 59% of cases, showing strengths in personalizing responses and localizing errors. However, the study also uncovers limitations, such as inaccurate error identification and poor adherence to specific assignment criteria. Overall, the findings highlight both the promise and the challenges of integrating generative AI into educational settings: it can improve feedback mechanisms, but several aspects still need refinement.
Key Applications
AI-enhanced auto-correction of programming exercises
Context: Higher education, specifically an introductory programming course with approximately 900 students
Implementation: Submissions from students were processed using GPT-3.5 to provide feedback on programming assignments.
Outcomes: 73% accuracy in identifying correct vs. incorrect submissions; 59% effectiveness in generating high-quality feedback.
Challenges: Inaccuracies in error localization, failure to comply with assignment specifics, and occasional hallucination of errors.
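The paper does not publish its exact prompt or pipeline, but the implementation described above can be sketched roughly as follows. Everything here, including the prompt wording, the VERDICT format, and the function names, is an illustrative assumption, not the authors' actual code.

```python
# Hypothetical sketch: packaging a student submission for GPT-3.5-style
# review. Prompt wording and the verdict format are assumptions.

def build_review_messages(task_description: str, submission: str) -> list[dict]:
    """Build a chat-completion message list asking for a pass/fail
    verdict plus short, personalized feedback."""
    system = (
        "You are a tutor for an introductory programming course. "
        "Judge whether the submission solves the task. "
        "Reply with 'VERDICT: correct' or 'VERDICT: incorrect', "
        "then give brief feedback that points to the faulty lines."
    )
    user = f"Task:\n{task_description}\n\nSubmission:\n{submission}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def parse_verdict(reply: str) -> bool:
    """Extract the correct/incorrect classification from a model reply."""
    return "verdict: correct" in reply.lower()
```

In practice the message list would be sent to a chat-completions endpoint and the reply parsed into the correct/incorrect classification that the 73% accuracy figure refers to.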
Implementation Barriers
Technical Barrier
GPT-3.5 sometimes misidentifies the errors in student submissions, producing inaccurate feedback that can confuse students.
Proposed Solutions: Potential improvements in prompt engineering; combining GPT-3.5 with traditional e-assessment systems.
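One way to combine GPT-3.5 with a traditional e-assessment system, sketched below under assumptions of my own (the paper does not specify this design), is to run deterministic reference tests first and feed their results into the prompt, so the model explains observed failures instead of hallucinating errors. All names are illustrative.

```python
# Hypothetical sketch: grounding LLM feedback in unit-test results.

def run_reference_tests(submission_ns: dict, cases: list[tuple]) -> list[str]:
    """Run (function_name, args, expected) cases against a submission's
    namespace; return human-readable failure descriptions."""
    failures = []
    for name, args, expected in cases:
        try:
            got = submission_ns[name](*args)
        except Exception as exc:
            failures.append(f"{name}{args} raised {type(exc).__name__}")
            continue
        if got != expected:
            failures.append(f"{name}{args} returned {got!r}, expected {expected!r}")
    return failures

def grounding_context(failures: list[str]) -> str:
    """Summarize test outcomes for inclusion in the LLM prompt."""
    if not failures:
        return "All reference tests passed."
    return "Failed tests:\n" + "\n".join(failures)
```

Only submissions with failing tests would then be sent to the model, with the failure summary appended to the prompt as concrete evidence to explain.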
Pedagogical Barrier
Inaccurate or misleading feedback can confuse students, especially novices, so AI-generated feedback must be deployed with care.
Proposed Solutions: Use GPT-3.5 feedback as a draft for human review, ensuring accuracy before presenting to students.
Project Team
Imen Azaiz
Researcher
Oliver Deckarm
Researcher
Sven Strickroth
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Imen Azaiz, Oliver Deckarm, Sven Strickroth
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI