
AI-enhanced Auto-correction of Programming Exercises: How Effective is GPT-3.5?

Project Overview

The document investigates the role of generative AI, specifically GPT-3.5, in enhancing programming education through personalized feedback. It addresses the challenge educators face in delivering timely, effective feedback in large classes and assesses the potential of AI for evaluating student code submissions. The findings indicate that GPT-3.5 classified submissions as correct or incorrect with 73% accuracy and produced effective feedback in 59% of cases, showing particular strength in personalizing responses and localizing errors. The study also uncovers limitations, such as inaccuracies in error identification and lapses in adherence to specific assignment criteria. Overall, the document highlights both the promise and the challenges of integrating generative AI into educational settings: it can improve feedback mechanisms, but several areas still need refinement.

Key Applications

AI-enhanced auto-correction of programming exercises

Context: Higher education, specifically an introductory programming course with approximately 900 students

Implementation: Submissions from students were processed using GPT-3.5 to provide feedback on programming assignments.

Outcomes: 73% accuracy in identifying correct vs. incorrect submissions; 59% effectiveness in generating high-quality feedback.

Challenges: Inaccuracies in error localization, failure to comply with assignment specifics, and occasional hallucination of errors.
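The implementation described above could be sketched as a simple prompt-based pipeline. The study does not publish its exact prompts or model parameters, so the wording, function name, and parameters below are illustrative assumptions only.

```python
# Illustrative sketch of a GPT-3.5 feedback pipeline (prompt wording assumed,
# not taken from the paper).

def build_feedback_prompt(task_description: str, student_code: str) -> str:
    """Assemble a prompt asking the model to classify a submission as
    correct/incorrect and give localized, personalized feedback."""
    return (
        "You are a tutor for an introductory programming course.\n\n"
        f"Assignment:\n{task_description}\n\n"
        f"Student submission:\n{student_code}\n\n"
        "First state whether the submission is CORRECT or INCORRECT, "
        "then give short feedback that points the student to the "
        "relevant lines of their code."
    )

# The prompt would then be sent to the model, e.g. via OpenAI's
# chat-completions endpoint (call omitted here because it needs an API key):
#
# response = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": build_feedback_prompt(task, code)}],
# )

prompt = build_feedback_prompt("Sum a list of integers.", "def s(xs): return 0")
print(prompt.splitlines()[0])
```

In practice, one prompt per submission keeps each request independent, which matches the per-submission accuracy figures reported above.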

Implementation Barriers

Technical Barrier

GPT-3.5 sometimes misidentifies the errors in student submissions, producing inaccurate feedback that can confuse students.

Proposed Solutions: Potential improvements in prompt engineering; combining GPT-3.5 with traditional e-assessment systems.
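The second proposed solution, combining GPT-3.5 with a traditional e-assessment system, might look like the following decision rule. The agreement policy is an assumption for illustration; the paper does not specify one.

```python
# Hedged sketch: merge a deterministic e-assessment verdict (e.g. unit tests)
# with an LLM classification. The disagreement policy is an assumption.

def combined_verdict(tests_passed: bool, llm_says_correct: bool) -> str:
    """Release the LLM classification only when it agrees with the
    deterministic test result; otherwise flag for manual review so a
    possibly hallucinated verdict never reaches the student."""
    if tests_passed == llm_says_correct:
        return "correct" if tests_passed else "incorrect"
    return "needs_review"

print(combined_verdict(True, True))    # both agree the submission passes
print(combined_verdict(True, False))   # disagreement: escalate to a human
```

Using the traditional system as ground truth for the pass/fail decision, and the LLM only for explanatory feedback, is one way to limit the impact of the inaccuracies listed above.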

Pedagogical Barrier

Inaccurate or misleading feedback can confuse students, especially novices, so AI-generated feedback must be deployed with care.

Proposed Solutions: Use GPT-3.5 feedback as a draft for human review, ensuring accuracy before presenting to students.
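The draft-for-review workflow proposed above could be modeled as a small queue in which only instructor-approved feedback is ever released. The class and field names below are hypothetical; the paper describes the idea, not an implementation.

```python
# Minimal sketch of the human-in-the-loop review workflow (names assumed).

from dataclasses import dataclass


@dataclass
class FeedbackDraft:
    student_id: str
    text: str          # AI-generated draft feedback
    approved: bool = False


class ReviewQueue:
    """Holds AI drafts until a human reviewer approves (and optionally
    edits) them; only approved feedback is released to students."""

    def __init__(self):
        self._drafts = []

    def submit(self, draft: FeedbackDraft):
        self._drafts.append(draft)

    def approve(self, student_id: str, edited_text=None):
        for d in self._drafts:
            if d.student_id == student_id:
                if edited_text is not None:
                    d.text = edited_text  # reviewer corrected the draft
                d.approved = True

    def released(self):
        # Unapproved drafts never reach students.
        return [d for d in self._drafts if d.approved]


queue = ReviewQueue()
queue.submit(FeedbackDraft("s1", "Draft: loop bound looks off."))
queue.submit(FeedbackDraft("s2", "Draft: missing return statement."))
queue.approve("s1", edited_text="Your loop stops one element too early.")
print([d.student_id for d in queue.released()])
```

The design choice here is that approval is the only path to release, which directly addresses the risk of misleading feedback reaching novices.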

Project Team

Imen Azaiz

Researcher

Oliver Deckarm

Researcher

Sven Strickroth

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Imen Azaiz, Oliver Deckarm, Sven Strickroth

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
