
Automating Human Tutor-Style Programming Feedback: Leveraging GPT-4 Tutor Model for Hint Generation and GPT-3.5 Student Model for Hint Validation

Project Overview

The document examines the use of generative AI, specifically large language models (LLMs) such as GPT-4 and GPT-3.5, in programming education. It presents GPT4Hints-GPT3.5Val, an approach that automates feedback on students' buggy code by generating hints akin to those a human tutor would provide. The method aims to deliver personalized, high-quality feedback while reducing the workload on educators. Evaluation results indicate that the technique achieves precision on par with human tutors, alongside sufficient coverage of the issues present in student code. The document also acknowledges ongoing challenges in matching the overall effectiveness of human feedback. The findings suggest a promising path for integrating generative AI into educational settings, potentially enhancing student learning while easing tutor resource constraints.

Key Applications

GPT4Hints-GPT3.5Val for automating programming feedback

Context: Programming education, specifically for students learning Python through assignments

Implementation: Integration of GPT-4 as a tutor model for generating hints and GPT-3.5 as a student model for validating hint quality

Outcomes: Achieves around 95% precision in hint quality, comparable to human tutors, while maintaining over 70% coverage across datasets

Challenges: Quality of hints generated by LLMs is still inferior to human tutors, issues with symbolic reasoning and hallucination in generated content
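The tutor-then-validator loop described above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the `tutor_generate_hint` and `student_attempt_fix` functions are hypothetical stubs standing in for calls to GPT-4 and GPT-3.5, and the threshold values are invented for the example. The core idea it demonstrates is real, however: a hint is accepted only if simulated student attempts, guided by that hint, produce repairs that pass the assignment's test suite.

```python
# Hypothetical stubs standing in for the GPT-4 tutor model and the
# GPT-3.5 student model; a real system would issue LLM API calls here.
def tutor_generate_hint(buggy_code: str, failing_tests: list) -> str:
    # Tutor model: produce a natural-language hint for the buggy program.
    return "Hint: check the loop bound; it skips the last element."

def student_attempt_fix(buggy_code: str, hint: str, seed: int) -> str:
    # Student model: attempt to repair the program given only the hint.
    # (The seed would vary sampling temperature in a real system.)
    return buggy_code.replace("range(len(xs) - 1)", "range(len(xs))")

def passes_tests(code: str, tests: list) -> bool:
    # Execute the program and check it against (input, expected) pairs.
    namespace = {}
    exec(code, namespace)
    return all(namespace["total"](xs) == want for xs, want in tests)

def generate_validated_hint(buggy_code, tests, n_students=3, threshold=2):
    """Tutor proposes a hint; the hint is surfaced to the real student
    only if enough simulated students can repair the program with it."""
    failing = [(xs, want) for xs, want in tests
               if not passes_tests(buggy_code, [(xs, want)])]
    hint = tutor_generate_hint(buggy_code, failing)
    successes = sum(
        passes_tests(student_attempt_fix(buggy_code, hint, seed), tests)
        for seed in range(n_students)
    )
    return hint if successes >= threshold else None
```

Returning `None` when validation fails is the conservative choice this design enables: withholding a hint is preferable to showing a misleading one.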

Implementation Barriers

Quality of output

LLMs struggle with generating hints that match the quality of human tutors, often leading to feedback that is less effective.

Proposed Solutions: Improving prompting strategies and validation mechanisms to enhance the generative quality of hints.

Symbolic reasoning

LLMs have difficulty with symbolic reasoning and understanding program execution, which is crucial for debugging.

Proposed Solutions: Incorporating symbolic information such as failing test cases into the prompting process to enhance reasoning.
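One way to ground the model's reasoning is to run the student's program against the test suite first and embed the concrete failures in the prompt. The sketch below is illustrative only; the function name `solve` and the prompt wording are assumptions, not the paper's actual prompt format.

```python
def run_tests(program_src: str, tests):
    """Execute the student's program and collect failing test cases
    as (args, expected, got) triples."""
    ns = {}
    exec(program_src, ns)
    failures = []
    for args, expected in tests:
        try:
            got = ns["solve"](*args)  # assumed entry-point name
        except Exception as e:
            got = f"raised {type(e).__name__}"
        if got != expected:
            failures.append((args, expected, got))
    return failures

def build_prompt(program_src, failures):
    """Assemble a tutor prompt that includes the symbolic evidence
    (concrete failing inputs and outputs) alongside the code."""
    lines = ["The student's program is buggy:", "", program_src, "",
             "It fails on these test cases:"]
    for args, expected, got in failures:
        lines.append(f"  solve{args} -> expected {expected!r}, got {got!r}")
    lines.append("Explain the bug and give a single hint, "
                 "without revealing the full fix.")
    return "\n".join(lines)
```

Giving the model an executed failure such as `solve(3, 5) -> expected 2, got -2` replaces a symbolic-execution step the LLM would otherwise have to perform, and tends to anchor the hint to the actual defect.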

Hallucination

Generated feedback may contain inaccuracies that could mislead students.

Proposed Solutions: Implementing quality assurance layers to validate the generated content before it is shared with students.

Project Team

Tung Phung

Researcher

Victor-Alexandru Pădurean

Researcher

Anjali Singh

Researcher

Christopher Brooks

Researcher

José Cambronero

Researcher

Sumit Gulwani

Researcher

Adish Singla

Researcher

Gustavo Soares

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Tung Phung, Victor-Alexandru Pădurean, Anjali Singh, Christopher Brooks, José Cambronero, Sumit Gulwani, Adish Singla, Gustavo Soares

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
