Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness
Project Overview
This document examines the role of generative AI, particularly large language models (LLMs) such as GPT-3.5, in education, with a focus on explaining programming errors. Traditional error messages often frustrate novice programmers because they lack clarity; given appropriate context, LLMs can produce more comprehensible explanations. The research evaluates several prompting strategies, including baseline, one-shot, and fine-tuning, to determine their effectiveness in improving the quality of error explanations. The findings indicate that while LLMs can generate valuable feedback, the choice of prompting strategy does not significantly influence the accuracy of the responses; it does, however, affect their conciseness. Overall, the study underscores the potential of generative AI in education, particularly for making technical concepts more accessible to learners.
Key Applications
LLM-generated error message explanations
Context: Educational context focused on programming error explanations for novice programmers using TigerJython, a pedagogical programming environment.
Implementation: Applied several prompting strategies (baseline, one-shot, fine-tuning) with GPT-3.5 to evaluate the effectiveness of the resulting error explanations; a minimal sketch of the baseline and one-shot conditions follows this list.
Outcomes: Found that 2-3 useful explanations were generated for every misleading response, and fine-tuning reduced extraneous information.
Challenges: Maintaining a diverse dataset of programming errors for effective fine-tuning.
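The sketch below illustrates how the baseline and one-shot conditions might be set up with the OpenAI Python SDK. The prompt wording, the worked example, and the model name are illustrative assumptions, not the authors' exact materials.

```python
# Sketch: baseline vs. one-shot prompting for error explanations.
# Assumptions: OpenAI Python SDK (openai>=1.0); the system prompt,
# worked example, and model name are illustrative, not the paper's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = "Explain this TigerJython error message to a novice programmer."

def baseline_explanation(code: str, error: str) -> str:
    """Baseline: send only the student's code and the raw error message."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f"Code:\n{code}\n\nError:\n{error}"},
        ],
    )
    return response.choices[0].message.content

def one_shot_explanation(code: str, error: str) -> str:
    """One-shot: prepend a single worked example before the real query."""
    example_query = "Code:\nprint('hi'\n\nError:\nSyntaxError: '(' was never closed"
    example_answer = (
        "You opened a parenthesis in the print call but never closed it. "
        "Add a ')' at the end of line 1."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": example_query},
            {"role": "assistant", "content": example_answer},
            {"role": "user", "content": f"Code:\n{code}\n\nError:\n{error}"},
        ],
    )
    return response.choices[0].message.content
```

The only difference between the two conditions is the injected example exchange; fine-tuning, by contrast, bakes such examples into the model's weights rather than into the prompt.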
Implementation Barriers
Data diversity
Programming errors follow a long-tail distribution, which makes it difficult to collect diverse examples for fine-tuning the model.
Proposed Solutions: Ensure training datasets cover a broad range of programming errors so the model does not overfit to common mistakes (see the sketch below).
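One way to act on this is to cap how many examples of each error type enter the training file, so rare errors are not drowned out by common ones. This is a minimal sketch, assuming a hypothetical records list of (error_type, code, error, explanation) tuples; the JSONL layout follows OpenAI's chat fine-tuning format.

```python
# Sketch: building a fine-tuning set that counters the long-tail
# distribution of programming errors. Assumptions: `records` is a
# hypothetical list of (error_type, code, error, explanation) tuples;
# the per-line JSON layout matches OpenAI's chat fine-tuning format.
import json
from collections import defaultdict

MAX_PER_ERROR_TYPE = 20  # cap common errors so rare ones still get weight

def write_training_file(records, path="train.jsonl"):
    seen = defaultdict(int)
    with open(path, "w") as f:
        for error_type, code, error, explanation in records:
            if seen[error_type] >= MAX_PER_ERROR_TYPE:
                continue  # skip surplus examples of over-represented errors
            seen[error_type] += 1
            f.write(json.dumps({
                "messages": [
                    {"role": "system",
                     "content": "Explain this TigerJython error to a novice."},
                    {"role": "user",
                     "content": f"Code:\n{code}\n\nError:\n{error}"},
                    {"role": "assistant", "content": explanation},
                ]
            }) + "\n")
```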
Cognitive load
Extraneous information in LLM responses can increase cognitive load for students, especially novices.
Proposed Solutions: Train or prompt the model to produce concise, relevant explanations that minimize distraction (see the sketch below).
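A minimal sketch of one way to steer the model toward brevity, assuming the OpenAI Python SDK; the prompt wording and the sentence and token limits are illustrative choices, not the paper's configuration.

```python
# Sketch: constraining response length to reduce cognitive load.
# Assumptions: OpenAI Python SDK; the prompt text and the limits
# below are illustrative, not the paper's exact configuration.
from openai import OpenAI

client = OpenAI()

CONCISE_SYSTEM = (
    "Explain this TigerJython error to a novice programmer in at most "
    "three short sentences. Name the cause and the fix; do not restate "
    "the code or add unrelated advice."
)

def concise_explanation(code: str, error: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": CONCISE_SYSTEM},
            {"role": "user", "content": f"Code:\n{code}\n\nError:\n{error}"},
        ],
        max_tokens=120,  # hard ceiling as a backstop for the instruction
    )
    return response.choices[0].message.content
```

Pairing an explicit brevity instruction with a max_tokens ceiling gives a backstop in case the model ignores the prompt.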
Project Team
Audrey Salmon
Researcher
Katie Hammer
Researcher
Eddie Antonio Santos
Researcher
Brett A. Becker
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Audrey Salmon, Katie Hammer, Eddie Antonio Santos, Brett A. Becker
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI