
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning

Project Overview

This paper presents a framework that uses generative AI, specifically large language models (LLMs), to automate feedback generation and evaluation in math education. It emphasizes the importance of pedagogically valid feedback that addresses student misconceptions while aligning with educational objectives. The framework applies reinforcement learning to improve the quality of generated feedback, using a tailored evaluation rubric to ensure each message meets specific educational standards and delivers accurate, constructive responses that support student learning. The findings suggest that generative AI can contribute to personalized education by offering timely, relevant feedback, thereby improving student understanding of and engagement with mathematical concepts.

Key Applications

Feedback Generation via Large Language Models

Context: Middle school-level math education focusing on multiple-choice questions and feedback for incorrect answers.

Implementation: The framework employs reinforcement learning and large language models (LLMs), particularly Llama 2 and GPT-4, to generate and evaluate feedback.

Outcomes: Significant improvements in the correctness and pedagogical alignment of generated feedback over baseline generation methods.

Challenges: Ensuring correctness and pedagogical alignment of feedback, reliance on human evaluations for accuracy, and the need for robust evaluation metrics.
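The reinforcement learning step described above relies on preference signals over candidate feedback messages. A minimal sketch of how rubric-scored candidates could be turned into preference pairs for DPO-style training follows; the function name, data shapes, and pairing rule are illustrative assumptions, not the paper's exact pipeline:

```python
from itertools import combinations

def build_preference_pairs(question, candidates):
    """Turn rubric-scored feedback candidates into (chosen, rejected)
    preference pairs suitable for preference-based RL (e.g. DPO).

    candidates: list of (feedback_text, rubric_score) tuples, where the
    rubric score aggregates criteria such as correctness and alignment.
    """
    pairs = []
    for (fb_a, score_a), (fb_b, score_b) in combinations(candidates, 2):
        if score_a == score_b:
            continue  # a tie carries no preference signal, so skip it
        chosen, rejected = (fb_a, fb_b) if score_a > score_b else (fb_b, fb_a)
        pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs
```

For example, three candidates scored 0.9, 0.4, and 0.4 yield two training pairs, both preferring the 0.9-scored feedback; the two tied candidates are not paired against each other.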

Implementation Barriers

Technical

The complexity of accurately evaluating feedback messages, particularly in mathematical contexts where understanding student errors is critical.

Proposed Solutions: Developing automated evaluation rubrics and using reinforcement learning for training feedback generation models.
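An automated evaluation rubric of the kind proposed here can be sketched as a weighted set of yes/no criteria aggregated into a scalar reward. The criteria names and weights below are illustrative assumptions, not the paper's exact rubric:

```python
# Illustrative rubric: each criterion is a yes/no judgment (e.g. produced
# by an LLM judge), weighted into a single reward in [0, 1].
RUBRIC = {
    "correct": 0.4,        # feedback is mathematically correct
    "not_revealing": 0.2,  # feedback does not directly reveal the answer
    "suggestion": 0.2,     # feedback suggests a concrete next step
    "misconception": 0.2,  # feedback addresses the likely misconception
}

def rubric_score(judgments):
    """Aggregate per-criterion True/False judgments into one scalar
    reward that can drive reinforcement learning."""
    missing = set(RUBRIC) - set(judgments)
    if missing:
        raise ValueError(f"missing rubric criteria: {sorted(missing)}")
    return sum(weight for name, weight in RUBRIC.items() if judgments[name])
```

Keeping the rubric as explicit weighted criteria makes the reward auditable: a low score can be traced back to the specific criterion the feedback failed.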

Human Resource

The need for human annotators to evaluate feedback, which is labor-intensive and costly.

Proposed Solutions: Using LLMs like GPT-4 to perform evaluations, thereby reducing the dependency on human annotators.
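Replacing human annotators with an LLM judge amounts to a prompt-and-parse loop: assemble an evaluation prompt for the judge model, then extract its yes/no verdicts. The prompt wording below is a hypothetical stand-in for the actual GPT-4 evaluation prompt, and the model call itself is omitted:

```python
import re

def build_judge_prompt(question, correct_answer, student_answer, feedback):
    """Assemble a prompt asking a judge model (e.g. GPT-4) to grade one
    feedback message against numbered yes/no rubric questions."""
    return (
        "You are grading feedback given to a math student.\n"
        f"Question: {question}\n"
        f"Correct answer: {correct_answer}\n"
        f"Student answer: {student_answer}\n"
        f"Feedback: {feedback}\n"
        "Answer yes or no to each item:\n"
        "1. Is the feedback mathematically correct?\n"
        "2. Does the feedback avoid revealing the answer?\n"
    )

def parse_judge_response(text):
    """Extract the ordered yes/no answers from the judge's numbered reply."""
    answers = re.findall(r"^\s*\d+\.\s*(yes|no)\b", text,
                         re.IGNORECASE | re.MULTILINE)
    return [a.lower() == "yes" for a in answers]
```

A reply such as "1. Yes\n2. No" parses to `[True, False]`, which can then feed a rubric aggregator or a preference-pair builder without any human in the loop.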

Project Team

Alexander Scarlatos, Researcher

Digory Smith, Researcher

Simon Woodhead, Researcher

Andrew Lan, Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
