LLM-based Automated Grading with Human-in-the-Loop
Project Overview
This document explores the application of large language models (LLMs) to automated short answer grading (ASAG) in education. It focuses on GradeHITL, a human-in-the-loop (HITL) framework in which the LLM consults human experts about the grading rubric and folds their answers back into iterative rubric refinements, improving grading accuracy over fully automated approaches. While LLM-based grading offers notable advantages in efficiency and accuracy, it also poses challenges, including the variable quality of LLM-generated questions and the difficulty of interpreting complex language. Overall, the document underscores the potential of generative AI to transform educational assessment while acknowledging the hurdles that must be addressed to optimize its effectiveness.
Key Applications
GradeHITL - Human-in-the-Loop Automatic Short Answer Grading
Context: ASAG tasks, specifically evaluating open-ended textual responses in educational assessments, targeted towards educators and students in mathematics education.
Implementation: GradeHITL integrates LLMs with human expert feedback to refine grading rubrics. It involves components for grading, inquiring (LLMs generating questions for human experts), and optimizing rubrics based on feedback.
Outcomes: Significant improvements in grading accuracy and rubric effectiveness compared to fully automated methods. Enhanced understanding of student responses and reduction in grading variability.
Challenges: Quality of questions generated by LLMs can vary; small changes in input can lead to significant output differences, complicating grading consistency.
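The grade/inquire/optimize loop described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the function names, prompt formats, and data shapes are assumptions, and `ask_llm` / `ask_expert` stand in for an LLM API call and a human expert's reply.

```python
# Sketch of one GradeHITL round (assumed structure, not the paper's exact code).
# `ask_llm` and `ask_expert` are injected callables standing in for an LLM call
# and a human expert's answer, so the loop itself stays testable.

def grade(ask_llm, rubric: str, response: str) -> str:
    """Grading step: the LLM scores a student response against the rubric."""
    return ask_llm(f"Rubric:\n{rubric}\n\nStudent response:\n{response}\n\nScore:")

def inquire(ask_llm, rubric: str, disagreements: list[str]) -> str:
    """Inquiry step: the LLM formulates a clarifying question for the expert."""
    cases = "\n".join(disagreements)
    return ask_llm(f"Rubric:\n{rubric}\n\nMisgraded cases:\n{cases}\n\nQuestion for expert:")

def optimize(ask_llm, rubric: str, question: str, expert_answer: str) -> str:
    """Optimization step: fold the expert's answer into a revised rubric."""
    return ask_llm(f"Rubric:\n{rubric}\n\nQ: {question}\nA: {expert_answer}\n\nRevised rubric:")

def gradehitl_round(ask_llm, ask_expert, rubric, responses, gold):
    """Run one round: grade all responses, and if any disagree with the
    reference grades, ask the expert a question and revise the rubric."""
    grades = {r: grade(ask_llm, rubric, r) for r in responses}
    wrong = [r for r in responses if grades[r] != gold[r]]
    if not wrong:
        return rubric, grades
    question = inquire(ask_llm, rubric, wrong)
    answer = ask_expert(question)
    return optimize(ask_llm, rubric, question, answer), grades
```

In practice the loop would repeat until grading accuracy on a validation set stops improving; the gold labels here stand in for whatever disagreement signal triggers an inquiry.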
Implementation Barriers
Technical Barrier
LLMs struggle with domain-specific jargon and the complexity of language expressions, leading to inaccuracies in grading.
Proposed Solutions: Human-in-the-loop integration where expert feedback is used to refine grading rubrics and enhance LLM understanding.
Quality Control Barrier
The quality of questions generated by LLMs during the inquiry process can vary, reducing grading accuracy when low-quality questions reach the expert.
Proposed Solutions: Implement a reinforcement learning-based question selection method to filter out low-quality questions.
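One way to realize such a selection method is a bandit-style formulation: treat each candidate question as an arm whose reward is the grading-accuracy gain observed after the expert's answer is incorporated. The sketch below is an illustrative epsilon-greedy selector under that assumption, not the paper's exact algorithm.

```python
# Illustrative epsilon-greedy question selector (assumed bandit formulation).
import random

class QuestionSelector:
    def __init__(self, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value: dict[str, float] = {}   # running mean reward per question
        self.count: dict[str, int] = {}

    def select(self, candidates: list[str]) -> str:
        # Explore with probability epsilon, otherwise exploit the
        # candidate with the highest estimated reward.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        return max(candidates, key=lambda q: self.value.get(q, 0.0))

    def update(self, question: str, reward: float) -> None:
        # Incremental mean update of the selected question's value estimate.
        n = self.count.get(question, 0) + 1
        v = self.value.get(question, 0.0)
        self.count[question] = n
        self.value[question] = v + (reward - v) / n
```

Over successive rounds, questions that historically led to rubric improvements are selected more often, while consistently unhelpful ones are filtered out.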
Project Team
Hang Li
Researcher
Yucheng Chu
Researcher
Kaiqi Yang
Researcher
Yasemin Copur-Gencturk
Researcher
Jiliang Tang
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Hang Li, Yucheng Chu, Kaiqi Yang, Yasemin Copur-Gencturk, Jiliang Tang
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI