
LLM-based Automated Grading with Human-in-the-Loop

Project Overview

This document explores the application of large language models (LLMs) to education, focusing on automated short answer grading (ASAG). It highlights how a human-in-the-loop (HITL) approach, which incorporates human feedback into the grading process, substantially improves grading performance. The proposed GradeHITL framework improves grading accuracy by enabling LLMs to consult human experts about grading rubrics, driving iterative improvements in rubric design. While integrating LLMs into grading offers clear advantages in efficiency and accuracy, it also poses challenges, including generating high-quality questions for experts and interpreting complex language. Overall, the document underscores the potential of generative AI to transform educational assessment while acknowledging the hurdles that must be addressed for it to be fully effective.

Key Applications

GradeHITL - Human-in-the-Loop Automatic Short Answer Grading

Context: ASAG tasks, i.e., evaluating open-ended textual responses in educational assessments, aimed at educators and students in mathematics education.

Implementation: GradeHITL integrates LLMs with human expert feedback to refine grading rubrics. It comprises components for grading, inquiring (the LLM generates questions for human experts), and optimizing rubrics based on the experts' answers.
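The paper does not present this loop as code; the sketch below is only an illustration of how the three components might be wired together in Python. The `call_llm` and `ask_expert` helpers, as well as the prompt wording, are hypothetical placeholders for an LLM client and a channel to a human expert.

```python
# Illustrative sketch of a GradeHITL-style grade/inquire/optimize loop.
# `call_llm` and `ask_expert` are hypothetical placeholders: any chat-completion
# client and any route to a human expert (e.g., a review UI) would fit here.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call."""
    raise NotImplementedError

def ask_expert(question: str) -> str:
    """Placeholder for forwarding a question to a human grading expert."""
    raise NotImplementedError

def grade(rubric: str, response: str) -> str:
    """Grading component: score one student response against the rubric."""
    return call_llm(
        f"Rubric:\n{rubric}\n\nStudent response:\n{response}\n\n"
        "Assign a score and justify it."
    )

def inquire(rubric: str, hard_cases: list[str]) -> str:
    """Inquiring component: turn hard-to-grade responses into one expert question."""
    cases = "\n---\n".join(hard_cases)
    return call_llm(
        f"Rubric:\n{rubric}\n\nThese responses were hard to grade:\n{cases}\n\n"
        "Ask the expert one question that would most clarify the rubric."
    )

def optimize(rubric: str, question: str, answer: str) -> str:
    """Optimizing component: fold the expert's answer back into the rubric."""
    return call_llm(
        f"Current rubric:\n{rubric}\n\nExpert Q&A:\nQ: {question}\nA: {answer}\n\n"
        "Rewrite the rubric to incorporate this clarification."
    )

def gradehitl_round(rubric: str, responses: list[str]) -> str:
    """One refinement round: grade, pick hard cases, ask, and update the rubric."""
    _grades = [grade(rubric, r) for r in responses]
    hard_cases = responses[:3]  # in practice: responses whose grades conflict with references
    question = inquire(rubric, hard_cases)
    answer = ask_expert(question)
    return optimize(rubric, question, answer)
```

Repeating `gradehitl_round` over successive batches of responses yields the iterative rubric improvement described above.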

Outcomes: Significant improvements in grading accuracy and rubric effectiveness compared to fully automated methods, along with a better understanding of student responses and reduced grading variability.

Challenges: The quality of questions generated by LLMs can vary, and small changes in input can produce large differences in output, complicating grading consistency.

Implementation Barriers

Technical Barrier

LLMs struggle with domain-specific jargon and complex language, leading to grading inaccuracies.

Proposed Solutions: Human-in-the-loop integration, in which expert feedback is used to refine grading rubrics and improve the LLM's handling of domain language.
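As one illustration of how accumulated expert feedback could be surfaced to the model at grading time, the sketch below carries expert clarifications alongside the rubric. The `Clarification` structure and prompt layout are assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Clarification:
    """One expert answer about a rubric ambiguity (illustrative structure)."""
    term: str         # e.g., domain-specific jargon the LLM misreads
    explanation: str  # the expert's plain-language definition

def build_grading_prompt(rubric: str,
                         clarifications: list[Clarification],
                         response: str) -> str:
    """Assemble a grading prompt that surfaces expert clarifications so the
    LLM interprets domain-specific terms the way the experts do."""
    notes = "\n".join(f"- {c.term}: {c.explanation}" for c in clarifications)
    return (
        f"Rubric:\n{rubric}\n\n"
        f"Expert clarifications:\n{notes or '- (none yet)'}\n\n"
        f"Student response:\n{response}\n\n"
        "Grade the response against the rubric, applying the clarifications."
    )
```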

Quality Control Barrier

Questions generated by the LLM during the inquiry step vary in quality, which can reduce grading accuracy.

Proposed Solutions: Implement a reinforcement learning-based question selection method to filter out low-quality questions.
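The paper's exact RL formulation is not reproduced here; as a toy illustration of learning-based question selection, the sketch below uses a simple epsilon-greedy bandit. The reward signal is assumed to be the measured gain in grading accuracy after an expert's answer to the selected question is folded into the rubric.

```python
import random

class QuestionSelector:
    """Toy epsilon-greedy selector over candidate questions (illustrative only;
    the actual RL method may differ). Questions are keyed by their text here;
    a real system would key on question features so estimates generalize."""

    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.value: dict[str, float] = {}  # estimated reward per question
        self.count: dict[str, int] = {}

    def select(self, candidates: list[str]) -> str:
        """Mostly pick the highest-value candidate; occasionally explore."""
        if random.random() < self.epsilon:
            return random.choice(candidates)
        return max(candidates, key=lambda q: self.value.get(q, 0.0))

    def update(self, question: str, reward: float) -> None:
        """Incremental-mean update of the selected question's value estimate."""
        n = self.count.get(question, 0) + 1
        old = self.value.get(question, 0.0)
        self.count[question] = n
        self.value[question] = old + (reward - old) / n
```

In use, the selector would choose among the LLM's candidate questions each round and be rewarded when the resulting clarification improved grading accuracy, gradually filtering out low-quality questions.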

Project Team

Hang Li

Researcher

Yucheng Chu

Researcher

Kaiqi Yang

Researcher

Yasemin Copur-Gencturk

Researcher

Jiliang Tang

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Hang Li, Yucheng Chu, Kaiqi Yang, Yasemin Copur-Gencturk, Jiliang Tang

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
