Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs
Project Overview
This project explores the use of Large Language Models (LLMs) in education, specifically for Automated Essay Scoring (AES), which gives second-language learners timely, personalized feedback. It presents a dual-process framework that pairs fast score prediction with in-depth, rubric-grounded explanations, improving both grading accuracy and efficiency. The findings show that LLM assistance raises novice evaluators to near-expert performance and also improves the effectiveness of experienced graders, underscoring the potential of productive human-AI collaboration in educational assessment.
Key Applications
Automated Essay Scoring (AES) system based on LLMs
Context: Second-language learners, particularly in high school settings in China where there is a high student-teacher ratio.
Implementation: Developed an AES system inspired by dual-process theory; extensive experiments conducted with both public and private datasets to assess effectiveness.
Outcomes: Enhanced grading accuracy and efficiency; novice graders achieved performance levels comparable to expert graders with LLM assistance.
Challenges: Diverse exercise contexts and ambiguity in scoring rubrics complicate accurate scoring; reliance on LLMs without human supervision remains impractical.
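The dual-process design described above can be sketched in code. This is a minimal illustration, not the paper's implementation: the function names, the 0-10 scale, the heuristic stub standing in for the scoring model, and the confidence threshold are all assumptions for exposition. The key idea it demonstrates is that a fast "System 1" prediction always runs, while a slower "System 2" explanation is generated only when confidence is low, so the essay can be flagged for human review.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EssayResult:
    score: float               # holistic score on a 0-10 scale (assumed)
    confidence: float          # model confidence in [0, 1]
    explanation: Optional[str] = None  # filled only by the slow path

def quick_score(essay: str) -> EssayResult:
    """System-1 path: fast score prediction. A length heuristic stands in
    here for the actual fine-tuned scoring model."""
    words = len(essay.split())
    score = min(10.0, words / 30)
    confidence = 0.9 if 50 <= words <= 400 else 0.5
    return EssayResult(round(score, 1), confidence)

def explain(essay: str, result: EssayResult) -> str:
    """System-2 path: stand-in for an LLM call that justifies the score
    against the rubric."""
    return (f"Scored {result.score}/10 for a {len(essay.split())}-word "
            "essay; review organization and language use against the rubric.")

def dual_process_score(essay: str, threshold: float = 0.8) -> EssayResult:
    """The quick path always runs; the explanatory path is invoked only
    when confidence falls below the threshold, flagging the essay for a
    human grader."""
    result = quick_score(essay)
    if result.confidence < threshold:
        result.explanation = explain(essay, result)
    return result
```

In a real deployment the two stubs would be replaced by model calls; the routing logic is the part the framework contributes.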
Implementation Barriers
Operational
Diverse exercise contexts and ambiguity in scoring rubrics complicate the ability of traditional models to deliver accurate scores.
Proposed Solutions: Implement AES systems that integrate model confidence scores and natural language explanations to facilitate human supervision.
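One way confidence scores can facilitate human supervision is triage: with a limited review budget, graders look first at the predictions the model is least sure about. The sketch below is a hypothetical illustration of that idea, not code from the paper; the function name and data shape are assumptions.

```python
from typing import List, Tuple

def triage(scored: List[Tuple[str, float]], budget: int) -> List[str]:
    """Given (essay_id, model_confidence) pairs, return the `budget`
    lowest-confidence essay IDs, so limited human review time goes to
    the predictions the model is least certain about."""
    ranked = sorted(scored, key=lambda pair: pair[1])
    return [essay_id for essay_id, _ in ranked[:budget]]
```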
Technical
LLMs do not always surpass conventional grading models in performance and may struggle with complex scoring criteria.
Proposed Solutions: Enhance LLM capabilities through supervised fine-tuning and develop frameworks that leverage both quick scoring and detailed feedback.
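Supervised fine-tuning for combined scoring and feedback typically trains on records pairing an essay with a target score and explanation. The record below follows a common chat-style SFT format; the exact schema, prompt wording, and rubric fields used in the paper are not given here, so treat every name in it as a placeholder.

```python
import json

# Hypothetical fine-tuning example: one training record asking the model
# to emit both a score and a rubric-grounded justification.
record = {
    "messages": [
        {"role": "system",
         "content": "You are an essay grader. Return a 0-10 score and a "
                    "brief rubric-based justification."},
        {"role": "user", "content": "Essay: <student essay text>"},
        {"role": "assistant",
         "content": json.dumps({"score": 7,
                                "explanation": "Clear thesis; minor "
                                               "grammar errors."})},
    ]
}
line = json.dumps(record)  # one line of a JSONL training file
```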
Project Team
Changrong Xiao
Researcher
Wenxing Ma
Researcher
Qingping Song
Researcher
Sean Xin Xu
Researcher
Kunpeng Zhang
Researcher
Yufang Wang
Researcher
Qi Fu
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Changrong Xiao, Wenxing Ma, Qingping Song, Sean Xin Xu, Kunpeng Zhang, Yufang Wang, Qi Fu
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI