
Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs

Project Overview

This project explores the integration of Large Language Models (LLMs) in education, particularly their role in Automated Essay Scoring (AES) systems, which provide timely, personalized feedback for second-language learners. It presents a dual-process framework that combines quick score predictions with in-depth explanations, improving both grading accuracy and efficiency. The findings show that LLM assistance not only enables novice evaluators to perform at expert levels but also improves the effectiveness of experienced graders, underscoring the potential for productive human-AI collaboration in educational settings and the broader promise of generative AI for learning and assessment.
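The dual-process idea described above can be sketched in code: a fast pass that predicts a score (System 1) and a slower pass that produces a rationale (System 2). This is a minimal illustrative sketch, not the authors' implementation; the `quick_score` heuristic, `explain_score` text, and the `GradingResult` fields are all assumptions standing in for fine-tuned model calls.

```python
from dataclasses import dataclass

@dataclass
class GradingResult:
    score: float          # quick holistic score (System 1)
    explanation: str      # detailed rationale (System 2)
    confidence: float     # model confidence, usable to flag essays for review

def quick_score(essay: str) -> tuple[float, float]:
    """System 1: fast score prediction. Hypothetical stub for a trained scorer."""
    words = len(essay.split())
    score = min(words / 30.0, 10.0)          # toy heuristic, not a real rubric
    confidence = 0.9 if words > 50 else 0.5  # short essays get low confidence
    return round(score, 1), confidence

def explain_score(essay: str, score: float) -> str:
    """System 2: slower explanation pass. Stub for an LLM explanation prompt."""
    return f"Predicted score {score}: assessed on task response, coherence, and language use."

def grade(essay: str) -> GradingResult:
    score, confidence = quick_score(essay)
    return GradingResult(score, explain_score(essay, score), confidence)
```

In a real system both stubs would be model calls, with the cheap scoring pass run on every essay and the explanation pass reserved for essays a human will inspect.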

Key Applications

Automated Essay Scoring (AES) system based on LLMs

Context: Second-language learners, particularly in high school settings in China where there is a high student-teacher ratio.

Implementation: Developed an AES system inspired by dual-process theory; extensive experiments conducted with both public and private datasets to assess effectiveness.

Outcomes: Enhanced grading accuracy and efficiency; novice graders achieved performance levels comparable to expert graders with LLM assistance.

Challenges: Diverse exercise contexts and ambiguity in scoring rubrics complicate accurate scoring; reliance on LLMs without human supervision remains impractical.

Implementation Barriers

Operational

Diverse exercise contexts and ambiguity in scoring rubrics complicate the ability of traditional models to deliver accurate scores.

Proposed Solutions: Implement AES systems that integrate model confidence scores and natural language explanations to facilitate human supervision.
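The proposed solution above, using confidence scores to decide when a human must step in, can be sketched as a simple triage step. The `route` function, the 0.8 threshold, and the tuple shape are illustrative assumptions, not details from the paper.

```python
# Hypothetical triage: essays whose model confidence falls below a threshold
# are routed to a human grader; the rest keep their automatic score.
def route(predictions, threshold=0.8):
    """predictions: iterable of (essay_id, score, confidence) tuples."""
    auto_scored, human_review = [], []
    for essay_id, score, confidence in predictions:
        if confidence >= threshold:
            auto_scored.append((essay_id, score))
        else:
            human_review.append((essay_id, score))
    return auto_scored, human_review
```

Pairing this routing with natural language explanations means a grader only reviews the flagged essays, and reviews each one with the model's rationale in hand.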

Technical

LLMs do not always surpass conventional grading models in performance and may struggle with complex scoring criteria.

Proposed Solutions: Enhance LLM capabilities through supervised fine-tuning and develop frameworks that leverage both quick scoring and detailed feedback.
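Supervised fine-tuning of the kind mentioned above typically starts from human-scored essays packaged as chat-format training examples. The sketch below shows one plausible packaging into JSON Lines; the system prompt, the 0-10 scale, and the output format are assumptions for illustration, not the dataset format the authors used.

```python
import json

# Illustrative only: wrap a scored essay as one chat-format training record.
def to_sft_example(essay: str, score: float, explanation: str) -> str:
    record = {
        "messages": [
            {"role": "system",
             "content": "You are an essay grader. Return a 0-10 score with a rationale."},
            {"role": "user", "content": essay},
            {"role": "assistant",
             "content": f"Score: {score}\nRationale: {explanation}"},
        ]
    }
    # One JSON object per line is the usual shape for fine-tuning corpora.
    return json.dumps(record, ensure_ascii=False)
```

Training on examples that include both the score and the rationale is what lets a single fine-tuned model serve both halves of the dual-process framework.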

Project Team

Changrong Xiao

Researcher

Wenxing Ma

Researcher

Qingping Song

Researcher

Sean Xin Xu

Researcher

Kunpeng Zhang

Researcher

Yufang Wang

Researcher

Qi Fu

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Changrong Xiao, Wenxing Ma, Qingping Song, Sean Xin Xu, Kunpeng Zhang, Yufang Wang, Qi Fu

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
