Improve LLM-based Automatic Essay Scoring with Linguistic Features
Project Overview
This project explores the application of generative AI, specifically Large Language Models (LLMs), to education, focusing on Automatic Essay Scoring (AES). It argues that integrating linguistic features into LLM-based scoring improves accuracy and helps these systems generalize across varied essay prompts. The research indicates that hybrid approaches, which combine LLMs with traditional feature-based methods, align AI evaluations more closely with human scoring, particularly when prompts are diverse. The findings underscore both the potential of generative AI for educational assessment and the importance of refining these tools so that automated scores remain faithful to human judgment.
Key Applications
Automatic Essay Scoring (AES) using LLMs
Context: The educational setting involves assessing the writing skills of students in grades 7 to 10, targeting educators and institutions.
Implementation: The system employs a hybrid approach, integrating linguistic features into zero-shot prompts used with LLMs for grading essays.
Outcomes: The incorporation of linguistic features into the prompts improved scoring accuracy, aligning evaluations more closely with human judgments.
Challenges: Challenges include the complexity of essay grading across varied prompts and the inherent limitations in LLM performance without additional features.
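The hybrid approach described above can be sketched as follows: compute a few surface-level linguistic features from an essay and embed them in a zero-shot scoring prompt. The specific features (word count, sentence length, type-token ratio), the rubric scale, and the prompt wording here are illustrative assumptions, not the paper's actual feature set or prompt template.

```python
import re


def extract_features(essay: str) -> dict:
    """Compute a few illustrative surface-level linguistic features.

    These are stand-ins: the actual feature set used in the paper
    is not specified in this summary.
    """
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    unique = {w.lower() for w in words}
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(unique) / max(len(words), 1),
    }


def build_prompt(essay: str, rubric: str) -> str:
    """Embed the feature summary in a zero-shot scoring prompt."""
    feats = extract_features(essay)
    feature_lines = "\n".join(
        f"- {k}: {v:.2f}" if isinstance(v, float) else f"- {k}: {v}"
        for k, v in feats.items()
    )
    return (
        f"Score the following essay on a 1-6 scale using this rubric:\n"
        f"{rubric}\n\n"
        f"Pre-computed linguistic features:\n{feature_lines}\n\n"
        f"Essay:\n{essay}\n\n"
        "Return only the integer score."
    )
```

The resulting string would be sent to an LLM as the grading prompt; the feature lines give the model quantitative signals it may not reliably compute on its own.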
Implementation Barriers
Technical barrier
LLMs may underperform in complex evaluation tasks like essay scoring, especially in diverse grading scenarios.
Proposed Solutions: Combining LLMs with linguistic features and supervised methods to improve performance.
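One simple way to combine an LLM's score with a supervised feature-based prediction is a weighted blend clipped to the rubric scale. This convex-combination scheme is an illustrative assumption; the summary does not specify how the paper combines the two signals.

```python
def hybrid_score(
    llm_score: float,
    feature_score: float,
    alpha: float = 0.5,
    lo: int = 1,
    hi: int = 6,
) -> int:
    """Blend an LLM-assigned score with a supervised feature-based
    prediction, then round and clip to the rubric range.

    alpha controls the weight given to the LLM; the default of 0.5
    (equal weighting) is a placeholder, not a value from the paper.
    """
    raw = alpha * llm_score + (1 - alpha) * feature_score
    return min(hi, max(lo, round(raw)))
```

In practice, alpha could be tuned on a validation set against human scores rather than fixed in advance.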
Data barrier
The datasets used for training and evaluation may have biases, particularly focusing on specific demographic groups.
Proposed Solutions: Developing more diverse and inclusive datasets for training and evaluation.
Project Team
Zhaoyi Joey Hou
Researcher
Alejandro Ciuba
Researcher
Xiang Lorraine Li
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Zhaoyi Joey Hou, Alejandro Ciuba, Xiang Lorraine Li
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI