Automated Scoring for Reading Comprehension via In-context BERT Tuning
Project Overview
This document explores the application of generative AI in education, focusing on automated scoring for reading comprehension assessment. The proposed approach applies in-context fine-tuning to BERT, building a single unified scoring model that assesses multiple reading items by leveraging contextual information from the passages those items share, which reduces the human effort required for grading. The findings show that this model outperforms traditional scoring methods, demonstrating its potential to streamline the grading process. The study also identifies significant challenges, including biases in scoring across demographic groups and limited ability to accommodate the diverse range of student responses. Overall, the document highlights the promise of generative AI technologies for educational assessment while noting the need for further refinement to ensure equitable and effective evaluation outcomes.
Key Applications
Automated Scoring System using BERT
Context: Educational context for assessing reading comprehension in grades 4 and 8; target audience includes educators and students.
Implementation: Developed an automated scoring approach that uses in-context fine-tuning of BERT, leveraging contextual information from shared passages to create a single scoring model for multiple items.
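The in-context idea can be sketched as packing the item prompt, the shared passage, and the student response into one sequence, so a single model scores every item. The separator scheme and field order below are illustrative assumptions, not the paper's published format:

```python
def build_scoring_input(item_prompt: str, passage: str, response: str) -> str:
    """Pack an item, its shared passage, and a student response into one
    BERT-style sequence so a single model can score multiple items.

    A real pipeline would hand these segments to a tokenizer; the
    [CLS]/[SEP] markers here stand in for that step.
    """
    return f"[CLS] {item_prompt} [SEP] {passage} [SEP] {response} [SEP]"
```

A classification head over the `[CLS]` position would then predict the score level for any item, since the item's own prompt travels inside the input.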
Outcomes: Achieved higher scoring accuracy compared to existing models, with improvements in the Quadratic Weighted Kappa (QWK) metric.
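QWK penalizes rater disagreements by the squared distance between score levels, so near-misses cost less than large errors. A minimal pure-Python version (a sketch for illustration, not the paper's evaluation code):

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """QWK between two integer score sequences in [0, num_classes)."""
    # Observed confusion matrix between the two raters.
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    n = len(rater_a)
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic disagreement weight, 0 on the diagonal.
            w = ((i - j) ** 2) / ((num_classes - 1) ** 2)
            expected = hist_a[i] * hist_b[j] / n
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den
```

Perfect agreement yields 1.0, chance-level agreement yields 0, and systematic disagreement goes negative.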
Challenges: Existence of biases in scoring across different demographic groups; difficulty in generalization for unseen items.
Implementation Barriers
Technical Limitations
The approach is constrained by BERT's maximum input length (512 tokens for the standard model); long shared passages can exceed this limit.
Proposed Solutions: Future work should focus on efficiently summarizing or selecting relevant parts of the passage to include in the input.
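One simple selection heuristic, assumed here for illustration rather than taken from the paper: rank passage sentences by lexical overlap with the item prompt or response, then greedily keep the highest-ranked sentences that fit a token budget.

```python
import re

def select_relevant_sentences(passage_sentences, query, token_budget):
    """Greedily pick the passage sentences most lexically similar to the
    query (item prompt or student response) that fit within a word
    budget, preserving original passage order in the output."""
    def words(text):
        return set(re.findall(r"[a-z']+", text.lower()))
    query_words = words(query)
    # Rank sentences by overlap with the query, highest first.
    ranked = sorted(enumerate(passage_sentences),
                    key=lambda p: -len(query_words & words(p[1])))
    chosen, used = [], 0
    for idx, sent in ranked:
        cost = len(sent.split())  # crude proxy for token count
        if used + cost <= token_budget:
            chosen.append((idx, sent))
            used += cost
    chosen.sort()  # restore original passage order
    return " ".join(sent for _, sent in chosen)
```

A production system would use the model's actual tokenizer for the budget and an embedding-based similarity instead of word overlap; the greedy shape stays the same.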
Bias and Fairness
The model exhibits scoring biases across different demographic groups, leading to unfair scoring outcomes.
Proposed Solutions: Incorporate fairness regularization in the training process to promote equitable scoring across demographic groups.
Project Team
Nigel Fernandez
Researcher
Aritra Ghosh
Researcher
Naiming Liu
Researcher
Zichao Wang
Researcher
Benoît Choffin
Researcher
Richard Baraniuk
Researcher
Andrew Lan
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Nigel Fernandez, Aritra Ghosh, Naiming Liu, Zichao Wang, Benoît Choffin, Richard Baraniuk, Andrew Lan
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI