
Automated Scoring for Reading Comprehension via In-context BERT Tuning

Project Overview

This page summarizes research on applying generative AI to education, specifically automated scoring of reading comprehension responses. The approach fine-tunes a pre-trained language model (BERT) in context: because multiple reading items share a common passage, a single scoring model can draw on that shared contextual information to score responses to all of them, reducing the human effort needed for grading. The reported results show higher scoring accuracy than existing automated methods, suggesting the approach can streamline grading. The study also identifies significant challenges, including scoring biases across demographic groups and limited ability to handle the full diversity of student responses. Overall, the work demonstrates the promise of generative AI for educational assessment while underscoring the need for further refinement to ensure equitable and effective evaluation.

Key Applications

Automated Scoring System using BERT

Context: Educational context for assessing reading comprehension in grades 4 and 8; target audience includes educators and students.

Implementation: Developed an automated scoring approach that uses in-context fine-tuning of BERT, leveraging contextual information from shared passages to create a single scoring model for multiple items.
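The key idea is that one classifier sees the shared passage, the item, and the student response together in a single input sequence, so the same model can score every item attached to that passage. A minimal sketch of such an input format (the exact ordering and segment layout are assumptions, not taken from the paper):

```python
def build_input(passage: str, item: str, response: str) -> str:
    """Concatenate the shared passage, the reading item, and the student
    response into one BERT-style input sequence. In-context scoring means
    a single model scores multiple items that share the same passage."""
    # [CLS] and [SEP] are BERT's standard special tokens; a real pipeline
    # would use a tokenizer rather than raw string concatenation.
    return f"[CLS] {passage} [SEP] {item} [SEP] {response} [SEP]"
```

A real implementation would feed the tokenized sequence to a fine-tuned BERT encoder with a classification head over the score categories; this sketch only illustrates how the shared passage enters the input.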

Outcomes: Achieved higher scoring accuracy than existing models, measured by improvements in the Quadratic Weighted Kappa (QWK) metric.
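QWK measures agreement between model scores and human scores on an ordinal scale, penalizing large disagreements quadratically. The metric itself is standard; the sketch below is an illustrative pure-Python implementation, not the paper's evaluation code:

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """QWK between two integer score sequences in [0, num_categories).
    1.0 = perfect agreement, 0.0 = chance-level, negative = worse than chance."""
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))   # joint score counts
    hist_a = Counter(rater_a)                   # marginal counts, rater A
    hist_b = Counter(rater_b)                   # marginal counts, rater B
    num = den = 0.0
    for i in range(num_categories):
        for j in range(num_categories):
            # Quadratic weight: disagreement cost grows with (i - j)^2
            w = (i - j) ** 2 / (num_categories - 1) ** 2
            num += w * observed.get((i, j), 0)
            den += w * hist_a.get(i, 0) * hist_b.get(j, 0) / n
    return 1.0 - num / den
```

Perfect agreement yields 1.0, and fully reversed scores with uniform marginals yield -1.0.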

Challenges: Existence of biases in scoring across different demographic groups; difficulty in generalization for unseen items.

Implementation Barriers

Technical Limitations

The approach is constrained by BERT's fixed input length (512 tokens); long passages can exceed this limit, forcing truncation of the context.

Proposed Solutions: Future work should focus on efficiently summarizing or selecting relevant parts of the passage to include in the input.
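The paper leaves the selection method to future work, so the following is only one illustrative heuristic: rank passage sentences by word overlap with the item and keep the most relevant ones within a token budget.

```python
def select_relevant_sentences(passage_sentences, item, token_budget):
    """Greedily keep the passage sentences most relevant to the item,
    staying within a rough token budget. Word overlap is a crude relevance
    proxy; a real system might use embedding similarity instead."""
    words = lambda s: {w.strip(".,?!").lower() for w in s.split()}
    item_words = words(item)
    # Rank sentences by overlap with the item, highest first
    ranked = sorted(
        enumerate(passage_sentences),
        key=lambda p: -len(item_words & words(p[1])),
    )
    chosen, used = [], 0
    for idx, sent in ranked:
        cost = len(sent.split())  # whitespace word count as a token proxy
        if used + cost <= token_budget:
            chosen.append(idx)
            used += cost
    # Restore original passage order for readability
    return [passage_sentences[i] for i in sorted(chosen)]
```

The selected sentences would then replace the full passage in the model input, keeping the combined sequence under BERT's limit.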

Bias and Fairness

The model exhibits biases across different demographic groups, which can lead to unfair scoring.

Proposed Solutions: Incorporate fairness regularization in the training process to promote equitable scoring across demographic groups.

Project Team

Nigel Fernandez

Researcher

Aritra Ghosh

Researcher

Naiming Liu

Researcher

Zichao Wang

Researcher

Benoît Choffin

Researcher

Richard Baraniuk

Researcher

Andrew Lan

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Nigel Fernandez, Aritra Ghosh, Naiming Liu, Zichao Wang, Benoît Choffin, Richard Baraniuk, Andrew Lan

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
