Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests
Project Overview
The document explores the application of generative AI in education, focusing on a novel method for imputing missing scores in educational assessments, particularly constructed-response tests. By combining automated scoring technologies, including large language models (LLMs), with item response theory (IRT), the method aims to estimate learner ability accurately from incomplete score data while substantially reducing the manual grading workload for educators. The findings suggest that the approach yields robust ability estimates even when the missing-data ratio is high and student responses are diverse. Overall, integrating generative AI into educational assessment in this way promises to streamline grading and provide more precise evaluations of student capabilities.
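To make the estimation step concrete, the sketch below shows one way ability could be estimated once an AI grader has filled in the missing scores. It assumes a generalized partial credit model with item parameters already known and uses purely illustrative numbers; the paper's actual model specification and estimation procedure may differ.

```python
# Minimal sketch (not the authors' implementation): maximum-likelihood ability
# estimation under a generalized partial credit model (GPCM), where missing
# constructed-response scores have first been filled in with predictions from
# an automated grader. Item parameters are assumed known; all numbers are
# illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def gpcm_probs(theta, a, b):
    """Category probabilities P(X = 0..K) for one item under the GPCM.

    theta : ability value
    a     : item discrimination
    b     : array of K step difficulties (categories 1..K)
    """
    # Cumulative logits; the term for category 0 is 0 by convention.
    cum = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    exp_cum = np.exp(cum - cum.max())          # stabilise the softmax
    return exp_cum / exp_cum.sum()

def estimate_theta(scores, a_params, b_params):
    """MLE of ability from a complete score vector (observed + imputed)."""
    def neg_log_lik(theta):
        ll = 0.0
        for x, a, b in zip(scores, a_params, b_params):
            ll += np.log(gpcm_probs(theta, a, b)[x])
        return -ll
    return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

# Illustrative test: 4 items scored 0-3, two scores missing (None).
observed = [3, None, 2, None]
ai_grader_predictions = {1: 2, 3: 1}           # hypothetical AI-grader scores
scores = [ai_grader_predictions[j] if x is None else x
          for j, x in enumerate(observed)]

a_params = [1.2, 0.8, 1.0, 1.5]                # illustrative discriminations
b_params = [np.array([-1.0, 0.0, 1.0])] * 4    # illustrative step difficulties

print(f"Estimated ability: {estimate_theta(scores, a_params, b_params):.3f}")
```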
Key Applications
Automated scoring technologies for missing score imputation
Context: Assessment of learners' abilities in constructed-response tests such as essays and short-answer questions, targeting educators and assessment developers.
Implementation: The study develops neural automated scoring models, either by fine-tuning existing models or by using zero-shot scoring with LLMs, to predict missing scores for IRT-based ability estimation (a zero-shot scoring sketch follows this list).
Outcomes: Achieves high accuracy in ability estimation, reduces manual grading workload, and provides robust imputation even with high missing ratios.
Challenges: Dependence on the accuracy of automated scoring models, computational resource requirements, lack of generalizability across languages, and issues with interpretability of imputed scores.
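As an illustration of the zero-shot variant mentioned above, the following sketch prompts a chat LLM to return a rubric-based integer score for a single response, which could then stand in for a missing value before IRT estimation. The prompt wording, model choice, and output parsing are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of zero-shot LLM scoring: ask a chat model to grade a short
# answer against a rubric and return an integer score. Prompt, model name,
# and parsing are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def zero_shot_score(question: str, rubric: str, response: str, max_score: int = 3) -> int:
    """Request a single integer score from the LLM for one student response."""
    prompt = (
        f"You are grading a short-answer question.\n"
        f"Question: {question}\n"
        f"Rubric: {rubric}\n"
        f"Student response: {response}\n"
        f"Reply with only an integer score from 0 to {max_score}."
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",               # any capable chat model could be substituted
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                     # deterministic scoring
    )
    text = completion.choices[0].message.content.strip()
    score = int(text.split()[0])           # naive parsing; real use needs validation
    return min(max(score, 0), max_score)   # clamp to the valid score range

# Example call (hypothetical item):
# missing_score = zero_shot_score(
#     "Explain why the sky appears blue.",
#     "3 = mentions Rayleigh scattering and wavelength dependence; ...",
#     "Because shorter wavelengths scatter more in the atmosphere.",
# )
```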
Implementation Barriers
Technical Barrier
The effectiveness of the method depends on the accuracy of the automated scoring model used for imputation; an inaccurate model could bias the resulting ability estimates. In addition, the use of complex models such as LLMs limits the interpretability of individual imputed scores, complicating fairness evaluations.
Proposed Solutions: Future research could focus on improving the accuracy of automated scoring models through better training techniques and validation against diverse datasets. Additionally, developing methods to enhance the transparency and interpretability of imputed scores could foster trust in high-stakes testing.
Resource Barrier
Developing scoring models requires either substantial computational resources for fine-tuning or careful prompt engineering for zero-shot approaches.
Proposed Solutions: Exploring more efficient model training methods or leveraging cloud computing resources to reduce local computational burdens.
Generalizability Barrier
The findings are based on specific datasets limited to the Japanese language, which may restrict the generalizability of the method to other contexts.
Proposed Solutions: Testing the proposed method on a wider array of datasets across different languages and educational contexts to enhance external validity.
Project Team
Masaki Uto
Researcher
Yuma Ito
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Masaki Uto, Yuma Ito
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI