
Automatic assessment of text-based responses in post-secondary education: A systematic review

Project Overview

This systematic review examines how generative AI, particularly Natural Language Processing (NLP) and Large Language Models (LLMs), can automate the assessment of text-based responses in post-secondary education. It addresses the difficulty of grading open-ended questions and outlines the advantages of automated assessment systems, which improve accuracy and efficiency in evaluating student performance across subjects such as science and language learning. The review identifies several types of automated assessment systems, detailing their applications, positive outcomes, and inherent challenges, including technical limitations and the need for thoughtful implementation. It also underscores the need for continued research and development to refine these tools and address remaining obstacles. Overall, the findings suggest that while generative AI holds significant promise for educational assessment, careful consideration is required for effective integration into teaching and learning environments.

Key Applications

Automated Assessment Systems

Context: Post-secondary education, including higher education and statewide implementations, focusing on automating assessment for large classes, structured questions, and student writing responses.

Implementation: Involves utilizing various AI technologies and methodologies, including machine learning approaches and natural language processing, to automate the assessment of student responses in diverse formats, such as essays, short answers, and critical thinking evaluations.
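As a rough illustration of the kind of NLP technique such systems build on (a minimal sketch, not the method of any system in the review), a student's short answer can be compared against a reference answer with bag-of-words cosine similarity; the score mapping and answer texts below are purely illustrative:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into word tokens
    return re.findall(r"[a-z']+", text.lower())

def cosine_similarity(a, b):
    # Cosine similarity between two bag-of-words count vectors
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_response(student_answer, reference_answer, full_marks=5):
    # Map similarity to a coarse score; the linear mapping is illustrative only
    return round(cosine_similarity(student_answer, reference_answer) * full_marks)

reference = "Photosynthesis converts light energy into chemical energy stored in glucose."
student = "Plants use photosynthesis to turn light energy into chemical energy in glucose."
print(score_response(student, reference))
```

Production systems replace the bag-of-words comparison with learned representations (embeddings or LLM judgments), but the pipeline shape, compare the response to a rubric or reference and map to a score, is the same.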

Outcomes: Improved grading efficiency, timely feedback for students, enhanced learning outcomes through personalized learning, and insights into student performance across different assessment types.

Challenges: Technical limitations in accurately assessing open-ended responses, potential bias in AI grading, ensuring fairness and reliability of automated scores, and the need for continuous updates to maintain system reliability.

Automated Scoring for Argumentation and Critical Thinking

Context: Higher education settings that emphasize formative assessments and critical thinking skills across various disciplines.

Implementation: Utilizes automated scoring systems to evaluate scientific argumentation and critical thinking through established methodologies, providing feedback based on student performance in discussions and written responses.

Outcomes: Enhanced feedback mechanisms, improved insight into student critical thinking capabilities, and increased efficiency in grading complex student responses.

Challenges: Accuracy of scoring, interpretability of results, and difficulties in defining and measuring critical thinking.

Implementation Barriers

Technical Barrier

Accurately assessing complex, open-ended responses is harder than scoring structured formats such as multiple-choice questions, and AI systems are complex to implement and maintain.

Proposed Solutions: Improving AI models, integrating human oversight in the grading process, developing hybrid systems that combine AI and human grading, investing in user-friendly AI tools, and providing ongoing technical support.
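One common shape for such a hybrid system (a minimal sketch under assumed data; the `confidence` field and threshold are illustrative, not drawn from the review) is to auto-accept high-confidence AI scores and queue the rest for human grading:

```python
def route_for_grading(responses, confidence_threshold=0.8):
    # Split AI-scored responses into auto-accepted and human-review queues.
    # Each response is a dict carrying an AI-assigned score and a model
    # confidence estimate in [0, 1]; both fields are hypothetical.
    auto_graded, needs_review = [], []
    for r in responses:
        if r["confidence"] >= confidence_threshold:
            auto_graded.append(r)
        else:
            needs_review.append(r)
    return auto_graded, needs_review

batch = [
    {"id": "s1", "ai_score": 4, "confidence": 0.93},
    {"id": "s2", "ai_score": 2, "confidence": 0.55},  # ambiguous answer
    {"id": "s3", "ai_score": 5, "confidence": 0.88},
]
auto, review = route_for_grading(batch)
print([r["id"] for r in auto], [r["id"] for r in review])
```

The threshold trades off instructor workload against the risk of accepting an unreliable automated score, and would be tuned against human-graded data.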

Pedagogical Barrier

Instructors may be uncertain about when automated grading is more appropriate than traditional grading.

Proposed Solutions: Providing training for educators on the effective use of AI tools and integrating AI systems into existing pedagogical frameworks.

Validity Barrier

Concerns over the accuracy and fairness of automated assessments.

Proposed Solutions: Regular calibration and validation of AI models against human scoring.
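One standard way to operationalize this validation (a minimal sketch; the metric choice and score data are illustrative) is to measure agreement between AI-assigned and human-assigned scores with quadratic weighted kappa, a metric widely used in automated essay scoring:

```python
from collections import Counter

def quadratic_weighted_kappa(human, ai, min_score=0, max_score=5):
    # Agreement between two raters on an ordinal scale, penalizing large
    # disagreements quadratically; 1.0 = perfect agreement, 0.0 = chance.
    assert len(human) == len(ai)
    n = max_score - min_score + 1
    total = len(human)
    # Observed confusion matrix of (human, ai) score pairs
    observed = [[0.0] * n for _ in range(n)]
    for h, a in zip(human, ai):
        observed[h - min_score][a - min_score] += 1
    # Expected matrix from the marginal score distributions
    hist_h = Counter(h - min_score for h in human)
    hist_a = Counter(a - min_score for a in ai)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)
            num += w * observed[i][j]
            den += w * hist_h[i] * hist_a[j] / total
    return 1.0 - num / den if den else 1.0

human_scores = [4, 3, 5, 2, 4, 3, 5, 1]
ai_scores    = [4, 3, 4, 2, 5, 3, 5, 2]
print(round(quadratic_weighted_kappa(human_scores, ai_scores), 3))
```

A calibration loop would recompute this agreement on a fresh human-graded sample at regular intervals and flag the model for retraining when it drops below an agreed threshold.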

Trust Barrier

Skepticism from educators regarding the reliability of AI tools.

Proposed Solutions: Providing evidence of successful case studies and pilot programs.

Project Team

Rujun Gao

Researcher

Hillary E. Merzdorf

Researcher

Saira Anwar

Researcher

M. Cynthia Hipwell

Researcher

Arun Srinivasa

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Rujun Gao, Hillary E. Merzdorf, Saira Anwar, M. Cynthia Hipwell, Arun Srinivasa

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
