Integrating LLMs for Grading and Appeal Resolution in Computer Science Education
Project Overview
This document examines the role of generative AI, specifically Large Language Models (LLMs), in computer science education. It introduces AI-PAT, an AI-powered assessment tool designed to automate grading, deliver feedback, and streamline the resolution of student appeals. Findings suggest that while AI-PAT significantly improves grading efficiency and consistency, it struggles with subjective assessments and with variability in model outputs. Student feedback reveals notable concerns about trust in the AI grading system and the perceived fairness of its decisions, underscoring the need for human oversight and clear, transparent grading criteria. Overall, integrating AI into education offers gains in operational efficiency but also raises challenges that must be addressed to ensure equitable and trustworthy assessment practices.
Key Applications
AI-PAT (AI-powered assessment tool)
Context: Computer science education, specifically in an Object-Oriented Programming course for undergraduate students.
Implementation: AI-PAT was integrated into the evaluation framework, assessing exam submissions and handling appeals using LLMs.
Outcomes: AI-PAT demonstrated scalability in providing detailed feedback and managing the appeal process, leading to grade changes in 74% of appeal cases.
Challenges: Variability in AI model outputs, student concerns about trust and fairness, and the necessity of human oversight.
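The paper does not reproduce AI-PAT's actual prompts or code. As a rough illustration of how an LLM-based grader like this can be driven, the sketch below assembles a rubric-based grading prompt from an exam question and a student submission. All names here (`RubricItem`, `build_grading_prompt`) are hypothetical, not AI-PAT's implementation.

```python
# Hypothetical sketch of rubric-based prompt construction for an LLM grader.
# The function and class names are illustrative assumptions, not AI-PAT's code.
from dataclasses import dataclass

@dataclass
class RubricItem:
    criterion: str   # e.g. "Correct use of inheritance"
    max_points: int

def build_grading_prompt(question: str, submission: str,
                         rubric: list[RubricItem]) -> str:
    """Compose a grading prompt: question, rubric, then the student's
    answer, asking for per-criterion scores with justifications."""
    rubric_lines = "\n".join(
        f"- {item.criterion} (max {item.max_points} points)" for item in rubric
    )
    return (
        "You are grading an Object-Oriented Programming exam answer.\n\n"
        f"Question:\n{question}\n\n"
        f"Rubric:\n{rubric_lines}\n\n"
        f"Student submission:\n{submission}\n\n"
        "For each rubric item, give a score and a one-sentence "
        "justification, then report the total."
    )
```

A prompt built this way would be sent to the chosen model (the paper's analysis used gpt-4o-mini); fixing the rubric text in the prompt is one way to keep the grading criteria transparent to students.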
Implementation Barriers
Trust and Fairness
Students expressed concerns regarding the fairness and transparency of AI-generated evaluations, which affected their trust in the grading process.
Proposed Solutions: Implement transparent grading rubrics and human oversight mechanisms to enhance trust and perceived fairness.
Model Variability
Different LLM models exhibited significant variability in grading, impacting the consistency of assessments and reliability of results.
Proposed Solutions: Refine prompt design and establish calibration processes for human graders and AI to improve consistency.
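One simple calibration mechanism consistent with the proposed solutions above is to grade each submission several times (or with several models) and route divergent cases to a human grader. The sketch below is a minimal illustration of that idea; the spread threshold and mean aggregation are assumptions for the example, not parameters reported in the paper.

```python
# Illustrative consistency check for variable LLM grading outputs.
# Threshold and aggregation rule are assumed values for this sketch.
from statistics import mean

def needs_human_review(scores: list[float], max_spread: float = 5.0) -> bool:
    """Flag a submission for human re-grading when independent LLM runs
    (or different models) disagree by more than max_spread points."""
    return max(scores) - min(scores) > max_spread

def consensus_score(scores: list[float]) -> float:
    """Aggregate agreeing runs into one grade (here: the mean)."""
    return mean(scores)
```

In practice the divergence threshold would be tuned against a set of human-graded submissions, which is one concrete form the human–AI calibration process could take.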
Project Team
I. Aytutuldu
Researcher
O. Yol
Researcher
Y. S. Akgul
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: I. Aytutuldu, O. Yol, Y. S. Akgul
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI