
Integrating LLMs for Grading and Appeal Resolution in Computer Science Education

Project Overview

This document examines the role of generative AI, specifically Large Language Models (LLMs), in enhancing educational processes within computer science. It introduces AI-PAT, an AI-powered assessment tool designed to automate grading, deliver feedback, and streamline the resolution of student appeals. Findings suggest that while AI-PAT significantly improves grading efficiency and consistency, it struggles with subjective assessment items, and its scores vary across models and runs. Student feedback reveals notable concerns about trust in AI grading and perceptions of fairness, underscoring the need for human oversight and clear, transparent grading criteria. Overall, integrating AI into education offers gains in operational efficiency but also raises challenges that must be addressed to ensure equitable and trustworthy assessment.

Key Applications

AI-PAT (AI-powered assessment tool)

Context: Computer science education, specifically in an Object-Oriented Programming course for undergraduate students.

Implementation: AI-PAT was integrated into the evaluation framework, assessing exam submissions and handling appeals using LLMs.

Outcomes: AI-PAT demonstrated scalability in providing detailed feedback and managing the appeal process, leading to grade changes in 74% of appeal cases.

Challenges: Variability in AI model outputs, student concerns about trust and fairness, and the need for human oversight.
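The workflow above can be illustrated with a minimal sketch of rubric-driven LLM grading. The paper does not publish AI-PAT's actual prompt format or output contract, so the criterion names, the JSON reply schema, and the clamping logic below are illustrative assumptions, not the authors' implementation; the model call itself is left abstract.

```python
import json

def build_grading_prompt(question, rubric, submission):
    """Assemble a grading prompt from an explicit rubric.

    Hypothetical format: AI-PAT's real prompt design is not
    specified in the source, so this layout is an assumption.
    """
    criteria = "\n".join(
        f"- {name} (max {pts} pts)" for name, pts in rubric.items()
    )
    return (
        f"Question:\n{question}\n\n"
        f"Rubric:\n{criteria}\n\n"
        f"Student submission:\n{submission}\n\n"
        'Return JSON: {"scores": {criterion: points}, "feedback": str}'
    )

def parse_grade(raw_response, rubric):
    """Validate the model's JSON reply and clamp each score to its rubric maximum."""
    data = json.loads(raw_response)
    scores = {
        name: min(max(0, data["scores"].get(name, 0)), max_pts)
        for name, max_pts in rubric.items()
    }
    return sum(scores.values()), scores, data.get("feedback", "")
```

For example, if the model over-awards a criterion, clamping keeps the total within the rubric:

```python
rubric = {"correctness": 6, "design": 4}
reply = '{"scores": {"correctness": 5, "design": 7}, "feedback": "Good"}'
total, scores, feedback = parse_grade(reply, rubric)  # design clamped to 4, total 9
```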

Implementation Barriers

Trust and Fairness

Students expressed concerns regarding the fairness and transparency of AI-generated evaluations, which affected their trust in the grading process.

Proposed Solutions: Implement transparent grading rubrics and human oversight mechanisms to enhance trust and perceived fairness.
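One way to make the oversight mechanism concrete is to treat the AI score as advisory until a human records a decision. This is a minimal sketch of that pattern; the record fields and reviewer workflow are assumptions, not AI-PAT's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GradeRecord:
    """An AI-proposed grade; a recorded human review always overrides it."""
    student_id: str
    ai_score: float
    ai_feedback: str
    human_score: Optional[float] = None
    reviewed_by: Optional[str] = None

    @property
    def final_score(self) -> float:
        # Until a reviewer signs off, the AI score stands provisionally.
        return self.human_score if self.reviewed_by else self.ai_score

def apply_review(record: GradeRecord, reviewer: str, score: float) -> GradeRecord:
    """Record a human grading decision, which takes precedence over the AI score."""
    record.reviewed_by = reviewer
    record.human_score = score
    return record
```

Keeping the AI and human scores as separate fields also leaves an audit trail, which supports the transparency goal: students can see both what the model proposed and who made the final call.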

Model Variability

Different LLM models exhibited significant variability in grading, impacting the consistency of assessments and reliability of results.

Proposed Solutions: Refine prompt design and establish calibration processes for human graders and AI to improve consistency.
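A simple calibration check along these lines is to re-grade the same submission several times and flag items whose scores drift. The sketch below assumes this repeated-sampling approach and an illustrative tolerance threshold; neither comes from the paper, and `grader` stands in for whatever LLM call is used.

```python
import statistics

def grade_consistency(grader, submission, runs=5, tolerance=1.0):
    """Re-grade one submission several times and flag unstable scores.

    `grader` is any callable returning a numeric score (e.g. an LLM
    call whose sampled outputs vary run to run). The item is flagged
    for human calibration when the score spread exceeds `tolerance`
    points; the threshold is an illustrative choice.
    """
    scores = [grader(submission) for _ in range(runs)]
    spread = max(scores) - min(scores)
    return {
        "mean": statistics.mean(scores),
        "spread": spread,
        "needs_review": spread > tolerance,
    }
```

With a stub grader that returns 8, 8, 9, 8, 10 across five runs, the spread is 2 points, so the item would be routed to a human grader rather than auto-finalized.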

Project Team

I. Aytutuldu

Researcher

O. Yol

Researcher

Y. S. Akgul

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: I. Aytutuldu, O. Yol, Y. S. Akgul

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
