
Chatbots in the Classroom: We Test the Fobizz Tool for Automatic Grading of Homework

Project Overview

The document critically examines the deployment of generative AI in education, focusing on the Fobizz AI Grading Assistant, a tool designed to help teachers grade student assignments and provide feedback. The study reveals significant shortcomings in the tool's functionality: inconsistent grading, failure to detect nonsensical submissions, and unreliable adherence to stated grading criteria. These findings raise concerns about the growing trend of treating AI as a quick fix for deeper systemic problems in education. The authors stress that human judgment remains essential in educational assessment, arguing that automated grading can undermine the credibility of evaluations and aggravate existing problems of educational quality and fairness. They ultimately call for systematic evaluation and pedagogical scrutiny of AI tools before classroom deployment, so that such tools enhance rather than hinder learning and assessment.

Key Applications

AI Grading and Feedback Tool

Context: Used in various educational contexts, including classrooms where teachers evaluate student assignments and provide feedback on submissions.

Implementation: The AI grading and feedback tool was tested through multiple evaluations of student assignments and submissions, aimed at assisting teachers in grading and offering feedback.

Outcomes: The tool demonstrated significant variability in grading accuracy and quality of feedback. Many results were inconsistent and unreliable, often yielding random outcomes.

Challenges: Key challenges included inconsistent feedback, unreliable recognition of factual inaccuracies, lack of transparency in grading criteria, and difficulties in detecting AI-generated texts.

Implementation Barriers

Technical

The inherent limitations of large language models (LLMs) lead to random and inconsistent grading results, as well as the tool's failure to reliably detect factual inaccuracies and nonsense submissions.

Proposed Solutions: There are calls for improved evaluation processes, better programming techniques, and more thorough testing before deployment.

Pedagogical

The tool's inability to provide reliable feedback compromises teaching effectiveness.

Proposed Solutions: A critical review of the pedagogical suitability of such tools is recommended before their implementation.

Operational

Inconsistent and random feedback leads to unreliable evaluations.

Proposed Solutions: Employing multiple iterations of grading for the same submission to gauge variability and ensuring human oversight in the evaluation process.
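The variability check proposed above can be sketched in a few lines. Note that `grade_submission` below is a hypothetical stand-in, not the actual Fobizz tool's API; it simulates an AI grader's nondeterminism with random noise purely for illustration:

```python
import random
import statistics

def grade_submission(text: str) -> float:
    """Hypothetical stand-in for an AI grading call.
    Real LLM-based graders may return different scores for the
    same input; here that variability is simulated with noise."""
    return round(random.uniform(6.0, 10.0), 1)  # simulated score on a 0-10 scale

def grading_variability(text: str, runs: int = 5) -> dict:
    """Grade the same submission several times and report the spread,
    so a human reviewer can see how stable the AI's judgment is."""
    scores = [grade_submission(text) for _ in range(runs)]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if runs > 1 else 0.0,
        "range": max(scores) - min(scores),
    }

report = grading_variability("Student essay text ...", runs=5)
```

A large range or standard deviation flags a grade that should not be trusted without human oversight, which is exactly the role the proposed solution assigns to the teacher.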

Transparency

Lack of clarity regarding the tool's limitations and capabilities.

Proposed Solutions: Improving user interface communication regarding the capabilities of the tool and the nature of its outputs.

Project Team

Rainer Muehlhoff

Researcher

Marte Henningsen

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Rainer Muehlhoff, Marte Henningsen

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
