Chatbots in the Classroom: We Test the Fobizz Tool for Automated Grading of Homework
Project Overview
The document critically examines the deployment of generative AI in education, specifically focusing on the Fobizz AI Grading Assistant, an AI-powered tool designed to assist teachers in grading and providing feedback on student assignments. It reveals significant shortcomings in the tool's functionality, such as inconsistent grading, failure to detect nonsensical submissions, and unreliable adherence to grading criteria. These issues raise concerns about the increasing trend of relying on AI as a quick solution to deeper systemic educational challenges. The study emphasizes the necessity of human judgment in educational assessments, arguing that automation in grading can compromise the credibility of evaluations and worsen existing problems related to educational quality and fairness. It ultimately calls for a more systematic evaluation and pedagogical scrutiny of AI tools in education to ensure they enhance rather than hinder the learning and assessment process.
Key Applications
AI Grading and Feedback Tool
Context: Used in various educational contexts, including classrooms where teachers evaluate student assignments and provide feedback on submissions.
Implementation: The AI grading and feedback tool was tested through multiple evaluations of student assignments and submissions, aimed at assisting teachers in grading and offering feedback.
Outcomes: The tool demonstrated significant variability in grading accuracy and quality of feedback. Many results were inconsistent and unreliable, often yielding random outcomes.
Challenges: Key challenges included inconsistent feedback, unreliable recognition of factual inaccuracies, lack of transparency in grading criteria, and difficulties in detecting AI-generated texts.
Implementation Barriers
Technical
The inherent limitations of large language models (LLMs) lead to random and inconsistent grading results, as well as the tool's failure to reliably detect factual inaccuracies and nonsense submissions.
Proposed Solutions: There are calls for improved evaluation processes, better programming techniques, and more thorough testing before deployment.
Pedagogical
The tool's inability to provide reliable feedback compromises teaching effectiveness.
Proposed Solutions: A critical review of the pedagogical suitability of such tools is recommended before their implementation.
Operational
Inconsistent and random feedback leading to unreliable evaluations.
Proposed Solutions: Employing multiple iterations of grading for the same submission to gauge variability and ensuring human oversight in the evaluation process.
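The repeated-grading check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `grade_submission` is a hypothetical stand-in for a call to an LLM grading service, and the rubric string is invented for the example. The point is the pattern of grading the same submission several times and measuring the spread before trusting any single result.

```python
import statistics

def grade_submission(text: str, rubric: str) -> float:
    """Hypothetical stand-in for a call to an LLM grading API.

    A real implementation would send `text` and `rubric` to the model
    and parse a numeric grade from its response. This deterministic stub
    lets the sketch run offline; real LLM outputs vary between calls."""
    return float(len(text) % 15) / 15 * 5 + 1  # a grade in [1, 6]

def grade_with_variability(text: str, rubric: str, runs: int = 5) -> dict:
    """Grade the same submission `runs` times and report the spread.

    A large spread signals that the tool's output is too unstable to
    use without human oversight of the final evaluation."""
    grades = [grade_submission(text, rubric) for _ in range(runs)]
    return {
        "grades": grades,
        "mean": statistics.mean(grades),
        "spread": max(grades) - min(grades),  # 0 only if fully consistent
    }

report = grade_with_variability("Der Wasserkreislauf beginnt ...", "rubric-v1")
print(report["mean"], report["spread"])
```

In practice a teacher-facing tool could surface the spread alongside the mean grade, flagging submissions whose repeated gradings disagree for mandatory human review.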
Transparency
Lack of clarity regarding the tool's limitations and capabilities.
Proposed Solutions: Improving user interface communication regarding the capabilities of the tool and the nature of its outputs.
Project Team
Rainer Muehlhoff
Researcher
Marte Henningsen
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Rainer Muehlhoff, Marte Henningsen
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI