
BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses

Project Overview

The paper examines how well AI-powered tutors identify and address student mistakes in educational dialogues. The authors fine-tune the MPNet model to classify each tutor response into one of three classes: the mistake is correctly identified, partially acknowledged, or not recognized. To improve performance, they combine several fine-tuned models in an ensemble, reaching macro-F1 scores of approximately 0.7110 for mistake identification and 0.5543 for mistake localization. The authors note that nuanced language in tutor responses makes the classes hard to separate, and that accurate classification matters if generative AI tutors are to support personalized learning through targeted feedback and error recognition.

Key Applications

MPNet Ensembles for Pedagogical Mistake Identification and Localization

Context: Educational dialogue systems for tutoring, targeting AI tutors and educational technology developers.

Implementation: The system fine-tunes MPNet with a class-weighted cross-entropy loss to handle class imbalance, uses grouped cross-validation to train multiple models, and combines their predictions through hard voting.
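The three ingredients named above (class weighting, grouped splits, hard voting) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code. The label names, the inverse-frequency weighting formula, the idea of grouping by a dialogue id, and the tie-breaking rule are all assumptions made for illustration.

```python
import math
from collections import Counter

# Label names are assumed from the task description, not taken from the paper's code.
LABELS = ["Yes", "To some extent", "No"]

def class_weights(train_labels):
    """Inverse-frequency weights n / (k * count): one common heuristic (assumption)."""
    counts = Counter(train_labels)
    n, k = len(train_labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

def weighted_cross_entropy(logits, gold, weights):
    """Class-weighted cross-entropy for one example; logits maps label -> score."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return -weights[gold] * (logits[gold] - log_z)

def grouped_folds(group_ids, n_folds):
    """Give every group (e.g. a hypothetical dialogue id) exactly one fold,
    so items from the same group never sit on both sides of a split."""
    fold_of = {g: i % n_folds for i, g in enumerate(sorted(set(group_ids)))}
    return [fold_of[g] for g in group_ids]

def hard_vote(predictions):
    """Majority vote over the labels predicted by the individual models.
    Ties go to the label listed first in LABELS (an arbitrary assumption)."""
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in LABELS if counts.get(label, 0) == best)

# Usage: rarer classes receive larger weights, and three models vote on one response.
weights = class_weights(["Yes"] * 6 + ["To some extent"] * 2 + ["No"] * 2)
final_label = hard_vote(["Yes", "To some extent", "Yes"])
```

In practice a deep-learning framework's built-in weighted loss and a library splitter would replace these helpers; the sketch only shows what each step computes.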

Outcomes: Achieved macro-F1 scores of approximately 0.7110 for mistake identification and 0.5543 for mistake localization.
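For readers unfamiliar with the metric, macro-F1 averages the per-class F1 scores with equal weight, so a rare class counts as much as a frequent one. The sketch below is a plain-Python illustration of that definition; the label names and the toy data are assumptions and are unrelated to the reported scores.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1; a class with no gold and no predicted
    items contributes 0.0 (a convention chosen here for simplicity)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: a classifier that always answers "Yes" on an imbalanced set.
labels = ["Yes", "To some extent", "No"]
gold = ["Yes"] * 6 + ["To some extent"] * 2 + ["No"] * 2
pred = ["Yes"] * 10
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)  # 0.6
score = macro_f1(gold, pred, labels)                            # 0.25
```

The always-"Yes" classifier reaches 60% accuracy on this toy set but only 0.25 macro-F1, which is why macro-F1 is a stricter measure when classes are imbalanced.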

Challenges: Difficulties in distinguishing between full and partial error acknowledgment, and the subjective nature of pedagogical feedback classification.

Implementation Barriers

Technical Limitation

Model predictions are not well calibrated, so the model can be overconfident in incorrect labels.

Proposed Solutions: Future work could implement calibration techniques like temperature scaling to improve the reliability of confidence scores.
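As a rough illustration of temperature scaling: the logits are divided by a single scalar T, fitted on held-out data by minimizing negative log-likelihood, before the softmax is applied. The sketch below uses a simple grid search instead of gradient-based optimization, and the validation logits are invented for the example; none of it comes from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, gold_indices, temperature):
    """Mean negative log-likelihood of the gold classes at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[g])
              for row, g in zip(logit_rows, gold_indices)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, gold_indices, grid=None):
    """Pick the temperature with the lowest validation NLL (grid search, assumption)."""
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, gold_indices, t))

# Invented held-out logits: very confident predictions, one of which is wrong.
val_logits = [[4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
val_gold = [0, 0, 1, 1]
T = fit_temperature(val_logits, val_gold)
calibrated = softmax(val_logits[0], T)
```

Because every logit is divided by the same positive T, the predicted label never changes; only the confidence attached to it does. A fitted T above 1 indicates that the original model was overconfident.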

Evaluation Subjectivity

The distinction between 'Yes' and 'To some extent' predictions is subjective, causing ambiguity in labeling.

Proposed Solutions: Modeling the task as ordinal or probabilistic may better capture this continuum of responses.
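One possible way to treat the labels as ordinal is to replace the single three-way target with cumulative binary targets ("is the response above 'No'?", "is it above 'To some extent'?"). The sketch below shows only the encoding and decoding steps. The label ordering, the "No" label name, and the cumulative formulation are assumptions chosen for illustration, not a method described in the source.

```python
# Assumed ordering from least to most complete acknowledgment of the mistake.
ORDER = ["No", "To some extent", "Yes"]

def to_cumulative_targets(label):
    """Encode a label as binary targets [rank > 0, rank > 1]."""
    rank = ORDER.index(label)
    return [1 if rank > k else 0 for k in range(len(ORDER) - 1)]

def class_probabilities(cumulative_probs):
    """Convert P(rank > k) estimates into per-class probabilities.
    Negative differences (non-monotonic estimates) are clipped to 0."""
    bounds = [1.0] + list(cumulative_probs) + [0.0]
    return [max(bounds[k] - bounds[k + 1], 0.0) for k in range(len(ORDER))]

def decode(cumulative_probs):
    """Return the label with the highest derived class probability."""
    probs = class_probabilities(cumulative_probs)
    return ORDER[max(range(len(ORDER)), key=probs.__getitem__)]

# Usage: a response that is very likely above "No" but unlikely to be a full "Yes".
targets = to_cumulative_targets("To some extent")  # [1, 0]
label = decode([0.9, 0.4])
```

With this encoding, mistaking "Yes" for "To some extent" flips one binary target while mistaking "Yes" for "No" flips two, so the training signal reflects how far apart the labels are.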

Model Efficiency

Current models may struggle with nuanced tutor-student interactions due to limitations in handling long or complex responses.

Proposed Solutions: Exploring larger, specialized models or multitask learning frameworks to enhance generalization.

Project Team

Shadman Rohan

Researcher

Ishita Sur Apan

Researcher

Muhtasim Ibteda Shochcho

Researcher

Md Fahim

Researcher

Mohammad Ashfaq Ur Rahman

Researcher

AKM Mahbubur Rahman

Researcher

Amin Ahsan Ali

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Shadman Rohan, Ishita Sur Apan, Muhtasim Ibteda Shochcho, Md Fahim, Mohammad Ashfaq Ur Rahman, AKM Mahbubur Rahman, Amin Ahsan Ali

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
