BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses
Project Overview
This paper examines how well AI-powered tutors identify and address student errors, a pressing question as generative AI enters education. The authors fine-tune the MPNet model to classify each tutor response into one of three categories: the student's mistake is correctly identified ('Yes'), partially acknowledged ('To some extent'), or not recognized ('No'). They apply this approach to two tasks, mistake identification and mistake localization, and use an ensemble of fine-tuned models to improve performance on the shared-task benchmarks. The authors note that nuanced tutor language makes these categories hard to separate, and they argue that reliable classification of this kind can help generative AI tutors deliver targeted, personalized feedback.
Key Applications
MPNet Ensembles for Pedagogical Mistake Identification and Localization
Context: Educational dialogue systems for tutoring, targeting AI tutors and educational technology developers.
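Implementation: The system fine-tunes MPNet with a class-weighted cross-entropy loss to counter class imbalance, uses grouped cross-validation so that responses sharing a group never appear in both the training and validation folds, and combines the resulting models through hard (majority) voting.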
Outcomes: Achieved macro-F1 scores of 0.7110 on mistake identification and 0.5543 on mistake localization.
Challenges: Difficulties in distinguishing between full and partial error acknowledgment, and the subjective nature of pedagogical feedback classification.
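The training and ensembling steps described above can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions, not the authors' code: the inverse-frequency formula in `class_weights`, the round-robin group-to-fold assignment in `grouped_folds` (with groups assumed to be dialogue IDs), and the tie-breaking rule in `hard_vote` are illustrative choices, and every function name here is hypothetical. `macro_f1` shows how the reported metric averages per-class F1 with equal weight, so rare classes count as much as frequent ones.

```python
import math
from collections import Counter

# Hypothetical sketch; label names follow the three-way scheme described above.
LABELS = ["Yes", "To some extent", "No"]

def class_weights(train_labels):
    """Inverse-frequency weights n / (K * count_c); this formula is an assumption."""
    counts = Counter(train_labels)
    n, k = len(train_labels), len(LABELS)
    return {c: n / (k * counts[c]) for c in LABELS if counts[c] > 0}

def weighted_cross_entropy(all_logits, targets, weights):
    """Weighted mean of -log softmax(z)[y], normalized by the summed target weights."""
    total, norm = 0.0, 0.0
    for z, y in zip(all_logits, targets):  # y is an index into LABELS
        m = max(z)
        log_partition = m + math.log(sum(math.exp(v - m) for v in z))
        w = weights[LABELS[y]]
        total += -w * (z[y] - log_partition)
        norm += w
    return total / norm

def grouped_folds(groups, k):
    """Assign every group (assumed: a dialogue ID) to one fold, so no group spans two folds."""
    fold_of = {g: i % k for i, g in enumerate(sorted(set(groups)))}
    return [fold_of[g] for g in groups]

def hard_vote(per_model_predictions):
    """Majority label per item; ties go to the label voted first (Counter ordering)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model_predictions)]

def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

if __name__ == "__main__":
    train = ["No", "Yes", "Yes", "Yes", "To some extent", "Yes"]
    print(class_weights(train))  # → {'Yes': 0.5, 'To some extent': 2.0, 'No': 2.0}
    print(grouped_folds(["d1", "d1", "d2", "d3", "d2"], k=2))  # → [0, 0, 1, 0, 1]
    # Three hypothetical models, two test items each:
    votes = [["Yes", "No"], ["Yes", "To some extent"], ["To some extent", "No"]]
    ensemble = hard_vote(votes)
    print(ensemble)  # → ['Yes', 'No']
    print(macro_f1(["Yes", "No"], ensemble, labels=["Yes", "No"]))  # → 1.0
```

In a real pipeline the logits would come from a fine-tuned MPNet classifier. PyTorch's `torch.nn.CrossEntropyLoss(weight=...)` with the default mean reduction computes the same weighted average as `weighted_cross_entropy`, and scikit-learn's `GroupKFold` offers a size-balanced alternative to the round-robin split.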
Implementation Barriers
Technical Limitation
The model's predicted probabilities are not well calibrated, so it may assign high confidence to incorrect labels.
Proposed Solutions: Future work could implement calibration techniques like temperature scaling to improve the reliability of confidence scores.
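Temperature scaling, the calibration technique named above, divides every logit by a single scalar T fitted on held-out data. The sketch below assumes access to validation logits and gold label indices, and it fits T by a simple grid search over [0.5, 5.0] rather than the gradient-based optimization usually used; the function names, the grid, and the toy data are illustrative assumptions.

```python
import math

# Hypothetical sketch of temperature scaling for a 3-way classifier.

def softmax(logits, T=1.0):
    """Numerically stable softmax of the logits divided by temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(all_logits, targets, T):
    """Average negative log-likelihood of the gold labels at temperature T."""
    return -sum(math.log(softmax(z, T)[y]) for z, y in zip(all_logits, targets)) / len(targets)

def fit_temperature(all_logits, targets, grid=None):
    """Pick the T on a grid that minimizes validation NLL (grid search is an assumption)."""
    if grid is None:
        grid = [0.5 + 0.05 * i for i in range(91)]  # 0.5, 0.55, ..., 5.0
    return min(grid, key=lambda T: mean_nll(all_logits, targets, T))

if __name__ == "__main__":
    # Toy validation set: the model is equally confident on all four items
    # but correct on only three of them (index 0 is always its prediction).
    val_logits = [[4.0, 0.0, 0.0]] * 4
    val_targets = [0, 0, 0, 1]
    T = fit_temperature(val_logits, val_targets)
    print(f"fitted T = {T:.2f}")
    print("raw confidence:", round(softmax(val_logits[0])[0], 3))
    print("calibrated confidence:", round(softmax(val_logits[0], T)[0], 3))
```

On this toy set the fitted T pulls the confidence from about 0.96 down toward the 75% validation accuracy. Because dividing logits by any T > 0 never changes the arg max, temperature scaling leaves each model's predicted labels, and therefore the hard-vote output and macro-F1, unchanged; it only adjusts the confidence attached to them.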
Evaluation Subjectivity
The boundary between the 'Yes' and 'To some extent' labels is subjective, which makes labeling ambiguous near that boundary.
Proposed Solutions: Modeling the task as ordinal or probabilistic may better capture this continuum of responses.
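One way to treat the labels as ordinal is to order them No < To some extent < Yes and encode each label as cumulative binary targets, the scheme behind Frank and Hall's ordinal decomposition. The sketch below is a hypothetical illustration of that encoding, a decoding rule, and a soft expected-rank score; the ordering and all names are assumptions, not the authors' design.

```python
# Hypothetical sketch: treating the three labels as an ordinal scale.
ORDER = ["No", "To some extent", "Yes"]  # assumed order, least to most acknowledgment

def to_cumulative_targets(label):
    """Encode rank r as K-1 binary targets t_k = [r > k] for k = 0..K-2."""
    r = ORDER.index(label)
    return [1 if r > k else 0 for k in range(len(ORDER) - 1)]

def decode_cumulative(p_exceeds, threshold=0.5):
    """p_exceeds[k] estimates P(rank > k); the predicted rank counts thresholds passed."""
    rank = sum(1 for p in p_exceeds if p > threshold)
    return ORDER[rank]

def expected_rank(class_probs):
    """Soft ordinal score in [0, K-1] from class probabilities listed in ORDER."""
    return sum(i * p for i, p in enumerate(class_probs))

if __name__ == "__main__":
    for label in ORDER:
        print(label, "->", to_cumulative_targets(label))
    # A response the model places between partial and full acknowledgment:
    print(decode_cumulative([0.9, 0.4]))   # → To some extent
    print(expected_rank([0.1, 0.3, 0.6]))  # soft score near 1.5
```

Counting thresholds passed assumes the predicted P(rank > k) values decrease with k; methods such as CORAL enforce this rank consistency during training. The expected-rank score is one probabilistic alternative: it lets a borderline response sit between 'To some extent' and 'Yes' instead of forcing a hard choice.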
Model Capacity
Current models may struggle with nuanced tutor-student interactions due to limitations in handling long or complex responses.
Proposed Solutions: Exploring larger, specialized models or multitask learning frameworks to enhance generalization.
Project Team
Shadman Rohan
Researcher
Ishita Sur Apan
Researcher
Muhtasim Ibteda Shochcho
Researcher
Md Fahim
Researcher
Mohammad Ashfaq Ur Rahman
Researcher
AKM Mahbubur Rahman
Researcher
Amin Ahsan Ali
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Shadman Rohan, Ishita Sur Apan, Muhtasim Ibteda Shochcho, Md Fahim, Mohammad Ashfaq Ur Rahman, AKM Mahbubur Rahman, Amin Ahsan Ali
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI