Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications
Project Overview
The document examines the role of generative AI, particularly prompted large language models (LLMs), in enhancing educational practices and assessment strategies. It emphasizes the innovative applications of LLMs, such as generating open-ended questions from textbooks, evaluating human resource interview transcripts, and correcting grammatical errors in underrepresented languages like Bengali. The research assesses the performance of these AI models against human experts, revealing their effectiveness in specific tasks while also highlighting the challenges and limitations encountered during their integration into educational contexts. Overall, the findings suggest that while generative AI has significant potential to transform educational practices, careful consideration of its implementation challenges is essential for maximizing its benefits.
Key Applications
Generating open-ended questions and multiple-choice questions (MCQs) from textbooks
Context: Educational context for both school-level and undergraduate students across various technical and non-technical subjects, including multiple languages like Bengali, Hindi, German, and English.
Implementation: Utilizing prompt-based techniques and multi-stage prompting strategies to generate open-ended questions and MCQs from educational texts (such as NCERT and technical textbooks). This includes curating specialized datasets for effective question generation, leveraging models like T5 LARGE and GPT-based architectures.
Outcomes: Improved quality and diversity of generated questions, with T5 LARGE outperforming other models on automated evaluation metrics, although both model families still fall short of human baseline performance. Enhanced distractor generation and question quality have been noted across multiple languages.
Challenges: Inadequate existing QA datasets for educational settings, difficulty in matching human expertise, and the need for further refinement and fine-tuning for low-resource languages.
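The multi-stage prompting strategy described above can be sketched roughly as follows. This is an illustrative outline, not the paper's actual implementation: the function names, prompt wording, and the `call_llm` placeholder are all assumptions standing in for whatever model endpoint (T5 LARGE, a GPT-based API, etc.) is used.

```python
# Sketch of a three-stage prompting pipeline for question/MCQ generation.
# `call_llm` is a hypothetical stand-in for any chat-completion endpoint.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. a GPT or T5 endpoint)."""
    raise NotImplementedError

def build_concept_prompt(passage: str) -> str:
    # Stage 1: surface key concepts from the textbook passage.
    return ("List the 3 most important concepts in the passage below, "
            "one per line.\n\nPassage:\n" + passage)

def build_question_prompt(passage: str, concept: str) -> str:
    # Stage 2: generate one open-ended question grounded in a single concept.
    return (f"Write one open-ended exam question about '{concept}' "
            f"that is answerable from this passage.\n\nPassage:\n{passage}")

def build_distractor_prompt(question: str, answer: str) -> str:
    # Stage 3: turn the question into an MCQ by asking for plausible distractors.
    return (f"Question: {question}\nCorrect answer: {answer}\n"
            "Write 3 plausible but incorrect answer options, one per line.")
```

Each stage's output feeds the next stage's prompt, which is what distinguishes multi-stage prompting from asking for a finished MCQ in a single turn.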
Assessing and providing feedback on interview transcripts and grammatical errors
Context: Language learning and HR interview preparation contexts for candidates, particularly L2 English speakers and students learning languages like Bengali.
Implementation: Creating specialized datasets (such as HURIT) for evaluating LLMs' performance in assessing HR interview transcripts and providing grammatical error explanations. This includes investigating LLM capabilities in providing detailed feedback and error identification.
Outcomes: LLMs scored interview transcripts competently but struggled with error identification and feedback provision, showing clear shortcomings in delivering detailed grammatical explanations.
Challenges: Limited real-world datasets for specific contexts, need for human oversight to ensure quality feedback, and lack of comprehensive feedback mechanisms.
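A rubric-based scoring setup like the one used for interview transcripts might look as follows. The rubric dimensions and the `criterion: score` output convention here are assumptions for illustration; the actual HURIT criteria and prompt formats may differ.

```python
import re

# Hypothetical rubric dimensions for HR interview transcript assessment;
# the dataset's real criteria may differ from these.
RUBRIC = ("clarity", "relevance", "fluency")

def build_scoring_prompt(transcript: str) -> str:
    """Ask the model to rate a transcript on each rubric dimension (1-5)."""
    criteria = ", ".join(RUBRIC)
    return (f"Score the interview answer below on {criteria}, each from "
            "1 to 5, formatted as 'criterion: score' lines.\n\n" + transcript)

def parse_scores(llm_output: str) -> dict:
    """Extract 'criterion: score' lines from a model response."""
    scores = {}
    for name, value in re.findall(r"(\w+)\s*:\s*([1-5])", llm_output):
        key = name.lower()
        if key in RUBRIC:
            scores[key] = int(value)
    return scores
```

Parsing the response into structured scores is what makes the LLM's assessment comparable against human raters, which is how the scoring competence noted above would be measured.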
Implementation Barriers
Data availability and language resource disparity
Inadequate existing QA datasets for educational contexts, especially for prompt-based question generation. Low-resource languages like Bengali lack sufficient datasets for training effective LLMs.
Proposed Solutions: Curating new datasets tailored for educational purposes, such as EduProbe for school-level subjects, and developing authentic datasets for low-resource languages while integrating manual checks for grammatical error correction.
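A curated question-generation record could be organized along these lines. The field names below are an illustrative schema, not the published EduProbe format.

```python
from dataclasses import dataclass

# Illustrative schema for a curated question-generation record;
# field names are assumptions, not the paper's actual dataset format.
@dataclass
class QGRecord:
    passage: str   # source text extracted from the textbook
    subject: str   # e.g. "History", "Geography"
    language: str  # e.g. "bn" (Bengali), "hi" (Hindi), "de" (German), "en"
    question: str  # human-authored reference question
    answer: str    # reference answer span or summary

record = QGRecord(
    passage="The Indus Valley Civilisation flourished around 2500 BCE...",
    subject="History",
    language="en",
    question="Around when did the Indus Valley Civilisation flourish?",
    answer="Around 2500 BCE",
)
```

Keeping language and subject as explicit fields is what lets a single curated dataset cover both multilingual generation and the per-subject evaluation described above.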
Model limitations
LLMs often fall short of human expertise in generating high-quality, contextually relevant questions.
Proposed Solutions: Further research and refinement of LLMs to improve their performance over time.
Human oversight requirement
LLMs struggle with error identification and providing actionable feedback in assessments.
Proposed Solutions: Adopting a human-in-the-loop approach to supplement LLM evaluations with human expertise.
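A human-in-the-loop gate can be sketched as a simple routing rule: release an LLM assessment automatically only when the model's confidence is high, and queue everything else for a human expert. The 0.8 threshold and the confidence field are illustrative assumptions, not values from the paper.

```python
def route_feedback(llm_confidence: float, threshold: float = 0.8) -> str:
    """Decide who delivers the feedback: the model or a human reviewer.

    The 0.8 threshold is an illustrative choice, not a value from the paper.
    """
    if llm_confidence >= threshold:
        return "auto"      # confident enough to release the LLM's assessment
    return "human_review"  # queue for a human expert to verify or rewrite

def triage(items):
    """Split assessments into auto-released and human-review queues."""
    auto, review = [], []
    for item in items:
        if route_feedback(item["conf"]) == "auto":
            auto.append(item)
        else:
            review.append(item)
    return auto, review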
Project Team
Subhankar Maity
Researcher
Aniket Deroy
Researcher
Sudeshna Sarkar
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Subhankar Maity, Aniket Deroy, Sudeshna Sarkar
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI