A Study on Large Language Models' Limitations in Multiple-Choice Question Answering
Project Overview
This document examines the role of generative AI, specifically Large Language Models (LLMs), in education, focusing on their use for answering Multiple Choice Questions (MCQs). The study finds that these models face considerable limitations: difficulty comprehending task requirements, sensitivity to the order of answer choices, and frequent production of unreliable results. These findings underscore the need for caution when using LLMs in educational assessment, since their current capabilities may fall short of the accuracy and reliability standards effective evaluation requires. The study calls for continued research into improving the instruction-following abilities of LLMs, and for more robust evaluation frameworks before LLMs are widely deployed in assessment scenarios. While generative AI holds potential to transform educational practice, its limitations must be weighed carefully to ensure fair and effective evaluation.
Key Applications
Multiple Choice Question Answering using LLMs
Context: Assessment and evaluation of LLM performance in educational settings, particularly in standardized tests and benchmarks.
Implementation: Evaluation of 26 small open-source models using MCQs across various topics and categories.
Outcomes: Found that 65% of the evaluated models do not understand the task; only a few produce answers independent of choice order.
Challenges: High percentage of models exhibit choice-order dependence, poor task understanding, and reliability issues in responses.
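The choice-order dependence described above can be probed by re-asking the same question under every permutation of its answer choices and checking whether the model's selected answer stays the same. The sketch below illustrates the idea; `ask_model` is a hypothetical stand-in (not from the paper) that a real evaluation would replace with an actual LLM API call.

```python
from itertools import permutations

# Hypothetical stand-in for an LLM call; a real test would query the model's API.
def ask_model(question: str, choices: list[str]) -> str:
    # Stub: always picks the first listed choice, mimicking a
    # position-biased model that ignores the choice content.
    return choices[0]

def choice_order_consistency(question: str, choices: list[str]) -> float:
    """Fraction of choice orderings on which the model gives its modal answer.

    A model that truly understands the question should return the same
    answer text regardless of how the choices are ordered (score near 1.0).
    """
    answers = [ask_model(question, list(p)) for p in permutations(choices)]
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / len(answers)

score = choice_order_consistency(
    "What is the capital of France?",
    ["Paris", "London", "Berlin", "Rome"],
)
# The position-biased stub answers with whichever choice happens to be listed
# first, so each choice is returned for 6 of the 24 orderings: score = 0.25.
print(score)
```

A consistency score near 1.0 would indicate order-independence; the deliberately biased stub scores only 0.25, the floor for a four-choice question.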
Implementation Barriers
Technical Limitations
Many LLMs do not fully understand the MCQ task and are overly dependent on the order of answer choices.
Proposed Solutions: Test models' task understanding directly, and adjust prompts or training methods to improve instruction-following.
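One simple way to test instruction-following, in the spirit of the proposed solutions, is to pose the same MCQ under several prompt phrasings and check whether each reply contains exactly one option letter as instructed. This is a minimal sketch, not the paper's protocol; `ask_model` and the prompt templates are illustrative assumptions.

```python
import re

# Hypothetical stub for an LLM call; swap in a real API client to run this test.
def ask_model(prompt: str) -> str:
    return "The answer is B."  # canned reply for illustration

# Assumed prompt phrasings; a real study would vary these more systematically.
PROMPT_TEMPLATES = [
    "Answer with a single letter (A-D).\n{q}\n{choices}",
    "{q}\n{choices}\nRespond with only the letter of the correct option.",
    "Question: {q}\nOptions:\n{choices}\nYour answer (one letter):",
]

def instruction_following_rate(question: str, choices: str) -> float:
    """Fraction of prompt phrasings whose reply names exactly one option
    letter -- a rough proxy for whether the model follows the instruction."""
    hits = 0
    for tmpl in PROMPT_TEMPLATES:
        reply = ask_model(tmpl.format(q=question, choices=choices))
        letters = re.findall(r"\b[A-D]\b", reply)
        hits += len(set(letters)) == 1
    return hits / len(PROMPT_TEMPLATES)

rate = instruction_following_rate(
    "Which planet is largest?",
    "A. Mars\nB. Jupiter\nC. Venus\nD. Mercury",
)
print(rate)
```

A model that reliably emits a single parseable letter scores 1.0; replies that hedge, list several letters, or answer in free text pull the rate down and signal weak task understanding.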
Project Team
Aisha Khatun
Researcher
Daniel G. Brown
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Aisha Khatun, Daniel G. Brown
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI