
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering

Project Overview

The document examines the role of generative AI, specifically Large Language Models (LLMs), in education, focusing on their use for answering Multiple Choice Questions (MCQs). It finds that these models face considerable limitations: difficulty comprehending the task requirements, sensitivity to the order of answer choices, and frequently unreliable results. These findings call for caution when employing LLMs in educational assessment, as their current capabilities may not meet the accuracy and reliability standards that effective evaluation requires. The study advocates continued research into improving LLMs' instruction-following abilities and argues for more robust evaluation frameworks before the models are widely deployed in assessment scenarios. While generative AI holds potential for transforming educational practice, its limitations must be weighed carefully to ensure effective and fair evaluation.

Key Applications

Multiple Choice Question Answering using LLMs

Context: Assessment and evaluation of LLM performance in educational settings, particularly in standardized tests and benchmarks.

Implementation: Evaluation of 26 small open-source models using MCQs across various topics and categories.

Outcomes: Found that 65% of the models do not understand the task effectively; only a few are independent of the order of answer choices.

Challenges: A high percentage of models exhibit choice-order dependence, poor task understanding, and reliability issues in their responses.
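
The study's exact evaluation harness is not reproduced here. As a rough illustration of how choice-order dependence can be probed, the sketch below repeats one MCQ with its options shuffled and records which underlying choice the model selects each time; `query_model` is a hypothetical stand-in for whatever model API is being evaluated.

```python
import random

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns the model's reply."""
    raise NotImplementedError  # replace with a real model call

def picked_contents(question: str, choices: list[str], trials: int = 10) -> set[str]:
    """Ask the same MCQ several times with the answer choices shuffled.
    An order-independent model picks the same underlying choice every
    time, so the returned set has size 1."""
    picks: set[str] = set()
    for _ in range(trials):
        shuffled = random.sample(choices, k=len(choices))  # random permutation
        options = "\n".join(f"{chr(ord('A') + i)}. {c}" for i, c in enumerate(shuffled))
        prompt = f"{question}\n{options}\nAnswer with only the letter of the correct choice."
        reply = query_model(prompt).strip().upper()
        idx = ord(reply[0]) - ord("A") if reply else -1
        if 0 <= idx < len(shuffled):
            picks.add(shuffled[idx])  # map the letter back to its content
    return picks
```

If the returned set grows as the order is shuffled, the model is tracking positions rather than content, which is the dependence the study reports for most of the 26 models.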

Implementation Barriers

Technical Limitations

Many LLMs do not fully understand the MCQ task and are overly dependent on the order of answer choices.

Proposed Solutions: Testing models' capacity to understand the task, and adjusting prompts or training methods to improve instruction following.
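
The paper does not prescribe a specific test here. As one illustration, "task understanding" can be operationalised by checking whether a reply respects the requested answer format at all; the format check below is an assumption for illustration, not the paper's criterion.

```python
import re

def follows_instructions(reply: str, n_choices: int) -> bool:
    """Return True if the reply is exactly one valid option letter,
    as the prompt instructed. Models that answer in prose, explain
    at length, or refuse fail this check."""
    m = re.fullmatch(r"[A-Za-z]", reply.strip())
    return m is not None and ord(m.group().upper()) - ord("A") < n_choices
```

Counting how many of a model's replies pass such a check gives a rough proxy for the kind of task-understanding failure the study reports.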

Project Team

Aisha Khatun

Researcher

Daniel G. Brown

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Aisha Khatun, Daniel G. Brown

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
