MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Project Overview
This document examines the application of generative AI in education, focusing on MCQG-SRefine, a framework for generating high-quality multiple-choice questions (MCQs) for the United States Medical Licensing Examination (USMLE) using large language models (LLMs) such as GPT-4. The framework addresses common LLM failure modes, including outdated knowledge and hallucinated inaccuracies, through an iterative process of self-critique and correction that improves both the quality and the difficulty of the generated questions. Applied to medical education, it produces clinically relevant scenarios, well-structured question stems, and plausible distractor options grounded in real medical cases. Its structured pipeline of context creation, question formulation, and feedback-driven refinement improves content accuracy and yields questions that human evaluators prefer over those produced by traditional methods. Overall, the findings underscore the potential of generative AI to transform assessment practices in medical education by producing more relevant, higher-quality evaluation materials.
Key Applications
MCQ Generation Framework
Context: Supports medical students preparing for licensing examinations by generating high-quality multiple-choice questions from clinical notes for study and practice.
Implementation: Employs a self-refinement process in which an LLM, guided by expert critique and item-writing guidelines, generates a complete MCQ comprising a clinical context, a question stem, the correct answer, and distractor options, then iteratively revises it for relevance and difficulty (a minimal sketch follows this block).
Outcomes: Improves question quality and difficulty; human experts preferred the generated questions over those from traditional methods. Students benefit from tailored practice questions that better reflect the licensing examination.
Challenges: Must contend with outdated knowledge in LLMs, hallucination, and the need for domain expertise in question formulation to keep items relevant and resistant to guesswork.
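This summary does not include the authors' reference implementation, so the following is a minimal sketch of the generated item structure and a single generation call, assuming a generic `llm` callable that maps a prompt string to a completion string and a labeled plain-text output format. `MCQ`, `parse_mcq`, `generate_mcq`, and the section labels are hypothetical helpers for illustration, not the paper's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MCQ:
    """One USMLE-style item: clinical context, question stem, key, distractors."""
    context: str
    question: str
    answer: str
    distractors: list[str] = field(default_factory=list)


def parse_mcq(completion: str) -> MCQ:
    """Split a labeled completion into MCQ fields (label format assumed here)."""
    pieces = re.split(r"^(Context|Question|Answer|Distractors):\s*",
                      completion, flags=re.MULTILINE)
    parts = {pieces[i]: pieces[i + 1].strip()
             for i in range(1, len(pieces) - 1, 2)}
    return MCQ(
        context=parts.get("Context", ""),
        question=parts.get("Question", ""),
        answer=parts.get("Answer", ""),
        distractors=[d.strip("- ").strip()
                     for d in parts.get("Distractors", "").splitlines()
                     if d.strip()],
    )


def generate_mcq(llm, clinical_note: str, guidelines: str) -> MCQ:
    """Draft one MCQ grounded in a clinical note; `llm` maps prompt -> text."""
    prompt = (
        f"Follow these USMLE item-writing guidelines:\n{guidelines}\n\n"
        f"Write one multiple-choice question grounded in this clinical note:\n"
        f"{clinical_note}\n\n"
        "Return sections labeled Context:, Question:, Answer:, Distractors: "
        "(one distractor per line)."
    )
    return parse_mcq(llm(prompt))
```

Keeping the item as a small dataclass makes the later critique and correction steps easy to express as functions from MCQ to MCQ.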
Implementation Barriers
Technical
LLMs such as GPT-4 may produce outdated or inaccurate (hallucinated) information in generated questions, and they struggle to generate contextually relevant questions that align with medical curriculum standards.
Proposed Solutions: Integrate expert-driven prompt engineering and iterative self-refinement to improve question quality and relevance, and use iterative feedback loops to refine generated questions until they cohere with clinical guidelines (a sketch of such a loop follows).
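One way such a critique-and-correct loop could look, reusing the hypothetical `MCQ` and `parse_mcq` helpers from the earlier sketch. The prompts, the PASS convention, and the `max_rounds` cutoff are illustrative assumptions, not the paper's actual prompts:

```python
def srefine(llm, mcq: MCQ, rubric: str, max_rounds: int = 3) -> MCQ:
    """Critique-and-correct loop: stop once the critic reports PASS."""
    for _ in range(max_rounds):
        critique = llm(
            "Evaluate this MCQ against the rubric. Reply PASS if every "
            "criterion is met; otherwise list concrete fixes.\n\n"
            f"Rubric:\n{rubric}\n\nMCQ:\n{mcq}"
        )
        if critique.strip().upper().startswith("PASS"):
            break  # the critic is satisfied; keep the current draft
        mcq = parse_mcq(llm(
            "Revise the MCQ to address every point in the critique, keeping "
            "it grounded in the original clinical case.\n\n"
            f"Critique:\n{critique}\n\nMCQ:\n{mcq}"
        ))
    return mcq
```

Separating the critique step from the correction step is what lets expert evaluation criteria be encoded once, in the rubric, and applied on every round.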
Educational
Students may struggle to formulate precise prompts necessary for generating specific questions.
Proposed Solutions: Provide structured guidelines and worked example prompts that show students how to elicit specific question types from an LLM (one possible template is sketched below).
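A fill-in-the-blank template along these lines could serve as such a worked example. Every field name and the wording here are hypothetical, not taken from the paper:

```python
# Hypothetical structured prompt template for students; all fields are
# illustrative placeholders, not the paper's prompt.
PROMPT_TEMPLATE = """\
You are a USMLE item writer. Write one Step {step} multiple-choice question.
Topic: {topic}
Skill tested: {skill}
Ground the clinical vignette in: {source}
Include a clinical context, one question stem, one correct answer, and
four plausible distractors. Label each section clearly.
"""

print(PROMPT_TEMPLATE.format(
    step=2,
    topic="community-acquired pneumonia",
    skill="next best step in management",
    source="the attached de-identified admission note",
))
```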
Quality Assurance
Generated questions may fall short of the required standards of quality and difficulty, and poorly constructed distractors risk confusing students rather than accurately assessing their knowledge.
Proposed Solutions: Apply an iterative critique-and-correction process driven by expert evaluation metrics, and use expert review and validation to confirm that distractors are plausible and relevant (a sketch of such a quality gate follows).
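As a sketch of what an automated first pass before human expert review might look like, again reusing the hypothetical `MCQ` type from above. The criteria list and the score threshold are illustrative assumptions, not the paper's evaluation metrics:

```python
# Hypothetical quality gate: an LLM judge scores each criterion 1-5 and any
# low score routes the item to a human medical expert.
CRITERIA = [
    "clinical accuracy of the vignette",
    "clarity and answerability of the question stem",
    "correctness of the keyed answer",
    "plausibility of every distractor to a borderline student",
    "appropriate difficulty for the target examination",
]


def needs_expert_review(llm, mcq: MCQ, threshold: int = 4) -> bool:
    """Return True if any criterion scores below `threshold`."""
    for criterion in CRITERIA:
        reply = llm(
            f"Score this MCQ from 1 (worst) to 5 (best) on: {criterion}. "
            f"Reply with the digit only.\n\nMCQ:\n{mcq}"
        )
        digits = [c for c in reply if c.isdigit()]
        if not digits or int(digits[0]) < threshold:
            return True  # send to a human expert for review and repair
    return False
```

An automated gate like this only triages; the expert review it routes to remains the actual validation step.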
Project Team
Zonghai Yao, Researcher
Aditya Parashar, Researcher
Huixue Zhou, Researcher
Won Seok Jang, Researcher
Feiyun Ouyang, Researcher
Zhichao Yang, Researcher
Hong Yu, Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Zonghai Yao, Aditya Parashar, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Zhichao Yang, Hong Yu
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI