
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback

Project Overview

This document examines the application of generative AI in education, focusing on MCQG-SRefine, a framework for generating high-quality multiple-choice questions (MCQs) for the United States Medical Licensing Examination (USMLE) using large language models such as GPT-4. The framework addresses common LLM shortcomings, including outdated information and factual inaccuracies, through an iterative process of self-critique and correction that improves both the quality and the difficulty of the generated questions. Applied to medical education, it produces relevant clinical scenarios, well-structured questions, and appropriate distractor options grounded in real medical cases. Its structured approach, spanning context creation, question formulation, and refinement based on evaluative feedback, improves content accuracy and aligns with the preferences of human evaluators, who favor the refined outputs over those of traditional methods. Overall, the findings underscore the potential of generative AI to transform assessment practices in medical education by producing more relevant, higher-quality evaluation materials.

Key Applications

MCQ Generation Framework

Context: Used in the educational context for medical students preparing for licensing examinations, specifically focusing on generating high-quality multiple-choice questions from clinical notes to aid in their studies.

Implementation: Employs a self-refinement process utilizing large language models (LLMs) that incorporate expert critique and guidelines. The framework generates comprehensive multiple-choice questions, including context, questions, answers, and distractor options, ensuring relevance and challenge.

Outcomes: Results in improved question quality and difficulty, with a human expert preference for generated questions over traditional methods. It enhances the assessment and practice for students by providing tailored questions that better reflect the licensing examinations.

Challenges: Faces issues such as outdated knowledge in LLMs, hallucination problems, and the necessity for domain expertise in the formulation of questions to maintain relevance and prevent guesswork.
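The self-refinement process described above can be sketched as a generate–critique–correct loop. The sketch below is a minimal illustration, not the paper's implementation: the `generate`, `critique`, and `correct` callables stand in for prompted LLM calls, and the scoring rubric and stopping threshold are assumed for demonstration.

```python
def refine_mcq(clinical_note, generate, critique, correct,
               max_rounds=3, threshold=8):
    """Iteratively generate an MCQ, critique it, and correct it.

    generate/critique/correct are caller-supplied callables that would wrap
    LLM prompts in a real system (hypothetical interface, not the paper's).
    critique returns a (score, feedback) pair; the loop stops once the score
    reaches `threshold` or after `max_rounds` correction rounds.
    """
    mcq = generate(clinical_note)          # initial draft question
    score, feedback = critique(mcq)        # expert-style evaluation
    for _ in range(max_rounds):
        if score >= threshold:             # quality target met
            break
        mcq = correct(mcq, feedback)       # revise using the critique
        score, feedback = critique(mcq)    # re-evaluate the revision
    return mcq, score
```

In practice each callable would carry the expert critique criteria and guidelines in its prompt; the loop itself only orchestrates the feedback cycle.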

Implementation Barriers

Technical

LLMs like GPT-4 may produce outdated or inaccurate information (hallucinations) in generated questions and face challenges in generating contextually relevant questions that align with medical curriculum standards.

Proposed Solutions: Integrate expert-driven prompt engineering and iterative self-refinement to enhance question quality and relevance. Implement iterative feedback loops to refine generated questions and ensure coherence with clinical guidelines.

Educational

Students may struggle to formulate precise prompts necessary for generating specific questions.

Proposed Solutions: Provide structured guidelines and examples for students on how to effectively prompt LLMs.
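Such structured guidance might take the form of a fill-in prompt template; the section headings and constraints below are illustrative assumptions, not the paper's actual guidelines.

```python
# Hypothetical structured prompt template a student could fill in,
# rather than composing a free-form prompt from scratch.
MCQ_PROMPT_TEMPLATE = """\
You are a medical educator writing a USMLE-style multiple-choice question.

Clinical note:
{clinical_note}

Requirements:
- Write a clinical vignette (context) grounded in the note above.
- Ask one best-answer question tied to the vignette.
- Provide the correct answer and four distractor options.
- Distractors must be incorrect but clinically plausible.
Topic focus: {topic}
"""

def build_prompt(clinical_note: str, topic: str) -> str:
    """Fill the template with a clinical note and a topic focus."""
    return MCQ_PROMPT_TEMPLATE.format(clinical_note=clinical_note, topic=topic)
```

A fixed template keeps the required components (context, question, answer, distractors) in view, so the quality of the output depends less on each student's prompting skill.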

Quality Assurance

Generated questions may fail to meet the required standards of quality and difficulty, and distractor options risk confusing students rather than accurately assessing their knowledge.

Proposed Solutions: Implement an iterative critique and correction process involving expert evaluation metrics, and use expert reviews and validation processes to ensure the distractors are plausible and relevant.

Project Team

Zonghai Yao

Researcher

Aditya Parashar

Researcher

Huixue Zhou

Researcher

Won Seok Jang

Researcher

Feiyun Ouyang

Researcher

Zhichao Yang

Researcher

Hong Yu

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Zonghai Yao, Aditya Parashar, Huixue Zhou, Won Seok Jang, Feiyun Ouyang, Zhichao Yang, Hong Yu

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
