Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
Project Overview
This document explores the use of ChatGPT, a generative AI model, in education, with a particular focus on medical training. It highlights the model's ability to provide context-oriented answers and demonstrate deductive reasoning, making it a potentially valuable resource for e-learners tackling complex medical and clinical questions. Despite these strengths, significant challenges remain, including the need for greater accuracy, concerns about exam integrity, and biases in AI-generated content. The findings indicate that while ChatGPT shows promise in supporting educational outcomes, particularly in medical education, further research is needed to address these limitations and optimize its use in academic settings.
Key Applications
ChatGPT performance on standardized exams
Context: Evaluating how effectively ChatGPT supports medical and law students as they prepare for standardized licensing exams and law school exams.
Implementation: ChatGPT's performance was evaluated through a series of USMLE questions for medical students and real law school exams for law students, assessing its ability to provide accurate answers and reasoning in both contexts.
Outcomes: ChatGPT achieved an accuracy of 58.8% on logical medical questions and 60% on ethical questions, indicating it can approach passing thresholds for medical exams. In the law context, it performed at a level consistent with a C+ student, passing all courses.
Challenges: Limitations in accuracy for complex logical questions and potential ethical biases in medical contexts, alongside concerns regarding exam integrity due to AI-generated answers in legal contexts.
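The accuracy figures above come from grading model answers against an answer key, broken out by question category. A minimal sketch of that kind of scoring harness is below; the helper name, tuple layout, and sample data are illustrative assumptions, not the paper's actual grading code or data.

```python
# Hypothetical accuracy-scoring sketch: compare model-selected options
# against an answer key and report per-category accuracy. The sample
# questions below are invented for illustration only.
from collections import defaultdict

def score_by_category(responses):
    """responses: list of (category, model_answer, correct_answer) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, model_answer, correct_answer in responses:
        total[category] += 1
        if model_answer == correct_answer:
            correct[category] += 1
    # Accuracy per category, e.g. {'logical': 0.588, 'ethical': 0.60}
    return {c: correct[c] / total[c] for c in total}

# Illustrative data only (the study used USMLE-style questions).
sample = [
    ("logical", "B", "B"),
    ("logical", "A", "C"),
    ("ethical", "D", "D"),
]
print(score_by_category(sample))  # → {'logical': 0.5, 'ethical': 1.0}
```

A per-category breakdown like this is what allows the separate logical (58.8%) and ethical (60%) accuracies reported above to be compared against an exam's passing threshold.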
Implementation Barriers
Ethical
Concerns about the integrity of exams due to the use of AI like ChatGPT, which can generate high-quality responses with minimal input.
Proposed Solutions: The authors suggest potential countermeasures to uphold exam integrity.
Accuracy
ChatGPT's performance is variable, particularly on logical questions and in certain subjects such as Anatomy.
Proposed Solutions: Conduct further research to improve accuracy and determine how best to apply ChatGPT in educational settings.
Project Team
Prabin Sharma
Researcher
Kisan Thapa
Researcher
Dikshya Thapa
Researcher
Prastab Dhakal
Researcher
Mala Deep Upadhaya
Researcher
Santosh Adhikari
Researcher
Salik Ram Khanal
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Prabin Sharma, Kisan Thapa, Dikshya Thapa, Prastab Dhakal, Mala Deep Upadhaya, Santosh Adhikari, Salik Ram Khanal
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI