SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Project Overview
The document discusses SceMQA, a benchmark for scientific multimodal question answering aimed at AI applications in high school to pre-college education, covering the core science subjects of Mathematics, Physics, Chemistry, and Biology. By combining multiple-choice and free-response formats with comprehensive answer explanations, SceMQA probes the reasoning and comprehension capabilities of AI models and addresses the shortcomings of existing benchmarks, which fail to cover this educational level. Experimental findings reveal that even state-of-the-art models struggle to achieve high accuracy, underscoring the need for continued advances in multimodal reasoning and learning approaches. The benchmark thus highlights a critical gap in current AI capabilities and suggests a pathway for future research toward realizing generative AI's potential in educational settings.
Key Applications
SceMQA - Scientific College Entrance Level Multimodal Question Answering Benchmark
Context: High school to pre-college education, focusing on various scientific disciplines including mathematics, physics, chemistry, and biology.
Implementation: Developed a benchmark that includes a mix of multiple-choice and free-response questions, requiring various levels of reasoning and understanding. The implementation incorporates advanced reasoning, visual understanding, and diverse question formats, emphasizing conceptual understanding across all scientific subjects.
Outcomes: Offers detailed explanations for a majority of problems, facilitating better understanding and assessment of AI model capabilities in science education. Aims to improve reasoning capabilities in STEM by providing context-rich problems.
Challenges: Existing AI models demonstrate limited reasoning and comprehension on complex problems, with accuracy ranging from roughly 50% to 60%. Models show significant performance variability, particularly on tasks requiring precise calculation and image understanding, along with knowledge gaps that affect performance on biological concepts.
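The reported 50% to 60% accuracies are aggregate scores computed per subject over graded benchmark items. As a minimal sketch (not the paper's actual evaluation code, and with hypothetical field names such as "subject", "prediction", and "answer"), per-subject accuracy might be computed like this:

```python
from collections import defaultdict

def per_subject_accuracy(results):
    """Compute accuracy per subject from a list of graded benchmark items.

    Each item is a dict with hypothetical keys: 'subject', 'prediction',
    and 'answer' (a multiple-choice letter or a normalized free-response
    string). Field names are illustrative, not taken from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in results:
        total[item["subject"]] += 1
        if item["prediction"] == item["answer"]:
            correct[item["subject"]] += 1
    # Fraction of correctly answered items within each subject.
    return {subject: correct[subject] / total[subject] for subject in total}

# Example with made-up grading results:
results = [
    {"subject": "Math", "prediction": "B", "answer": "B"},
    {"subject": "Math", "prediction": "C", "answer": "A"},
    {"subject": "Biology", "prediction": "D", "answer": "D"},
]
print(per_subject_accuracy(results))  # {'Math': 0.5, 'Biology': 1.0}
```

Free-response items would additionally need an answer-normalization or grading step before this exact-match comparison applies.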
Implementation Barriers
Technical Barrier
Current AI models reach only 50% to 60% accuracy on the benchmark's problems, indicating a significant gap in multimodal reasoning capability.
Proposed Solutions: Further research and development to improve model reasoning and problem comprehension.
Content Barrier
Existing benchmarks often overlook the high school to pre-college educational phase, resulting in a lack of appropriate assessment tools for this stage.
Proposed Solutions: Introduce targeted benchmarks like SceMQA to fill the existing gaps in educational assessment.
Implementation Barrier
AI models struggle with tasks requiring precise image interpretation and complex reasoning.
Proposed Solutions: Incorporate advanced visual understanding tools and enhance datasets with diverse knowledge components.
Project Team
Zhenwen Liang
Researcher
Kehan Guo
Researcher
Gang Liu
Researcher
Taicheng Guo
Researcher
Yujun Zhou
Researcher
Tianyu Yang
Researcher
Jiajun Jiao
Researcher
Renjie Pi
Researcher
Jipeng Zhang
Researcher
Xiangliang Zhang
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI