SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Project Overview
The document discusses SceMQA, a benchmark for scientific multimodal question answering aimed at AI applications in high school to pre-college education, covering the core science subjects of Mathematics, Physics, Chemistry, and Biology. By combining multiple-choice and free-response formats with comprehensive answer explanations, SceMQA probes the reasoning and comprehension capabilities of AI models and addresses the shortcomings of existing benchmarks, which fail to cover this educational level. Experimental findings reveal that even state-of-the-art models struggle to achieve high accuracy, underscoring the need for continued advances in multimodal reasoning and learning approaches. The benchmark thus highlights a critical gap in current AI capabilities and suggests a pathway for future research toward realizing generative AI's potential in educational settings.
Key Applications
SceMQA - Scientific College Entrance Level Multimodal Question Answering Benchmark
Context: High school to pre-college education, focusing on various scientific disciplines including mathematics, physics, chemistry, and biology.
Implementation: Developed a benchmark that includes a mix of multiple-choice and free-response questions, requiring various levels of reasoning and understanding. The implementation incorporates advanced reasoning, visual understanding, and diverse question formats, emphasizing conceptual understanding across all scientific subjects.
Outcomes: Offers detailed explanations for a majority of problems, facilitating better understanding and assessment of AI model capabilities in science education. Aims to improve reasoning capabilities in STEM by providing context-rich problems.
Challenges: Existing AI models demonstrate limited reasoning and comprehension on complex problems, with accuracy ranging from roughly 50% to 60%. Models show significant performance variability, particularly on tasks requiring precise calculation and image understanding, along with knowledge gaps that affect performance on biological concepts.
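The reported 50% to 60% accuracies are aggregate scores computed per subject over graded benchmark items. As a minimal sketch (not the paper's actual evaluation code, and with hypothetical field names such as "subject", "prediction", and "answer"), per-subject accuracy might be computed like this:

```python
from collections import defaultdict

def per_subject_accuracy(results):
    """Compute accuracy per subject from a list of graded benchmark items.

    Each item is a dict with hypothetical keys: 'subject', 'prediction',
    and 'answer' (a multiple-choice letter or a normalized free-response
    string). Field names are illustrative, not taken from the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in results:
        total[item["subject"]] += 1
        if item["prediction"] == item["answer"]:
            correct[item["subject"]] += 1
    # Fraction of correctly answered items within each subject.
    return {subject: correct[subject] / total[subject] for subject in total}

# Example with made-up grading results:
results = [
    {"subject": "Math", "prediction": "B", "answer": "B"},
    {"subject": "Math", "prediction": "C", "answer": "A"},
    {"subject": "Biology", "prediction": "D", "answer": "D"},
]
print(per_subject_accuracy(results))  # {'Math': 0.5, 'Biology': 1.0}
```

Free-response items would additionally need an answer-normalization or grading step before this exact-match comparison applies.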
Implementation Barriers
Technical Barrier
Current AI models reach only 50% to 60% accuracy on the benchmark's problems, indicating a significant gap in multimodal reasoning capability.
Proposed Solutions: Further research and development to improve model reasoning and problem comprehension.
Content Barrier
Existing benchmarks often overlook the high school to pre-college educational phase, resulting in a lack of appropriate assessment tools for this stage.
Proposed Solutions: Introduce targeted benchmarks like SceMQA to fill the existing gaps in educational assessment.
Implementation Barrier
AI models struggle with tasks requiring precise image interpretation and complex reasoning.
Proposed Solutions: Incorporate advanced visual understanding tools and enhance datasets with diverse knowledge components.
Project Team
Zhenwen Liang
Researcher
Kehan Guo
Researcher
Gang Liu
Researcher
Taicheng Guo
Researcher
Yujun Zhou
Researcher
Tianyu Yang
Researcher
Jiajun Jiao
Researcher
Renjie Pi
Researcher
Jipeng Zhang
Researcher
Xiangliang Zhang
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI