MULTI: Multimodal Understanding Leaderboard with Text and Images
Project Overview
This document outlines the role of generative AI in education through MULTI, a multimodal benchmark that assesses how well Multimodal Large Language Models (MLLMs) understand and reason across diverse question types. It stresses the importance of aligning these models with real-world examination standards and identifies significant challenges in logical reasoning and image comprehension: current MLLMs still fall considerably short of human experts. The benchmark is designed to drive research and innovation in multimodal AI. The document also examines how generative AI can enhance educational experiences through interactive and multimedia learning tools, and it discusses the complexities of data processing and annotation as well as the hurdles of integrating generative AI into educational settings. Overall, the findings suggest that while generative AI holds promise for transforming education, effective deployment requires overcoming these limitations and strengthening model capabilities.
Key Applications
MULTI - Multimodal Understanding Benchmark and Dataset
Context: Designed for evaluating AI models in educational settings, targeting junior high school, senior high school, and university-level assessments. The project both benchmarks AI capabilities and constructs a dataset that researchers and educators can use for educational purposes.
Implementation: The benchmark and dataset comprise over 18,000 questions derived from real-world educational sources, organized into various formats including multiple-choice, fill-in-the-blank, and open-ended questions. The implementation involves extensive data annotation and algorithmic selection to ensure diverse, challenging content for AI evaluation.
Outcomes: Enhanced training sets for AI models, improved quality of AI-generated educational content, better evaluation metrics for generative models, and insights into the capabilities of MLLMs compared to human expert baselines.
Challenges: Current MLLMs struggle with logical reasoning, mathematical computation, and image comprehension, particularly in complex tasks. The resource-intensive annotation process poses challenges in ensuring data quality and handling diverse content formats.
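The mixed-format evaluation described above (multiple-choice, fill-in-the-blank, and open-ended questions) can be sketched as a simple scoring loop. This is a minimal illustration only: the record fields (`type`, `answer`) and scoring rules below are assumptions for the sketch, not the benchmark's actual schema or grader.

```python
# Hypothetical scorer for a mixed-format benchmark like MULTI.
# The record schema and scoring rules are illustrative assumptions.

def score_answer(question: dict, prediction: str) -> float:
    """Return 1.0 for a correct answer, 0.0 otherwise."""
    qtype = question["type"]
    gold = question["answer"]
    pred = prediction.strip()
    if qtype == "multiple_choice":
        # Compare chosen option letters order-insensitively, so
        # multi-select answers like "AC" and "CA" both match.
        return float(sorted(pred.upper()) == sorted(gold.upper()))
    if qtype == "fill_in_blank":
        # Exact string match after whitespace/case normalization.
        return float(pred.lower() == gold.strip().lower())
    # Open-ended questions typically need rubric- or model-based
    # grading; an automatic script would defer that step.
    raise NotImplementedError(f"no automatic rule for {qtype!r}")

def accuracy(questions, predictions):
    """Mean score over automatically gradable questions."""
    scores = [score_answer(q, p) for q, p in zip(questions, predictions)]
    return sum(scores) / len(scores)
```

In practice, open-ended grading is the hard part; the exact-match rules above only cover the formats a script can check deterministically.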
Implementation Barriers
Technical Barrier
MLLMs show substantial gaps in performance compared to human experts, particularly in complex multimodal tasks. Additionally, high computational costs are associated with running advanced AI models, especially those with large parameter sizes.
Proposed Solutions: Continuous research and development, enhancing model alignment and reasoning capabilities, improving cross-modal understanding, and prioritizing evaluation on the compact MULTI-Elite subset of hard questions to reduce costs and speed up evaluation.
Data Limitation
Existing benchmarks are often narrow in scope and may not adequately test MLLMs' comprehension or reasoning skills.
Proposed Solutions: Creating diverse and comprehensive datasets like MULTI that encompass a wide range of subjects and question types.
Data Quality Barrier
Ensuring the accuracy and relevance of the data used for training and evaluating AI models.
Proposed Solutions: Implement multiple rounds of annotation and automatic checking processes to enhance data credibility.
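The "multiple rounds of annotation plus automatic checking" idea can be sketched as lightweight validators run over each annotated record, with failures routed back for re-annotation. The field names and rules here are illustrative assumptions, not the project's actual pipeline:

```python
# Hypothetical automatic checks run after each annotation round.
# Field names ("question", "answer", "options", "type") are
# illustrative assumptions about the record layout.

def check_record(record: dict) -> list:
    """Return a list of human-readable problems (empty = passes)."""
    problems = []
    if not record.get("question", "").strip():
        problems.append("empty question text")
    answer = record.get("answer", "")
    if not answer.strip():
        problems.append("missing gold answer")
    if record.get("type") == "multiple_choice":
        options = record.get("options", {})
        if len(options) < 2:
            problems.append("fewer than two options")
        # Every letter in the answer must name an existing option.
        missing = [c for c in answer if c not in options]
        if missing:
            problems.append(f"answer letters not in options: {missing}")
    return problems

def flag_for_reannotation(records):
    """Yield (index, problems) for records that fail any check."""
    for i, rec in enumerate(records):
        probs = check_record(rec)
        if probs:
            yield i, probs
```

Checks like these catch mechanical errors cheaply; correctness of the content itself still requires the human annotation rounds the text describes.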
Project Team
Zichen Zhu
Researcher
Yang Xu
Researcher
Lu Chen
Researcher
Jingkai Yang
Researcher
Yichuan Ma
Researcher
Yiming Sun
Researcher
Hailin Wen
Researcher
Jiaqi Liu
Researcher
Jinyu Cai
Researcher
Yingzi Ma
Researcher
Situo Zhang
Researcher
Zihan Zhao
Researcher
Liangtai Sun
Researcher
Kai Yu
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI