
MULTI: Multimodal Understanding Leaderboard with Text and Images

Project Overview

This document summarizes MULTI, a multimodal benchmark that evaluates how well Multimodal Large Language Models (MLLMs) understand and reason across diverse question types drawn from real-world examinations. By aligning model evaluation with actual educational standards, the benchmark exposes significant weaknesses in logical reasoning, mathematical computation, and image comprehension: current MLLMs still fall well short of human experts. Beyond benchmarking, the document examines how generative AI can enrich education through interactive and multimedia learning tools, and it discusses the practical hurdles of data processing, annotation, and integrating generative AI into educational settings. The overall finding is that generative AI holds real promise for education, but realizing that promise requires overcoming these limitations and strengthening model capabilities.

Key Applications

MULTI - Multimodal Understanding Benchmark and Dataset

Context: Designed to evaluate AI models in educational settings, targeting junior high school, senior high school, and university-level assessments. The project both benchmarks AI capabilities and constructs a dataset that researchers and educators can use for educational purposes.

Implementation: The benchmark and dataset comprise over 18,000 questions drawn from real-world educational sources, spanning multiple formats including multiple-choice, fill-in-the-blank, and open-ended questions. Construction involved extensive data annotation and algorithmic question selection to ensure diverse, challenging content for AI evaluation.
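As a sketch of how a dataset combining these question formats might be represented, the record below models a single multimodal question. The field names and the sample values are illustrative assumptions, not the actual MULTI schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record for one benchmark question; the schema is an
# illustrative assumption, not the real MULTI data format.
@dataclass
class Question:
    question_id: str
    subject: str                # e.g. "physics", "mathematics"
    level: str                  # "junior_high", "senior_high", "university"
    question_type: str          # "multiple_choice", "fill_in_blank", "open_ended"
    prompt: str
    image_paths: List[str] = field(default_factory=list)  # multimodal inputs
    choices: Optional[List[str]] = None  # present only for multiple-choice items
    answer: str = ""

q = Question(
    question_id="demo-001",
    subject="physics",
    level="senior_high",
    question_type="multiple_choice",
    prompt="Which diagram shows the correct field lines?",
    image_paths=["figures/field_lines.png"],
    choices=["A", "B", "C", "D"],
    answer="B",
)
print(q.question_type, len(q.choices))  # multiple_choice 4
```

Keeping image references as paths rather than inline data keeps such records lightweight and lets text-only and multimodal questions share one schema.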

Outcomes: Enhanced training sets for AI models, improved quality of AI-generated educational content, better evaluation metrics for generative models, and insights into the capabilities of MLLMs compared to human expert baselines.
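The evaluation metrics mentioned above can be illustrated with a minimal per-question-type accuracy computation. The record layout and the tiny sample below are assumptions for illustration, not the project's actual scoring code.

```python
from collections import defaultdict

# Minimal sketch: accuracy broken down by question type, assuming each
# record carries a type, a gold answer, and a model prediction.
def accuracy_by_type(records):
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["type"]] += 1
        if r["prediction"].strip().lower() == r["answer"].strip().lower():
            correct[r["type"]] += 1
    return {t: correct[t] / total[t] for t in total}

records = [
    {"type": "multiple_choice", "answer": "B", "prediction": "B"},
    {"type": "multiple_choice", "answer": "C", "prediction": "A"},
    {"type": "fill_in_blank",   "answer": "9.8", "prediction": "9.8"},
]
print(accuracy_by_type(records))
# {'multiple_choice': 0.5, 'fill_in_blank': 1.0}
```

Per-type breakdowns like this are what make gaps visible: a model can score well on multiple-choice items while failing open-ended reasoning, which an aggregate accuracy would hide.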

Challenges: Current MLLMs struggle with logical reasoning, mathematical computation, and image comprehension, particularly in complex tasks. The resource-intensive annotation process poses challenges in ensuring data quality and handling diverse content formats.

Implementation Barriers

Technical Barrier

MLLMs show substantial gaps in performance compared to human experts, particularly in complex multimodal tasks. Additionally, high computational costs are associated with running advanced AI models, especially those with large parameter sizes.

Proposed Solutions: Continuous research and development, enhancing model alignment and reasoning capabilities, improving cross-modal understanding, and prioritizing evaluation on the compact MULTI-Elite question subset to reduce costs and speed up evaluation.

Data Limitation

Existing benchmarks are often narrow in scope and may not adequately test MLLMs' comprehension or reasoning skills.

Proposed Solutions: Creating diverse and comprehensive datasets like MULTI that encompass a wide range of subjects and question types.

Data Quality Barrier

It is difficult to ensure the accuracy and relevance of the data used to train and evaluate AI models.

Proposed Solutions: Implement multiple rounds of annotation and automatic checking processes to enhance data credibility.
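A minimal sketch of what an automatic checking pass of the kind described above might look like; the validation rules here are assumptions for illustration, not the project's actual checks.

```python
# Hypothetical automatic checks on annotated records: flag empty prompts,
# missing answers, and multiple-choice items whose answer is not a choice.
def check_record(record):
    issues = []
    if not record.get("prompt", "").strip():
        issues.append("empty prompt")
    if record.get("type") == "multiple_choice":
        choices = record.get("choices") or []
        if len(choices) < 2:
            issues.append("too few choices")
        elif record.get("answer") not in choices:
            issues.append("answer not among choices")
    if not record.get("answer"):
        issues.append("missing answer")
    return issues

good = {"prompt": "2+2=?", "type": "fill_in_blank", "answer": "4"}
bad = {"prompt": "Pick one", "type": "multiple_choice",
       "choices": ["A", "B"], "answer": "C"}
print(check_record(good))  # []
print(check_record(bad))   # ['answer not among choices']
```

Cheap rule-based checks like these complement (but do not replace) the multiple rounds of human annotation: they catch structural errors automatically, leaving annotators to judge content quality.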

Project Team

Zichen Zhu, Researcher
Yang Xu, Researcher
Lu Chen, Researcher
Jingkai Yang, Researcher
Yichuan Ma, Researcher
Yiming Sun, Researcher
Hailin Wen, Researcher
Jiaqi Liu, Researcher
Jinyu Cai, Researcher
Yingzi Ma, Researcher
Situo Zhang, Researcher
Zihan Zhao, Researcher
Liangtai Sun, Researcher
Kai Yu, Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
