
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning

Project Overview

This document summarizes research on applying generative AI in education, centered on MMCR, a dataset that strengthens visual language models (VLMs) for multi-turn dialogues involving multiple images, the setting most representative of real-world human-AI interaction. The work underscores the critical role of contextual reasoning in dialogue: models fine-tuned on the MMCR dataset show higher contextual accuracy and outperform baselines on existing benchmarks. The research also identifies a 'Less is More' phenomenon in model training, in which a balanced data distribution proves more effective than simply increasing data volume. Overall, the findings suggest that integrating generative AI tools such as VLMs into educational contexts can improve interaction and engagement, ultimately leading to better learning outcomes.

Key Applications

MMCR (Multimodal Multi-turn Contextual Reasoning)

Context: Enhancing human-AI interaction in educational settings across disciplines, including Education, Science, and the Humanities.

Implementation: Constructed a dataset (MMCR-310k) of multimodal multi-turn dialogues and a diagnostic benchmark (MMCR-Bench) for training and evaluating VLMs; a schematic record format is sketched after this list.

Outcomes: Achieved improvements in contextual accuracy and performance on existing benchmarks by fine-tuning models with the MMCR dataset.

Challenges: Limitations in existing models' abilities to handle long-context dialogues and ensure logical consistency in multi-turn conversations.
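To make the data format concrete, below is a minimal sketch of what one multi-image, multi-turn training record could look like. The field names and contents are illustrative assumptions, not the actual MMCR-310k schema.

# Hypothetical record structure for a multi-image, multi-turn dialogue.
# Field names and contents are assumptions for illustration; the real
# MMCR-310k schema may differ.
record = {
    "images": ["physics_fig1.png", "physics_fig2.png"],  # several images per dialogue
    "turns": [
        {"role": "user", "text": "What does the first image show?"},
        {"role": "assistant", "text": "A free-body diagram of a block on an incline."},
        # Later turns must draw on earlier turns and on more than one image,
        # not just the most recent message.
        {"role": "user", "text": "Using that diagram, what changes in the second image?"},
        {"role": "assistant", "text": "The second image adds friction, reducing the net force along the incline."},
    ],
}

The key property is that later turns refer back to earlier context and to multiple images, which is exactly the contextual reasoning the dataset is designed to exercise.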

Implementation Barriers

Data Quality

Existing benchmarks focus on single-image interactions, so they neither exercise nor measure the contextual reasoning that multi-turn, multi-image dialogue demands of VLMs. This limits the overall effectiveness of generative AI in educational settings.

Proposed Solutions: Developing comprehensive datasets like MMCR that incorporate multi-turn dialogues, strong contextual relevance, and logical progression.

Model Training

The 'Less is More' phenomenon suggests that increasing data volume does not always lead to performance improvement. Maintaining a balanced proportion of data across different task types is crucial for effective model training.

Proposed Solutions: Ensuring a diverse and balanced dataset during model training to optimize performance across tasks, as sketched below.
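As a concrete illustration of the balancing idea, the sketch below caps each task type's contribution to the training mix instead of concatenating all available data. The function, task names, and proportions are assumptions for illustration, not the paper's actual recipe.

import random

# Hypothetical balanced sampling across task types ("Less is More"):
# cap each task's contribution rather than using every available example.
def balanced_sample(pools, per_task, seed=0):
    """pools: dict mapping task type -> list of examples."""
    rng = random.Random(seed)
    mix = []
    for task, examples in pools.items():
        k = min(per_task, len(examples))  # never request more than the pool holds
        mix.extend(rng.sample(examples, k))
    rng.shuffle(mix)  # interleave task types in the final training order
    return mix

# Illustrative pool sizes; real corpora would hold dialogue records.
pools = {
    "captioning": list(range(50_000)),
    "multi_turn_qa": list(range(20_000)),
    "reasoning": list(range(8_000)),
}
train_mix = balanced_sample(pools, per_task=8_000)  # equal share per task

Capping per-task counts keeps a small task type from being drowned out by a large one, which is the practical upshot of the 'Less is More' observation.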

Project Team

Dawei Yan, Researcher
Yang Li, Researcher
Qing-Guo Chen, Researcher
Weihua Luo, Researcher
Peng Wang, Researcher
Haokui Zhang, Researcher
Chunhua Shen, Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Dawei Yan, Yang Li, Qing-Guo Chen, Weihua Luo, Peng Wang, Haokui Zhang, Chunhua Shen

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
