Towards a Unified Multimodal Reasoning Framework

Project Overview

The paper explores the use of generative AI in education by combining Chain-of-Thought (CoT) reasoning with Visual Question Answering (VQA) techniques, using advanced language models such as GPT-4. Working with the TextVQA and ScienceQA datasets, it examines how these methods can strengthen reasoning and improve the accuracy of language models on multiple-choice questions. The findings suggest that combining CoT reasoning with VQA techniques can advance multimodal AI systems and, in turn, support better educational outcomes. The authors stress that continued research is needed to realize the potential of generative AI in learning and assessment, with the aim of improving student engagement and comprehension through more effective AI-driven educational tools.

Key Applications

Integration of Chain-of-Thought (CoT) reasoning and Visual Question Answering (VQA)

Context: Education, using multimodal datasets that cover elementary and high school science curricula

Implementation: Three text embedding methods and three visual embedding approaches applied to the TextVQA and ScienceQA datasets

Outcomes: Improved reasoning and question-answering capabilities of language models

Challenges: Limitations in computational resources and occasional poor performance from certain models
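
As a rough illustration of the setup summarized above, the sketch below enumerates the nine text/visual embedding pairings and formats a multiple-choice question as a CoT prompt. It is a minimal sketch under stated assumptions, not the authors' implementation: this summary does not name the embedding methods, so the labels are placeholders, and the function names, the image_context parameter, and the prompt wording are all hypothetical.

```python
from itertools import product

# Placeholder labels: the summary does not name the embedding methods used.
TEXT_EMBEDDINGS = ["text_method_1", "text_method_2", "text_method_3"]
VISUAL_EMBEDDINGS = ["visual_method_1", "visual_method_2", "visual_method_3"]


def experiment_grid():
    """Return every (text, visual) embedding pairing as a list of tuples."""
    return list(product(TEXT_EMBEDDINGS, VISUAL_EMBEDDINGS))


def build_cot_prompt(question, choices, image_context=""):
    """Format a multiple-choice question so the model reasons before answering.

    image_context is a hypothetical text description of the image (for
    example a caption or OCR output); it is left out when empty.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = []
    if image_context:
        lines.append(f"Image context: {image_context}")
    lines.append(f"Question: {question}")
    for letter, choice in zip(letters, choices):
        lines.append(f"({letter}) {choice}")
    # Assumed wording for the reasoning instruction.
    lines.append("Think step by step, then finish with 'Answer: <letter>'.")
    return "\n".join(lines)


if __name__ == "__main__":
    for text_method, visual_method in experiment_grid():
        print(text_method, "+", visual_method)
    print(build_cot_prompt(
        "Which animal is a mammal?",
        ["Frog", "Whale", "Shark"],
        image_context="A whale surfacing next to a boat",
    ))
```

Each pairing from experiment_grid() would correspond to one experimental configuration, and the string returned by build_cot_prompt() is what would be sent to the language model.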

Implementation Barriers

Technical Limitations

Limited computational resources and inconsistent model performance (e.g., models unexpectedly returning poor results)

Proposed Solutions: Iterative refinement of the approach to address performance issues

Project Team

Abhinav Arun

Researcher

Dipendra Singh Mal

Researcher

Mehul Soni

Researcher

Tomohiro Sawada

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Abhinav Arun, Dipendra Singh Mal, Mehul Soni, Tomohiro Sawada

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI