
Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering

Project Overview

The document explores the integration of Large Language Models (LLMs) in education, focusing on their use in enhancing social intelligence for video question-answering tasks. It introduces the Looped Video Debating (LVD) framework, which combines LLM reasoning with visual information to improve answer accuracy without extensive fine-tuning. The findings underscore the difficulty of building AI systems that navigate nuanced human interactions, while showing that LLMs can comprehend social contexts through multimodal data analysis. Beyond improving educational outcomes through deeper understanding and engagement, the approach suggests broader applications of generative AI in fields such as healthcare and caregiving. Overall, the document highlights generative AI's ability to combine language understanding with visual data to support improved learning experiences.

Key Applications

Looped Video Debating (LVD)

Context: Video question answering in educational settings, particularly for understanding human interactions.

Implementation: LVD combines LLMs with visual question answering models to enhance the accuracy of responses based on human interaction videos.

Outcomes: Achieved state-of-the-art performance on the Social-IQ 2.0 benchmark without fine-tuning; improved transparency and reliability in AI responses.

Challenges: Difficulty in integrating multiple modalities (vision and speech) and ensuring the model's understanding of complex human interactions.
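The looped answer-and-critique idea behind LVD can be illustrated with a minimal sketch. The function and model names below (vqa_model, llm, looped_video_debate) are hypothetical stand-ins, not the authors' actual implementation: real LLM and visual question answering calls are replaced by stubs so the control flow of the iterative loop is visible.

```python
# Minimal sketch of an iterative "looped debating" pipeline, assuming the
# loop alternates between gathering visual evidence and querying an LLM
# until the answer stabilizes. All functions here are hypothetical stubs.

def vqa_model(video_id: str, question: str) -> str:
    """Stub VQA model: returns a caption-like description of the clip."""
    return f"Two people in {video_id} are talking; one smiles and nods."

def llm(prompt: str) -> str:
    """Stub LLM: picks the option supported by the visual evidence."""
    if "nods" in prompt and "agreement" in prompt:
        return "agreement"
    return "unsure"

def looped_video_debate(video_id: str, question: str,
                        options: list[str], max_rounds: int = 3) -> str:
    """Iteratively query the VQA model and LLM until the answer converges."""
    evidence = vqa_model(video_id, question)
    answer = "unsure"
    for _ in range(max_rounds):
        prompt = (f"Question: {question}\nVisual evidence: {evidence}\n"
                  f"Options: {', '.join(options)}\nAnswer with one option.")
        new_answer = llm(prompt)
        if new_answer == answer:  # debate converged on a stable answer
            break
        answer = new_answer
        # In a full system, a critic step would request further evidence here.
        evidence += " " + vqa_model(video_id,
                                    f"Why might the answer be {answer}?")
    return answer
```

For example, `looped_video_debate("clip_01", "How does the listener feel?", ["agreement", "anger"])` converges after the second round once the LLM's answer stops changing. The convergence check is one plausible stopping criterion; the paper's actual loop structure may differ.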

Implementation Barriers

Technical

Developing AI that effectively integrates multiple modalities (e.g., vision and speech) for seamless human communication remains difficult.

Proposed Solutions: Utilizing frameworks like LVD that leverage LLM capabilities and visual information to improve AI understanding.

Project Team

Erika Mori

Researcher

Yue Qiu

Researcher

Hirokatsu Kataoka

Researcher

Yoshimitsu Aoki

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Erika Mori, Yue Qiu, Hirokatsu Kataoka, Yoshimitsu Aoki

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
