
Deciphering Emotions in Children Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications

Project Overview

The document examines how well two multimodal large language models, GPT-4o and Gemini 1.5 Pro, recognize emotions in illustrations from Arabic children's storybooks, with a focus on educational applications. GPT-4o outperformed Gemini 1.5 Pro at emotion recognition, especially with chain-of-thought prompting. However, the study also reveals systematic misclassifications, particularly for culturally nuanced emotions. This points to the need for culturally sensitive training approaches when developing educational technologies. The findings make the case for culturally responsive AI tools that meet the needs of Arabic-speaking learners, improve their learning experience, and foster a deeper emotional connection through tailored content. Overall, the document highlights the potential of generative AI in education while advocating attention to cultural context in AI training to improve outcomes for diverse student populations.

Key Applications

Emotion recognition in multimodal large language models (GPT-4o and Gemini 1.5 Pro)

Context: Arabic children's literature and educational technology

Implementation: Evaluation of emotion recognition using three prompting strategies (zero-shot, few-shot, and chain-of-thought) on 75 images from Arabic storybooks.

Outcomes: GPT-4o achieved a macro F1-score of 59% using chain-of-thought prompting, while Gemini 1.5 Pro scored 43%. The study also analyzed the models' error patterns and their cultural sensitivities in emotion recognition.

Challenges: Models struggled with culturally nuanced emotions, particularly in Arabic contexts, and exhibited systematic misclassification patterns, most notably valence inversions (for example, labeling a positive emotion as a negative one).
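
The evaluation described in the Implementation and Outcomes fields can be sketched as a small pipeline: build a prompt for one of the three strategies, parse the model's reply into a label, and score the predictions with macro F1. Macro F1 computes an F1-score per emotion class and averages them with equal weight, so rare emotions count as much as common ones. This is a minimal sketch under stated assumptions, not the authors' code: the label set `EMOTIONS`, the prompt wording, the helper names (`build_prompt`, `parse_label`, `macro_f1`), and the toy replies are all hypothetical, and no model API is called.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical label set for illustration; the study's emotion categories may differ.
EMOTIONS = ["happiness", "sadness", "anger", "fear", "surprise", "disgust"]


def build_prompt(strategy: str, examples: list[tuple[str, str]] | None = None) -> str:
    """Return hypothetical prompt text to send alongside one storybook image."""
    task = (
        "Identify the main emotion expressed by the character in this illustration "
        "from an Arabic children's storybook. "
        f"Choose one of: {', '.join(EMOTIONS)}."
    )
    if strategy == "zero-shot":
        # No examples and no reasoning: ask for the label directly.
        return task + "\nAnswer with the emotion label only."
    if strategy == "few-shot":
        # Prepend labeled (description, label) examples for the model to imitate.
        if not examples:
            raise ValueError("few-shot prompting needs at least one example")
        shots = "\n".join(f"Example: {desc} -> {label}" for desc, label in examples)
        return f"{task}\n{shots}\nAnswer with the emotion label only."
    if strategy == "chain-of-thought":
        # Ask for step-by-step reasoning, then a fixed final line that is easy to parse.
        return (
            task
            + "\nFirst describe the facial expression, body language, and scene context"
            + " step by step, then give your final answer on the last line as"
            + " 'Emotion: <label>'."
        )
    raise ValueError(f"unknown strategy: {strategy}")


def parse_label(response: str) -> str | None:
    """Extract a known label from a model reply; return None if none is found."""
    match = re.search(r"Emotion:\s*([A-Za-z]+)", response, flags=re.IGNORECASE)
    candidate = (match.group(1) if match else response).strip(" .\n").lower()
    return candidate if candidate in EMOTIONS else None


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gains a false positive
            fn[true] += 1  # the true class gains a false negative
    labels = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); every label here has a nonzero denominator.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


y_true = ["happiness", "sadness", "anger", "sadness"]
replies = ["happiness", "Emotion: Happiness", "anger.", "The girl is crying.\nEmotion: sadness"]
y_pred = [parse_label(r) for r in replies]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

On the toy data the per-class F1-scores are 2/3 (happiness), 2/3 (sadness), and 1 (anger), so the macro average is 7/9. Requesting a fixed `Emotion: <label>` line in chain-of-thought mode keeps the free-form reasoning from interfering with scoring. The `macro_f1` result should match scikit-learn's `f1_score(y_true, y_pred, average="macro")` on the same labels.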

Implementation Barriers

Cultural Sensitivity

Current models lack adequate cultural understanding, leading to misinterpretation of emotions in Arabic contexts.

Proposed Solutions: Development of culturally sensitive training datasets and emotion recognition systems tailored for Arabic literacy education.

Data Scarcity

Limited annotated Arabic datasets hinder the development of effective emotion recognition systems.

Proposed Solutions: Initiatives such as ArPanEmo that provide labeled Arabic content for multiple emotion categories.

Technical Limitations

Models displayed systematic misclassification patterns, particularly with nuanced and complex emotions.

Proposed Solutions: Enhanced prompting techniques and model architectures that incorporate cultural nuances and emotional complexity.
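
Valence inversions, listed among the study's challenges, are one systematic misclassification pattern that can be counted directly from predictions: a positive emotion labeled as negative, or the reverse. The sketch below tallies inversions by (true, predicted) pair so that recurring confusions stand out. The `VALENCE` mapping (including treating surprise as neutral), the function name `valence_inversions`, and the toy labels are assumptions for illustration; the study's own categories and valence grouping may differ.

```python
from collections import Counter

# Assumed valence grouping for illustration; the study's own grouping may differ.
VALENCE = {
    "happiness": "positive",
    "surprise": "neutral",  # surprise can be pleasant or unpleasant; treated as neutral here
    "sadness": "negative",
    "anger": "negative",
    "fear": "negative",
    "disgust": "negative",
}


def valence_inversions(y_true: list[str], y_pred: list[str]) -> Counter:
    """Count (true, predicted) pairs whose valences are opposite (positive vs. negative)."""
    counts: Counter = Counter()
    for true, pred in zip(y_true, y_pred):
        # Unknown labels map to None and therefore never count as an inversion.
        if {VALENCE.get(true), VALENCE.get(pred)} == {"positive", "negative"}:
            counts[(true, pred)] += 1
    return counts


y_true = ["happiness", "sadness", "fear", "happiness", "anger"]
y_pred = ["sadness", "happiness", "anger", "sadness", "anger"]
inversions = valence_inversions(y_true, y_pred)
print(inversions.most_common(1))  # [(('happiness', 'sadness'), 2)]
```

Here fear mislabeled as anger is an error but not an inversion, because both emotions are negative. Counting pairs rather than a single total shows which inversions recur, which is what distinguishes a systematic pattern from scattered mistakes.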

Project Team

Bushra Asseri

Researcher

Estabraq Abdelaziz

Researcher

Maha Al Mogren

Researcher

Tayef Alhefdhi

Researcher

Areej Al-Wabil

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Bushra Asseri, Estabraq Abdelaziz, Maha Al Mogren, Tayef Alhefdhi, Areej Al-Wabil

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
