Deciphering Emotions in Children Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
Project Overview
This study evaluates how well two multimodal large language models, GPT-4o and Gemini 1.5 Pro, recognize emotions in Arabic children's storybook illustrations, with a focus on educational applications. GPT-4o outperformed Gemini 1.5 Pro at emotion recognition, especially with chain-of-thought prompting. The study also found systematic misclassification, particularly for culturally nuanced emotions, which points to the need for culturally sensitive training approaches in educational technology. The findings argue for culturally responsive AI tools that serve Arabic-speaking learners and support a deeper emotional connection through tailored content. Overall, the study highlights the potential of generative AI in education while calling for attention to cultural context in AI training to improve outcomes for diverse student populations.
Key Applications
Emotion recognition in multimodal large language models (GPT-4o and Gemini 1.5 Pro)
Context: Arabic children's literature and educational technology
Implementation: Evaluation of emotion recognition capabilities using prompting strategies (zero-shot, few-shot, chain-of-thought) on 75 images from Arabic storybooks
Outcomes: GPT-4o achieved a macro F1-score of 59% using chain-of-thought prompting, while Gemini scored 43%. The study provided insights into error patterns and cultural sensitivities in emotion recognition.
Challenges: Models struggled with culturally nuanced emotions in Arabic contexts and exhibited systematic misclassification patterns, most notably valence inversions (a positive emotion labeled as negative, or the reverse).
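The macro F1-score and the valence-inversion error type mentioned above can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the emotion labels, the `VALENCE` mapping, and the function names are assumptions chosen for illustration. Macro F1 averages the per-class F1 scores with equal weight, so rare emotions count as much as frequent ones.

```python
# Hypothetical label set and valence mapping, for illustration only;
# the study's actual emotion categories are not listed in this summary.
VALENCE = {
    "joy": "positive",
    "trust": "positive",
    "sadness": "negative",
    "fear": "negative",
}


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


def valence_inversions(gold, pred):
    """Count predictions whose valence is the opposite of the human label."""
    return sum(1 for g, p in zip(gold, pred) if VALENCE[g] != VALENCE[p])


# Toy data: four illustrations with human labels and model predictions.
gold = ["joy", "joy", "sadness", "fear"]
pred = ["joy", "sadness", "sadness", "fear"]
print(round(macro_f1(gold, pred), 3))
print(valence_inversions(gold, pred))
```

In this toy example the single error (joy predicted as sadness) is also a valence inversion, the kind of mistake that matters most in an educational setting because it reverses the emotional meaning of a scene.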
Implementation Barriers
Cultural Sensitivity
Current models lack adequate cultural understanding, leading to misinterpretation of emotions in Arabic contexts.
Proposed Solutions: Development of culturally sensitive training datasets and emotion recognition systems tailored for Arabic literacy education.
Data Scarcity
Limited annotated Arabic datasets hinder the development of effective emotion recognition systems.
Proposed Solutions: Initiatives such as ArPanEmo that provide labeled Arabic content across multiple emotion categories.
Technical Limitations
Models displayed systematic misclassification patterns, particularly with nuanced and complex emotions.
Proposed Solutions: Enhanced prompting techniques and model architectures that incorporate cultural nuances and emotional complexity.
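As a rough illustration of the three prompting strategies the study compared (zero-shot, few-shot, chain-of-thought), the sketch below builds the text portion of a prompt for each. The prompt wording, the `LABELS` list, and the `build_prompt` helper are hypothetical and are not taken from the paper. Few-shot examples are represented here as text descriptions, whereas a real multimodal setup would more likely attach example images.

```python
# Hypothetical emotion labels, for illustration only.
LABELS = ["joy", "sadness", "anger", "fear"]

BASE_INSTRUCTION = (
    "Identify the main emotion shown in the storybook illustration. "
    "Answer with one of: " + ", ".join(LABELS) + "."
)


def build_prompt(strategy, examples=()):
    """Return the text prompt for a given strategy.

    examples: iterable of (description, label) pairs, used only for few-shot.
    """
    if strategy == "zero-shot":
        # Instruction only, no demonstrations.
        return BASE_INSTRUCTION
    if strategy == "few-shot":
        # Instruction followed by labeled demonstrations.
        shots = "\n".join(f"Example: {desc} -> {label}" for desc, label in examples)
        return f"{BASE_INSTRUCTION}\n{shots}"
    if strategy == "chain-of-thought":
        # Instruction plus a request for explicit reasoning before the answer.
        return (
            BASE_INSTRUCTION
            + "\nFirst describe the characters' facial expressions, posture, "
            "and the scene, then reason step by step before giving the final label."
        )
    raise ValueError(f"unknown strategy: {strategy}")


print(build_prompt("few-shot", [("a child smiling at a gift", "joy")]))
```

Keeping the base instruction identical across strategies means that any difference in model accuracy can be attributed to the added demonstrations or reasoning request rather than to changes in task wording.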
Project Team
Bushra Asseri
Researcher
Estabraq Abdelaziz
Researcher
Maha Al Mogren
Researcher
Tayef Alhefdhi
Researcher
Areej Al-Wabil
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Bushra Asseri, Estabraq Abdelaziz, Maha Al Mogren, Tayef Alhefdhi, Areej Al-Wabil
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI