The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Project Overview
This paper presents an educational tool that uses Generative Artificial Intelligence (GenAI) to enrich storytelling for children. The tool combines Large Language Models (LLMs), Text-to-Speech (TTS), Text-to-Video (TTV), and Text-to-Music (TTM) to create multimodal narratives that cater to diverse learning styles. By actively involving children in the storytelling process, the system supports their creative expression and cognitive development. The findings indicate that this kind of dynamic storytelling captivates young learners and fosters critical thinking and imaginative skills, contributing to a more interactive and effective educational environment. The work highlights the potential of GenAI to transform traditional learning methods.
Key Applications
Multi-agent system for storytelling
Context: Educational tool for children aged 7 to 12, focusing on storytelling and narrative creation.
Implementation: Uses LLMs for story generation, TTS for narration, TTV for animation, and TTM for background music. Users supply prompts that drive story development, and separate agents handle the different tasks.
Outcomes: Enhances reading comprehension, supports creative expression, improves retention and engagement through multimodal stimuli.
Challenges: Visual distortions in animations, high generation time for videos, need for a more comprehensive dataset for child-appropriateness.
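The division of labor described above can be illustrated with a minimal sketch. It assumes one writer agent that turns a user prompt into story text and one agent per output modality. All names here (StoryBundle, writer_agent, narration_agent, animation_agent, music_agent, run_pipeline) are hypothetical, and each agent is a placeholder that returns a descriptive string instead of calling a real LLM, TTS, TTV, or TTM model. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class StoryBundle:
    """Story text plus one generated asset reference per modality."""
    prompt: str
    text: str = ""
    assets: Dict[str, str] = field(default_factory=dict)


# Placeholder agents (assumptions): each stands in for a real model call.
def writer_agent(prompt: str) -> str:
    """Stand-in for the LLM that drafts the story from the user prompt."""
    return f"Once upon a time, {prompt}."


def narration_agent(text: str) -> str:
    """Stand-in for TTS narration."""
    return f"audio:narration({len(text)} chars)"


def animation_agent(text: str) -> str:
    """Stand-in for TTV animation."""
    return f"video:animation({len(text)} chars)"


def music_agent(text: str) -> str:
    """Stand-in for TTM background music."""
    return f"audio:music({len(text)} chars)"


MODALITY_AGENTS: Dict[str, Callable[[str], str]] = {
    "narration": narration_agent,
    "animation": animation_agent,
    "music": music_agent,
}


def run_pipeline(prompt: str) -> StoryBundle:
    """Draft the story first, then hand the text to each modality agent."""
    bundle = StoryBundle(prompt=prompt)
    bundle.text = writer_agent(prompt)
    for name, agent in MODALITY_AGENTS.items():
        bundle.assets[name] = agent(bundle.text)
    return bundle


if __name__ == "__main__":
    result = run_pipeline("a fox learns to share")
    print(result.text)
    for name, asset in result.assets.items():
        print(name, "->", asset)
```

Keeping the agents behind a common text-in, asset-out signature means a slow or distorted modality, such as the animation step, can be swapped or skipped without touching the others.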
Implementation Barriers
Technical
Visual inconsistencies in generated animations detract from the storytelling experience.
Proposed Solutions: Develop more robust models for animation generation and optimize rendering times.
Content Appropriateness
Ensuring that generated content is suitable for children and free from inappropriate material.
Proposed Solutions: Implement rigorous content moderation using LLMs and human review to filter out inappropriate stories.
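One way to combine automated filtering with human review is a two-threshold gate: low-risk stories pass, high-risk stories are rejected, and anything in between is queued for a person. The sketch below illustrates that idea only. The keyword-based stub_risk_score is a stand-in for an LLM-based classifier, and the term weights, the thresholds, and all names (Verdict, moderate, FLAGGED_TERMS) are assumptions made for illustration, not values from the paper.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVED = "approved"
    NEEDS_HUMAN_REVIEW = "needs_human_review"
    REJECTED = "rejected"


# Illustrative weights only (assumption); a real system would use an LLM
# or a trained classifier rather than a keyword table.
FLAGGED_TERMS = {"blood": 0.5, "weapon": 0.6, "kill": 0.9}


def stub_risk_score(story: str) -> float:
    """Return the highest weight of any flagged term found in the story."""
    words = (w.strip(".,!?") for w in story.lower().split())
    return max((FLAGGED_TERMS.get(w, 0.0) for w in words), default=0.0)


def moderate(
    story: str,
    score_fn: Callable[[str], float] = stub_risk_score,
    review_at: float = 0.3,
    reject_at: float = 0.8,
) -> Verdict:
    """Route a story by risk score: approve, send to a human, or reject."""
    score = score_fn(story)
    if score >= reject_at:
        return Verdict.REJECTED
    if score >= review_at:
        return Verdict.NEEDS_HUMAN_REVIEW
    return Verdict.APPROVED


if __name__ == "__main__":
    for story in (
        "The fox shared her berries.",
        "The knight raised a weapon.",
        "They kill the dragon.",
    ):
        print(moderate(story).value, "-", story)
```

Passing the scorer in as score_fn keeps the routing logic independent of whichever model produces the risk score.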
Resource Limitations
High generation time for animations can hinder the user experience.
Proposed Solutions: Optimize the rendering pipeline and consider using more efficient algorithms for video generation.
Project Team
Samee Arif, Researcher
Taimoor Arif, Researcher
Muhammad Saad Haroon, Researcher
Aamina Jamal Khan, Researcher
Agha Ali Raza, Researcher
Awais Athar, Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Samee Arif, Taimoor Arif, Muhammad Saad Haroon, Aamina Jamal Khan, Agha Ali Raza, Awais Athar
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI