uTalk: Bridging the Gap Between Humans and AI
Project Overview
This document describes uTalk, a system that applies generative AI to education by combining Large Language Models (LLMs) such as ChatGPT with visual models to improve Human-Computer Interaction (HCI). Users converse with digital avatars, which supports content creation and engagement in educational settings. uTalk uses Whisper for speech recognition and SadTalker to produce realistic video avatars, and it targets educational platforms, chatbots, and personal assistants. The key findings concern run-time efficiency, with a reported 27.69% improvement, and the system's potential to make learning more interactive, personalized, and accessible.
Key Applications
uTalk - an interactive virtual avatar system for conversation and content generation.
Context: Educational platforms, chatbots, and personal assistants.
Implementation: The system integrates the Whisper API for speech recognition, ChatGPT for generating responses, and SadTalker for producing talking-head videos. It is hosted on Streamlit and accepts user input as either audio or text.
Outcomes: Enhanced user experience with realistic digital twins, run-time performance improved by 27.69%, and the ability to generate videos for educational content.
Challenges: Technical bottlenecks in video generation and optimization of run-time.
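The implementation described above can be read as a three-stage pipeline: speech recognition, response generation, and video rendering. The following is a minimal sketch of that flow, not the authors' code. The class name AvatarPipeline and the three callables are hypothetical, and each stage is injected as a plain function so that no real Whisper, ChatGPT, or SadTalker call signature is assumed. Any intermediate step the real system may need between the text reply and the video is assumed to live inside render_video.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AvatarPipeline:
    """Chains three injected stages; all names here are hypothetical."""

    transcribe: Callable[[bytes], str]    # audio -> text (the role Whisper plays)
    generate_reply: Callable[[str], str]  # prompt -> reply (the role ChatGPT plays)
    render_video: Callable[[str], str]    # reply -> video path (the role SadTalker plays)

    def run(self, text: Optional[str] = None, audio: Optional[bytes] = None) -> dict:
        if text is None and audio is None:
            raise ValueError("provide text or audio input")
        # Text input skips speech recognition; audio is transcribed first.
        prompt = text if text is not None else self.transcribe(audio)
        reply = self.generate_reply(prompt)
        return {"prompt": prompt, "reply": reply, "video": self.render_video(reply)}


# Stand-in stages for illustration only; they call no external service.
demo = AvatarPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate_reply=lambda prompt: f"Echo: {prompt}",
    render_video=lambda reply: f"video_{len(reply)}.mp4",
)

print(demo.run(text="What is HCI?"))
```

Injecting the stages keeps the control flow testable without network access, and each stand-in can later be replaced by a wrapper around the real service.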
Implementation Barriers
Technical barrier
Performance bottlenecks in the SadTalker system, which slow video generation.
Proposed Solutions: Optimization of code, removal of redundant processes, and parallelization of operations within the framework.
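As an illustration of the parallelization idea only, the sketch below runs independent rendering jobs concurrently with Python's standard-library thread pool. It does not describe how the authors parallelized SadTalker. The function render_chunks_parallel, the render_chunk callable, and the idea of splitting work into chunks are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def render_chunks_parallel(
    chunks: Sequence[str],
    render_chunk: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Render independent chunks concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order,
        # regardless of which worker finishes first.
        return list(pool.map(render_chunk, chunks))


def fake_render(chunk: str) -> str:
    """Hypothetical stand-in for a real rendering step."""
    return f"{chunk}.mp4"


print(render_chunks_parallel(["intro", "body", "summary"], fake_render))
```

Threads mainly help when the work waits on I/O or runs in native code that releases the interpreter lock. For CPU-bound pure-Python work, a process pool would be the usual alternative.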
Project Team
Hussam Azzuni
Researcher
Sharim Jamal
Researcher
Abdulmotaleb Elsaddik
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Hussam Azzuni, Sharim Jamal, Abdulmotaleb Elsaddik
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI