Simulating User Agents for Embodied Conversational-AI
Project Overview
This project explores a large language model (LLM)-based user agent that simulates interactions with embodied conversational AI robots, with a focus on generating dialogue datasets for training. By communicating in natural language, the agent both improves the robots' task-completion capability and demonstrates that LLMs can produce human-like dialogue behavior. Experimental results indicate that the model predicts user actions and dialogue acts with useful accuracy, supporting its role in user simulation. The findings point to the scalability and efficiency of LLMs for improving embodied AI interaction and underscore their significance for research in human-robot communication. Overall, generative AI in this context opens promising avenues for more intuitive and effective human-computer collaboration, including applications in educational tools and learning environments.
Key Applications
LLM-based user agent for simulating user behavior in human-robot interactions
Context: Virtual environments for training and evaluating embodied conversational AI robots, targeting researchers and developers in AI and robotics.
Implementation: Utilized LLMs with zero-shot and few-shot learning techniques to predict user actions and responses during task-oriented dialogues.
Outcomes: Achieved F-measure scores of 43.4% for predicting when to speak and 51.1% for dialogue act prediction, demonstrating the model's effectiveness in simulating human-like interactions.
Challenges: Non-verbal move actions make user actions hard to predict accurately, particularly when discerning whether the user should speak or silently observe.
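The implementation above can be sketched as a few-shot prompt that asks an LLM to predict the simulated user's next action (speak vs. observe) and, when speaking, a dialogue act. This is a minimal illustration, not the authors' actual prompt: the example turns, dialogue-act labels, and wording are all assumptions.

```python
# Hypothetical few-shot prompt builder for an LLM-based user simulator.
# The examples and labels below are illustrative assumptions, not the
# paper's actual data or prompt format.

FEW_SHOT_EXAMPLES = [
    {"history": "Robot: Which cabinet should I open?", "action": "speak",
     "dialogue_act": "inform", "utterance": "The one next to the fridge."},
    {"history": "Robot: Moving to the kitchen.", "action": "observe",
     "dialogue_act": None, "utterance": None},
]

def build_prompt(history: str) -> str:
    """Assemble a few-shot prompt asking the LLM to decide whether the
    simulated user speaks or observes, and to label any utterance with
    a dialogue act."""
    lines = [
        "You are simulating a user instructing a household robot.",
        "Given the dialogue history, decide whether the user speaks or",
        "silently observes, and label any utterance with a dialogue act.",
        "",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"History: {ex['history']}")
        if ex["action"] == "speak":
            lines.append(f"Action: speak ({ex['dialogue_act']}): {ex['utterance']}")
        else:
            lines.append("Action: observe")
        lines.append("")
    lines.append(f"History: {history}")
    lines.append("Action:")
    return "\n".join(lines)
```

The returned string would be sent to the LLM (e.g., via a chat-completion call in zero- or few-shot mode), and the completion parsed back into an action and dialogue act.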
Implementation Barriers
Technical
Non-verbal actions (move actions) introduce noise that degrades the model's ability to predict user speech accurately.
Proposed Solutions: Selective removal of move actions or refining the dataset to minimize the impact of such actions on dialogue predictions.
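The proposed selective removal of move actions amounts to a simple dataset filter. The turn representation below (dicts with a "type" key) is an illustrative assumption:

```python
def strip_move_actions(turns):
    """Drop non-verbal move actions from a dialogue, so the model is
    trained only on turns where speak-vs-observe is the meaningful
    decision. Turn format ({"type": ...}) is a hypothetical sketch."""
    return [t for t in turns if t.get("type") != "move"]
```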
Data Collection
Collecting large-scale, diverse datasets of situated human-robot dialogues for training is expensive and labor-intensive.
Proposed Solutions: Use LLM-based user agents to simulate user behavior, thereby reducing the need for extensive real-world data collection.
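A simulated-user rollout of this kind can be sketched as a loop that pairs robot turns with a user agent and records the transcript as a training episode. The stub agent below stands in for the LLM simulator; both function names and the turn format are assumptions for illustration:

```python
def generate_dialogue(robot_turns, user_agent, max_turns=10):
    """Roll out a simulated episode: the user agent (an LLM in the
    paper, a stub here) reacts to each robot turn; the transcript
    becomes synthetic training data."""
    history = []
    for robot_utt in robot_turns[:max_turns]:
        history.append(("robot", robot_utt))
        user_utt = user_agent(history)  # None models silent observation
        if user_utt is not None:
            history.append(("user", user_utt))
    return history

def echo_user_agent(history):
    """Stand-in for the LLM user simulator: answers questions,
    otherwise silently observes."""
    _, last_utt = history[-1]
    return "Yes, please continue." if last_utt.endswith("?") else None
```

Swapping `echo_user_agent` for an actual LLM call yields synthetic dialogues at scale without recruiting human participants.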
Project Team
Daniel Philipov
Researcher
Vardhan Dongre
Researcher
Gokhan Tur
Researcher
Dilek Hakkani-Tür
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Daniel Philipov, Vardhan Dongre, Gokhan Tur, Dilek Hakkani-Tür
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI