
A Deep Reinforcement Learning Chatbot (Short Version)

Project Overview

The document examines the application of generative AI in education through MILABOT, a deep reinforcement learning chatbot developed for the Amazon Alexa Prize competition. MILABOT combines an ensemble of natural language generation and retrieval models, with reinforcement learning used to select among the candidate responses the ensemble produces. In evaluation, MILABOT outperformed competing chatbots on both user satisfaction scores and conversation length, showing that combining diverse AI models with learned response selection can engage users effectively. For education, this suggests a promising role for generative AI in building interactive, responsive learning environments that adapt to user needs, and indicates that such technologies could yield more effective educational tools that improve student engagement and learning outcomes.

Key Applications

MILABOT - a deep reinforcement learning chatbot

Context: Developed for the Amazon Alexa Prize competition, targeting users interacting with the Alexa voice assistant.

Implementation: Used an ensemble of natural language generation and retrieval models; a response-selection policy trained with deep reinforcement learning was evaluated through A/B testing with real-world Alexa users.

Outcomes: Achieved an average user score of 3.15, significantly higher than the competition average of 2.92, and averaged 14.5 turns per conversation, indicating higher engagement.

Challenges: Variations in user distribution and shifting user expectations over time could confound A/B testing results.
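The response-selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature names, the linear scoring function, and the softmax sampling are all invented for the example; the actual system scores candidates with learned models over much richer dialogue features.

```python
# Hypothetical sketch of ensemble response selection: each candidate
# response gets a score, and one is sampled via a softmax policy.
# All names and features here are illustrative, not from the paper.
import math
import random

def softmax(scores, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_response(candidates, score_fn, temperature=1.0, rng=random):
    """Sample a candidate response in proportion to its learned score."""
    probs = softmax([score_fn(c) for c in candidates], temperature)
    r, cum = rng.random(), 0.0
    for cand, p in zip(candidates, probs):
        cum += p
        if r <= cum:
            return cand
    return candidates[-1]  # guard against floating-point rounding

# Toy usage: a linear score over two hand-made features.
weights = {"length": 0.1, "overlap": 1.5}

def score_fn(cand):
    return weights["length"] * cand["length"] + weights["overlap"] * cand["overlap"]

candidates = [
    {"text": "Hello!", "length": 1, "overlap": 0.0},
    {"text": "Tell me more about that.", "length": 5, "overlap": 0.8},
]
print(select_response(candidates, score_fn)["text"])
```

In a reinforcement learning setting, the weights of the scoring function would be updated from user feedback (e.g. end-of-conversation ratings) rather than fixed by hand.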

Implementation Barriers

User Expectation Management

User expectations of conversational agents can shift as users interact with other systems, affecting their satisfaction with the chatbot.

Proposed Solutions: Conduct A/B testing over consistent time periods to minimize variation in user expectations, and ensure users are distributed evenly across the tested policies.

Project Team

Iulian V. Serban

Researcher

Chinnadhurai Sankar

Researcher

Mathieu Germain

Researcher

Saizheng Zhang

Researcher

Zhouhan Lin

Researcher

Sandeep Subramanian

Researcher

Taesup Kim

Researcher

Michael Pieper

Researcher

Sarath Chandar

Researcher

Nan Rosemary Ke

Researcher

Sai Rajeswar

Researcher

Alexandre de Brebisson

Researcher

Jose M. R. Sotelo

Researcher

Dendi Suhubdy

Researcher

Vincent Michalski

Researcher

Alexandre Nguyen

Researcher

Joelle Pineau

Researcher

Yoshua Bengio

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
