
My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech

Project Overview

The My Science Tutor (MyST) project applies generative AI to education. It built a corpus of 400 hours of children's conversational speech, collected from 1,371 elementary students, to advance automatic speech recognition and to support conversational AI agents that can engage young learners effectively. Students who interacted with the virtual tutor showed significant learning gains, which points to the potential of generative AI to foster enthusiasm for science and improve educational outcomes, particularly in remote learning environments. These findings suggest that integrating conversational AI into educational settings can support personalized learning and make complex subjects more accessible and enjoyable for students.

Key Applications

My Science Tutor (MyST) corpus

Context: Elementary school science education for 3rd to 5th grade students

Implementation: Students engaged in spoken dialog with a virtual science tutor, Marni, over 10,496 sessions, covering various science topics aligned with classroom curriculum.

Outcomes: Significant learning gains equivalent to human tutoring, increased student enthusiasm for science, and feasibility of integration into existing curricula.

Challenges: Requires a large corpus of children's speech data and raises potential issues of data privacy and consent.

Implementation Barriers

Data Access

Large, domain-specific datasets of children's speech, which are needed to train educational AI models, are difficult to access.

Proposed Solutions: Creation of the MyST corpus to provide a substantial and freely available collection of children's conversational speech.

Privacy and Consent

Using children's speech data in research requires consent from both parents and students, and the data must be collected in anonymized form.

Proposed Solutions: Institutional Review Board approval and implementation of anonymized data collection methods to protect student privacy.

Project Team

Sameer S. Pradhan

Researcher

Ronald A. Cole

Researcher

Wayne H. Ward

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
