My Science Tutor (MyST) -- A Large Corpus of Children's Conversational Speech
Project Overview
The My Science Tutor (MyST) project applies generative AI to education by building a corpus of 400 hours of children's conversational speech, collected from 1,371 elementary school students, to support better tools for science learning. The corpus is intended to advance automatic speech recognition for children and to support the development of conversational AI agents that can engage young learners effectively. Students who interacted with a virtual tutor showed significant learning gains, which points to the potential of such tools to build enthusiasm for science and improve educational outcomes, particularly in remote learning environments. These findings suggest that integrating conversational AI into educational settings can support personalized learning and make complex subjects more accessible and enjoyable for students.
Key Applications
My Science Tutor (MyST) corpus
Context: Elementary school science education for 3rd to 5th grade students
Implementation: Students engaged in spoken dialog with a virtual science tutor, Marni, over 10,496 sessions, covering various science topics aligned with classroom curriculum.
Outcomes: Significant learning gains equivalent to those achieved with human tutoring, increased student enthusiasm for science, and demonstrated feasibility of integration into existing curricula.
Challenges: Requires a large corpus of children's speech data and raises potential data privacy and consent issues.
Implementation Barriers
Data Access
Difficulty in accessing large, specific datasets of children's speech needed for training educational AI models.
Proposed Solutions: Creation of the MyST corpus to provide a substantial and freely available collection of children's conversational speech.
Privacy and Consent
Using children's speech data in research requires consent from both parents and students, as well as safeguards for student privacy.
Proposed Solutions: Institutional Review Board approval and implementation of anonymized data collection methods to protect student privacy.
Project Team
Sameer S. Pradhan
Researcher
Ronald A. Cole
Researcher
Wayne H. Ward
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Sameer S. Pradhan, Ronald A. Cole, Wayne H. Ward
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI