
Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications

Project Overview

The paper explores the role of generative AI, particularly Automatic Speech Recognition (ASR) technology, in education, with a specific emphasis on language learning for children, especially non-native speakers. It assesses the effectiveness of two ASR systems, Wav2Vec2.0 and Whisper, in analyzing child speech and offering feedback on pronunciation. Although both systems faced challenges in accurately recognizing child and non-native speech, the findings indicate that these recent ASR models can deliver satisfactory performance, enabling educators to obtain detailed insights into phoneme pronunciation and overall language proficiency. The analysis underscores the potential of ASR technology to inform innovative educational applications aimed at improving language acquisition and supporting learners in their educational journeys.

Key Applications

Voicebots for language learning applications using ASR technology.

Context: Educational context for children learning a foreign language, specifically non-native Dutch speakers.

Implementation: Assessment of the Wav2Vec2.0 and Whisper ASR systems on child speech, analyzing both read and extemporaneous speech.

Outcomes: ASR systems provide feedback on pronunciation quality and fluency, with the ability to extract detailed measures of language proficiency.

Challenges: ASR systems perform poorly on child and non-native speech due to variability in speech characteristics and limited speech data.
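The outcomes above mention extracting detailed phoneme-level measures from ASR output. A minimal sketch of one way pronunciation feedback could be derived, aligning a canonical phoneme sequence against the phonemes recognized in a learner's utterance to flag likely errors (the example word, phoneme sequences, and alignment method are illustrative assumptions, not the paper's actual pipeline, which is not detailed here):

```python
from difflib import SequenceMatcher

def phoneme_feedback(canonical: list[str], recognized: list[str]) -> list[str]:
    """Align recognized phonemes to canonical ones and report mismatches.

    Uses a simple longest-common-subsequence alignment; production systems
    would typically use forced alignment with acoustic scores instead.
    """
    feedback = []
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue  # phonemes match, no feedback needed
        if op == "replace":
            feedback.append(f"substituted {canonical[i1:i2]} -> {recognized[j1:j2]}")
        elif op == "delete":
            feedback.append(f"missing {canonical[i1:i2]}")
        elif op == "insert":
            feedback.append(f"inserted {recognized[j1:j2]}")
    return feedback

# Illustrative: Dutch "school" /s x o l/, with the guttural /x/ replaced by /k/.
print(phoneme_feedback(["s", "x", "o", "l"], ["s", "k", "o", "l"]))
# -> ["substituted ['x'] -> ['k']"]
```

Feedback of this kind could then be aggregated per phoneme across utterances to report which sounds a learner struggles with most.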

Implementation Barriers

Technical Barrier

ASR systems struggle with the speech characteristics of children and non-native speakers due to high variability and physiological differences, as well as limited availability of speech data for these groups.

Proposed Solutions: Use state-of-the-art ASR models such as Wav2Vec2.0 and Whisper, fine-tuned further on child speech data. In addition, collecting more diverse speech data, with the necessary permissions obtained from schools and parents, can help overcome these challenges.
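Any gains from fine-tuning on child speech would typically be quantified with word error rate (WER), the standard ASR evaluation metric. A minimal, dependency-free sketch of the computation (libraries such as jiwer provide the same measure; the transcripts below are made up for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words,
    # kept in a single rolling row to save memory.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            if ref[i - 1] == hyp[j - 1]:
                d[j] = prev  # match: no edit cost
            else:
                # substitution, deletion, or insertion
                d[j] = 1 + min(prev, d[j], d[j - 1])
            prev = cur
    return d[len(hyp)] / len(ref)

# One substitution in four reference words -> WER 0.25.
print(wer("de kat zit op", "de kat zat op"))  # -> 0.25
```

Comparing WER before and after fine-tuning, separately for read and extemporaneous child speech, would make the benefit of the additional data measurable.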

Project Team

Simone Wills

Researcher

Yu Bai

Researcher

Cristian Tejedor-Garcia

Researcher

Catia Cucchiarini

Researcher

Helmer Strik

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Simone Wills, Yu Bai, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
