
CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments

Project Overview

This document explores the application of generative AI in education, focusing on Automatic Speech Recognition (ASR) systems optimized for classroom environments. Recognizing children's speech is difficult because of background noise and variable articulation. The paper applies continued pretraining (CPT) to the Wav2vec2.0 model to adapt it to these conditions, and its experiments show that CPT makes the resulting ASR systems more accurate and more robust to classroom noise. The findings suggest that improved ASR can serve as a useful tool for educators in noisy classrooms, supporting more effective teaching and better learning outcomes for students.

Key Applications

Wav2vec2.0 with Continued Pretraining (CPT)

Context: Classroom environments, targeting educators and students, especially in elementary education.

Implementation: CPT adapted the Wav2vec2.0 model to untranscribed, noisy classroom audio; the adapted model was then fine-tuned on small labeled datasets.

Outcomes: CPT reduced the Word Error Rate (WER) by up to 12.26% on average and by up to 27% in specific noisy classroom conditions, making the model more effective at recognizing children's speech.

Challenges: The scarcity of transcribed classroom datasets, the need for robustness to varied noise types, and the distinct characteristics of children's speech.
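
Since the outcomes above are reported as Word Error Rate, a minimal sketch of how WER is commonly computed may help: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The function names and the example sentences are hypothetical and do not come from the paper. The summary does not state whether the reported percentages are relative or absolute reductions, so the `relative_reduction` helper illustrates only one possible reading.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which the adapted model lowers the baseline WER.

    Assumption: a relative reading of "WER reduction"; the source summary
    does not say which convention the paper uses.
    """
    return (baseline_wer - adapted_wer) / baseline_wer


# Illustrative transcripts only; not data from the paper.
example_wer = word_error_rate("open your books to page ten",
                              "open your book to page")
```

In this made-up example the hypothesis has one substituted word and one missing word against a six-word reference, so the WER is 2/6.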

Implementation Barriers

Data scarcity

There is a lack of transcribed classroom datasets due to privacy concerns and high transcription costs.

Proposed Solutions: Leveraging untranscribed data for self-supervised learning and developing tools to ensure balanced demographic representation in future datasets.
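
To make concrete why untranscribed audio is enough for self-supervised learning, here is a simplified sketch of the kind of contrastive objective Wav2vec2.0 pretraining is built around. For a masked time step, the model's context vector should be more similar to the true target vector than to distractor vectors drawn from other time steps, so no transcript is needed. This is an illustration under stated assumptions, not the paper's training code: the function names are hypothetical, plain Python lists stand in for model outputs, and the full Wav2vec2.0 objective also includes a codebook diversity term that is omitted here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of picking the true target among candidates.

    context:     model output at a masked time step (hypothetical values)
    target:      the true latent for that time step
    distractors: latents sampled from other time steps
    temperature: illustrative default, not a value taken from the summary
    """
    candidates = [target] + list(distractors)
    logits = [cosine_similarity(context, c) / temperature for c in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_denominator - logits[0]


# The loss is small when the context points at the true target ...
aligned = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# ... and large when a distractor matches the context better.
misaligned = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Continued pretraining, in this view, means continuing to minimize such an objective on unlabeled classroom recordings before any labeled fine-tuning takes place.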

Noise robustness

ASR systems struggle with background noise, particularly in classroom settings where children's babble can interfere with speech recognition.

Proposed Solutions: Utilizing CPT to enhance model robustness to various noise conditions and microphone configurations.
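
Robustness to noise conditions is often discussed in terms of signal-to-noise ratio (SNR). The sketch below shows one common way to simulate a noise condition by mixing a noise signal into clean speech at a chosen SNR. The summary does not describe how the paper's noise conditions were obtained, so this is purely an illustration: the function name and the toy sample values are hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR in dB.

    speech, noise: equal-length sequences of float samples (toy data here)
    snr_db:        target ratio of speech power to added-noise power, in dB
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")

    # Choose gain so that speech_power / (gain**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals only; real audio would be loaded from recordings.
clean = [1.0, -1.0, 1.0, -1.0]
babble = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(clean, babble, snr_db=0)
```

At 0 dB the added noise carries the same power as the speech; higher values of `snr_db` give a cleaner mixture.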

Project Team

Ahmed Adel Attia

Researcher

Dorottya Demszky

Researcher

Tolulope Ogunremi

Researcher

Jing Liu

Researcher

Carol Espy-Wilson

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
