
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs

Project Overview

The document explores an application of generative AI in education: machine translation (MT) and automatic speech recognition (ASR) systems for code-switched Egyptian Arabic-English, built with large language models (LLMs). It stresses the importance of capturing cultural nuances in translation and addresses two core difficulties, scarce resources and the intricacies of code-switching. The authors describe their methods and experiments and report notable gains in both translation and speech recognition performance. These results point to generative AI's potential to support language learning and communication in bilingual settings, and to make educational tools more effective and culturally sensitive for multilingual populations.
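To illustrate what code-switching looks like at the token level, the following sketch labels each whitespace-separated token by script and counts the points where the language changes. It is a minimal illustration under stated assumptions, not the authors' method: it assumes Egyptian Arabic is written in Arabic script and English in Latin script, so it cannot handle Arabizi (Arabic written with Latin letters and digits). The functions `tag_token` and `code_switch_profile` are hypothetical names introduced here.

```python
import re

# Assumption: core Arabic letters U+0621-U+064A stand in for "Arabic";
# this simplification ignores diacritics, Arabic punctuation, and
# supplementary Arabic blocks.
ARABIC = re.compile(r"[\u0621-\u064A]")
LATIN = re.compile(r"[A-Za-z]")


def tag_token(token: str) -> str:
    """Label a token 'ar', 'en', or 'other' by the script of its letters."""
    has_ar = bool(ARABIC.search(token))
    has_en = bool(LATIN.search(token))
    if has_ar and not has_en:
        return "ar"
    if has_en and not has_ar:
        return "en"
    return "other"  # digits, punctuation, or mixed-script tokens


def code_switch_profile(sentence: str) -> dict:
    """Tag tokens and count switch points between Arabic and English."""
    tags = [tag_token(t) for t in sentence.split()]
    # Ignore 'other' tokens so punctuation does not create false switches.
    lang_tags = [t for t in tags if t != "other"]
    switches = sum(1 for a, b in zip(lang_tags, lang_tags[1:]) if a != b)
    return {
        "tags": tags,
        "switch_points": switches,
        "ar_tokens": lang_tags.count("ar"),
        "en_tokens": lang_tags.count("en"),
    }


# "I have a meeting tomorrow" with an English noun inside an Arabic sentence.
profile = code_switch_profile("أنا عندي meeting بكرة")
print(profile["tags"], profile["switch_points"])  # ['ar', 'ar', 'en', 'ar'] 2
```

Even this short sentence switches language twice, which is part of why models trained on monolingual data struggle with such input.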

Key Applications

ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition

Context: Education, focused on bilingual university students and professionals in Egypt who need effective communication tools.

Implementation: Built on LLMs such as LLaMA and Gemma, with an ASR system based on the Whisper model, pre-trained on a specific dataset.

Outcomes: Achieved a 56% improvement in English translation and a 9.3% improvement in Arabic translation over state-of-the-art methods.

Challenges: Limited resources and the unique characteristics of the Egyptian Arabic dialect make model training complex.
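The summary does not say which metric the Outcomes percentages refer to, or whether they are relative or absolute gains. Assuming they are relative gains over the baseline system's score, the arithmetic is shown below. `relative_improvement` is a hypothetical helper, and the scores are made-up values for illustration only, not results from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`, relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (new - baseline) / baseline


# Hypothetical scores for illustration (not figures from the paper).
baseline_score = 20.0
new_score = 31.2
print(f"{relative_improvement(baseline_score, new_score):.1f}%")  # 56.0%
```

Under this assumption, a 56% gain means the new score is 1.56 times the baseline (about 31.2 from a baseline of 20.0), not 56 points higher.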

Implementation Barriers

Resource Scarcity

Limited resources dedicated to training models on code-switched data hinder development.

Proposed Solutions: Open-sourcing models and data to encourage community engagement and further research.

Cultural Nuances

Machine translation struggles to capture cultural differences and nuances between languages.

Proposed Solutions: Utilizing LLMs trained on diverse datasets to enhance cultural understanding in translations.

Project Team

Ahmed Heakl

Researcher

Youssef Zaghloul

Researcher

Mennatullah Ali

Researcher

Rania Hossam

Researcher

Walid Gomaa

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
