
Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences

Project Overview

This paper examines how generative AI and Natural Language Processing (NLP) can support education and, specifically, the revitalization of endangered Indigenous languages in Brazil. It highlights Large Language Models (LLMs) and Endangered Language Models (ELMs) as tools for language documentation and preservation. The key applications are AI-powered writing assistants and translators, designed collaboratively with Indigenous communities to meet their linguistic needs. The authors stress community engagement and ethical considerations throughout the development process, and they describe the challenges of building effective language tools. The initiative aims to give Indigenous communities AI resources that promote language sovereignty and sustainable use, supporting a long-term commitment to language revitalization. The paper discusses both the potential benefits and the limitations of using AI for these purposes and treats collaboration as essential to successful outcomes.

Key Applications

Endangered Language Models and Writing Assistants

Context: Workshops with Indigenous communities in Brazil, specifically targeting young speakers and writers of endangered languages, such as Guarani Mbya and Nheengatu. These initiatives aim to engage students from Indigenous backgrounds in co-designing and testing AI-assisted writing tools that facilitate translation, spell-checking, and word prediction in their native languages.

Implementation: Developing and fine-tuning AI models to support writing in Indigenous languages through collaborative workshops. This includes creating prototypes for writing assistance and translation that are tested and refined based on user feedback from the community, emphasizing the co-design approach to ensure relevance and usability.

Outcomes: Increased interest among young people in using their native languages for writing; improved understanding of their linguistic heritage; development of usable prototypes for language documentation and support; and enhanced engagement with technology among Indigenous students.

Challenges: Limited data resources for fine-tuning models, ethical concerns regarding data use, building community trust, ensuring technology adoption and maintenance by Indigenous communities, and addressing varying literacy levels among users.
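To make the word-prediction feature mentioned above more concrete, the following is a minimal sketch of next-word suggestion from a small text collection. It uses a simple bigram count model, which is an illustrative stand-in and not the fine-tuned AI models the project describes. The function names `train_bigrams` and `predict_next` and the placeholder tokens are assumptions made for this example only.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    counts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts.setdefault(prev, Counter())[nxt] += 1
    return counts

def predict_next(counts, word, k=3):
    """Return up to k most frequent followers of `word`, or [] if unseen."""
    followers = counts.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]

# Placeholder tokens stand in for real sentences, which would come from
# community-approved texts (hypothetical data, not from the paper).
corpus = ["w1 w2 w3", "w1 w2 w4", "w1 w5"]
model = train_bigrams(corpus)
suggestions = predict_next(model, "w1")
```

A count-based model like this can be trained on very little text, which is one reason simple baselines are sometimes useful for low-resource settings; it does not capture the longer context that a fine-tuned language model can.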

Implementation Barriers

Technical barrier

Difficulty in obtaining sufficient high-quality training data, because these are low-resource languages. The tools and technologies are still in their infancy and require ongoing development and testing.

Proposed Solutions: Use of fine-tuning techniques with small datasets, leveraging existing linguistic resources, ensuring high data quality through careful curation, and engaging with diverse communities to iterate on the design based on user feedback.
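As one illustration of what careful curation of a small dataset might involve, the sketch below normalizes Unicode, collapses whitespace, drops very short lines, and removes duplicates before any fine-tuning. The function name `curate` and the `min_tokens` threshold are assumptions for this example; the paper does not specify its curation pipeline.

```python
import unicodedata

def curate(lines, min_tokens=3):
    """Normalize, filter, and deduplicate raw sentences (illustrative only)."""
    seen = set()
    kept = []
    for line in lines:
        # NFC makes composed and decomposed accented characters compare equal.
        text = unicodedata.normalize("NFC", line)
        # Collapse runs of whitespace and trim the ends.
        text = " ".join(text.split())
        # Skip fragments too short to be useful (threshold is an assumption).
        if len(text.split()) < min_tokens:
            continue
        # Deduplicate case-insensitively, keeping the first occurrence.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept

# Hypothetical raw input; the accented pair differs only in Unicode encoding.
raw = ["a  b c", "A b c", "x y", "cafe\u0301 e bom", "caf\u00e9 e bom"]
cleaned = curate(raw)
```

Unicode normalization matters for languages written with diacritics, since the same word typed on different keyboards can otherwise be counted as distinct strings.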

Ethical barrier

Distrust of researchers and technology developers among Indigenous communities, rooted in historical exploitation.

Proposed Solutions: Implementing community engagement strategies, ensuring transparency, obtaining informed consent, and prioritizing Indigenous data sovereignty.

Cultural/Social barrier

Historical appropriation of technology by non-Indigenous peoples has left Indigenous communities skeptical and has made adoption more difficult.

Proposed Solutions: Empowering Indigenous communities through training in technology development and encouraging ownership of the tools created.

Project Team

Claudio Pinhanez

Researcher

Paulo Cavalin

Researcher

Luciana Storto

Researcher

Thomas Finbow

Researcher

Alexander Cobbinah

Researcher

Julio Nogima

Researcher

Marisa Vasconcelos

Researcher

Pedro Domingues

Researcher

Priscila de Souza Mizukami

Researcher

Nicole Grell

Researcher

Majoí Gongora

Researcher

Isabel Gonçalves

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Claudio Pinhanez, Paulo Cavalin, Luciana Storto, Thomas Finbow, Alexander Cobbinah, Julio Nogima, Marisa Vasconcelos, Pedro Domingues, Priscila de Souza Mizukami, Nicole Grell, Majoí Gongora, Isabel Gonçalves

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
