Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences
Project Overview
The document examines how generative AI and Natural Language Processing (NLP) can support the revitalization of endangered Indigenous languages in Brazil, with a focus on educational use. It describes the development of Large Language Models (LLMs) and Endangered Language Models (ELMs) as tools for language documentation and preservation. Key applications include AI-powered writing assistants and translators co-designed with Indigenous communities to meet their linguistic needs. The authors stress community engagement and ethical practice throughout technology development, and they acknowledge how difficult it is to build effective tools for these languages. The initiative aims to give Indigenous communities AI resources that promote language sovereignty and sustained use, supporting a long-term commitment to revitalization. The document weighs both the potential benefits and the limitations of AI for these purposes and treats collaboration as essential to successful outcomes.
Key Applications
Endangered Language Models and Writing Assistants
Context: Workshops with Indigenous communities in Brazil for young speakers and writers of endangered languages such as Guarani Mbya and Nheengatu. The workshops engage Indigenous students in co-designing and testing AI-assisted writing tools that support translation, spell-checking, and word prediction in their native languages.
Implementation: AI models are developed and fine-tuned to support writing in Indigenous languages through collaborative workshops. Prototypes for writing assistance and translation are tested with the community and refined based on its feedback, a co-design approach intended to keep the tools relevant and usable.
Outcomes: Increased interest among young people in using their native languages for writing; improved understanding of their linguistic heritage; development of usable prototypes for language documentation and support; and enhanced engagement with technology among Indigenous students.
Challenges: Limited data resources for fine-tuning models, ethical concerns regarding data use, building community trust, ensuring technology adoption and maintenance by Indigenous communities, and addressing varying literacy levels among users.
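The word prediction feature mentioned above can be illustrated with a very small sketch. This is not the authors' implementation, which the summary describes as fine-tuned AI models; it swaps in a simple bigram frequency counter to show the basic idea of suggesting the next word from previously seen text. The function names and the placeholder tokens (w1, w2, ...) are assumptions made for illustration and do not come from the paper or from any real Guarani Mbya or Nheengatu corpus.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Pair each token with the token that comes right after it.
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, word, k=3):
    """Return up to k followers of `word`, most frequent first."""
    counts = followers.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]


# Placeholder tokens stand in for community-approved text (hypothetical data).
toy_corpus = ["w1 w2 w3", "w1 w2 w4", "w1 w5"]
toy_model = train_bigram_model(toy_corpus)
suggestions = predict_next(toy_model, "w1")  # ["w2", "w5"]
```

A counter like this needs no GPU and can be retrained on a laptop whenever the community adds text, although it sees only one word of context and cannot suggest words it has never seen.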
Implementation Barriers
Technical barrier
Sufficient high-quality training data is hard to obtain because these are low-resource languages. The tools and technologies are also at an early stage and require ongoing development and testing.
Proposed Solutions: Use of fine-tuning techniques with small datasets, leveraging existing linguistic resources, ensuring high data quality through careful curation, and engaging with diverse communities to iterate on the design based on user feedback.
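One way to picture the "careful curation" step for a small fine-tuning dataset is sketched below. It is only an assumption about what such curation might involve (Unicode normalization, whitespace cleanup, dropping very short lines, and removing duplicates); the summary does not state which steps the authors apply. The function name `curate` and the `min_tokens` parameter are hypothetical.

```python
import unicodedata


def curate(sentences, min_tokens=2):
    """Normalize, filter, and de-duplicate sentences, keeping first occurrences."""
    seen = set()
    kept = []
    for raw in sentences:
        # NFC makes composed and decomposed accented characters compare equal.
        text = unicodedata.normalize("NFC", raw)
        # Collapse runs of whitespace and trim the ends.
        text = " ".join(text.split())
        # Skip lines too short to be useful training examples.
        if len(text.split()) < min_tokens:
            continue
        # Compare case-insensitively so "A b" and "a b" count as duplicates.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept
```

Normalization matters for orthographies that use diacritics, because the same word typed on different keyboards may otherwise be stored as different byte sequences and counted as distinct entries.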
Ethical barrier
Distrust from Indigenous communities towards researchers and technology developers due to historical exploitation.
Proposed Solutions: Implementing community engagement strategies, ensuring transparency, obtaining informed consent, and prioritizing Indigenous data sovereignty.
Cultural/Social barrier
Historical appropriation of technology by non-Indigenous peoples has led to skepticism and challenges in adoption by Indigenous communities.
Proposed Solutions: Empowering Indigenous communities through training in technology development and encouraging ownership of the tools created.
Project Team
Claudio Pinhanez
Researcher
Paulo Cavalin
Researcher
Luciana Storto
Researcher
Thomas Finbow
Researcher
Alexander Cobbinah
Researcher
Julio Nogima
Researcher
Marisa Vasconcelos
Researcher
Pedro Domingues
Researcher
Priscila de Souza Mizukami
Researcher
Nicole Grell
Researcher
Majoí Gongora
Researcher
Isabel Gonçalves
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Claudio Pinhanez, Paulo Cavalin, Luciana Storto, Thomas Finbow, Alexander Cobbinah, Julio Nogima, Marisa Vasconcelos, Pedro Domingues, Priscila de Souza Mizukami, Nicole Grell, Majoí Gongora, Isabel Gonçalves
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI