Generative AI for Named Entity Recognition in Low-Resource Language Nepali
Project Overview
This document examines the role of generative AI, particularly Large Language Models (LLMs), in Named Entity Recognition (NER) for low-resource languages such as Nepali, with a focus on educational applications. It evaluates how effectively LLMs identify named entities (people, locations, and organizations) in Nepali text. The findings show that, while traditional supervised methods still achieve higher precision, LLMs generalize well from limited data, indicating their potential in resource-constrained settings. The paper also highlights the adaptability of LLMs through advanced prompting and self-verification techniques, suggesting that these approaches could further improve NER in educational contexts. Overall, it underscores the potential of generative AI to extend language-processing capabilities to languages that lack extensive resources, contributing to more inclusive and effective learning experiences.
Key Applications
Named Entity Recognition (NER) using Generative AI and LLMs
Context: Application in the Nepali language for NER tasks, targeting NLP researchers and practitioners working with low-resource languages.
Implementation: Benchmarking state-of-the-art LLMs for NER, employing various prompting techniques and self-verification mechanisms.
Outcomes: Improved NER performance in Nepali, with findings showing the potential of LLMs to effectively recognize entities despite limited training data.
Challenges: Performance of LLMs is still below traditional supervised NER systems, and the models can struggle with precision, especially in distinguishing between entity types.
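The prompting and self-verification pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' exact setup: the prompt wording, the entity tag set, and the response format are all assumptions, and no model call is made here.

```python
# Sketch of few-shot NER prompting with a self-verification step.
# The prompt wording, tag set, and "entity|type" response format are
# illustrative assumptions, not the exact prompts used in the paper.

# One labeled Nepali sentence serving as the few-shot demonstration.
FEW_SHOT_EXAMPLES = [
    ("सुनिता शर्मा काठमाडौंमा बस्छिन् ।",
     [("सुनिता शर्मा", "PERSON"), ("काठमाडौं", "LOCATION")]),
]

def build_ner_prompt(sentence: str) -> str:
    """Assemble a few-shot prompt asking the model to tag entities."""
    lines = ["Identify PERSON, LOCATION, and ORGANIZATION entities "
             "in the Nepali sentence. Answer as entity|type pairs."]
    for text, entities in FEW_SHOT_EXAMPLES:
        pairs = "; ".join(f"{e}|{t}" for e, t in entities)
        lines.append(f"Sentence: {text}\nEntities: {pairs}")
    lines.append(f"Sentence: {sentence}\nEntities:")
    return "\n\n".join(lines)

def build_verification_prompt(sentence: str, entity: str, tag: str) -> str:
    """Self-verification: ask the model to confirm one extracted entity."""
    return (f'In the sentence "{sentence}", is "{entity}" really a '
            f"{tag} entity? Answer yes or no.")

def parse_entities(response: str) -> list[tuple[str, str]]:
    """Parse 'entity|type; entity|type' model output into pairs."""
    pairs = []
    for chunk in response.split(";"):
        if "|" in chunk:
            entity, tag = chunk.split("|", 1)
            pairs.append((entity.strip(), tag.strip()))
    return pairs
```

In a full pipeline, the first prompt would be sent to the LLM, each parsed entity would be re-checked with a verification prompt, and entities the model does not confirm would be dropped.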
Implementation Barriers
Performance Barrier
Generative AI models generally underperform traditional supervised models, particularly in precision. LLMs also respond better to English prompts than to Nepali prompts, highlighting the need for stronger multilingual capabilities.
Proposed Solutions: Refining prompting techniques, enriching the training data available to LLMs, and conducting dedicated pre-training or fine-tuning of models specifically for low-resource languages.
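One way to probe the English-versus-Nepali prompt gap noted above is to hold the input sentence fixed and vary only the instruction language. The two templates below are illustrative assumptions, not the paper's prompts:

```python
# Sketch: the same NER request expressed with an English and a Nepali
# instruction, so only the prompt language differs between runs.
# Both template wordings are illustrative assumptions.

EN_TEMPLATE = ("List all PERSON, LOCATION, and ORGANIZATION entities "
               "in this Nepali sentence:\n{sentence}")
NE_TEMPLATE = ("यो नेपाली वाक्यमा भएका व्यक्ति, स्थान र संस्थाका "
               "नामहरू सूचीबद्ध गर्नुहोस्:\n{sentence}")

def make_prompts(sentence: str) -> dict[str, str]:
    """Return both prompt variants for one test sentence."""
    return {"en": EN_TEMPLATE.format(sentence=sentence),
            "ne": NE_TEMPLATE.format(sentence=sentence)}
```

Scoring the model's outputs for each variant against the same gold annotations isolates the effect of prompt language from the effect of the input text itself.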
Data Availability Barrier
Low-resource languages like Nepali have limited labeled training data, making it challenging for models to perform optimally.
Proposed Solutions: Using few-shot learning and data augmentation techniques to maximize the effectiveness of limited data.
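The data augmentation idea above can be sketched as gazetteer-based entity substitution: swapping a labeled entity for other names of the same type to multiply scarce training examples. The name list below is an illustrative assumption:

```python
# Sketch of data augmentation by entity substitution: each PERSON
# entity in a labeled sentence is replaced with other names to
# generate additional training pairs. The gazetteer is illustrative.

PERSON_GAZETTEER = ["राम थापा", "गीता गुरुङ", "हरि अधिकारी"]

def augment_by_substitution(sentence, entities):
    """Return new (sentence, entities) pairs with each PERSON swapped."""
    augmented = []
    for surface, tag in entities:
        if tag != "PERSON":
            continue
        for name in PERSON_GAZETTEER:
            if name == surface:
                continue  # skip the no-op substitution
            new_sentence = sentence.replace(surface, name)
            # Update only the swapped entity; keep all other labels.
            new_entities = [(name if (s, t) == (surface, tag) else s, t)
                            for s, t in entities]
            augmented.append((new_sentence, new_entities))
    return augmented
```

Each augmented sentence keeps the original annotation structure, so a small labeled set can be expanded without new manual labeling, at the cost of some loss of naturalness.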
Project Team
Sameer Neupane
Researcher
Jeevan Chapagain
Researcher
Nobal B. Niraula
Researcher
Diwa Koirala
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Sameer Neupane, Jeevan Chapagain, Nobal B. Niraula, Diwa Koirala
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI