
How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors?

Project Overview

The document examines the use of generative AI, particularly large language models (LLMs), in education, focusing on grammatical error correction (GEC) for Bengali. It argues for grammatical error explanation (GEE) systems that not only correct errors but also explain them, since explanations are central to effective language learning. The research compares several LLMs against human experts and finds notable shortcomings in the models' ability to give meaningful explanations of Bengali grammar. The findings indicate that while LLMs can assist with error correction, their explanations lack the depth learners need, so human involvement remains necessary for AI-driven tools to be effective in education.

Key Applications

Grammatical Error Explanation (GEE) system for Bengali

Context: Language learning for Bengali speakers of varying proficiency levels

Implementation: The GEE system uses a two-step pipeline: erroneous sentences are first corrected, and explanations for those corrections are then generated, using generative AI models such as GPT-4 Turbo and Llama.

Outcomes: The system aims to enhance understanding of grammatical rules and improve proficiency in the Bengali language.

Challenges: LLMs struggle to provide context-aware and meaningful explanations, often failing in nuanced grammatical aspects.
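The two-step pipeline described under Implementation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the model is abstracted as any function from a prompt string to a completion string (a wrapper around GPT-4 Turbo or a Llama model would fit), and the prompt wording, the word-level diff using difflib, and all function names (correct_sentence, word_edits, explain_corrections, gee_pipeline) are hypothetical.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical model interface: any function mapping a prompt to a completion
# (for example, a wrapper around GPT-4 Turbo or a Llama model).
ModelFn = Callable[[str], str]


def correct_sentence(model: ModelFn, sentence: str) -> str:
    """Step 1 (assumed prompt): ask the model for a corrected sentence."""
    prompt = (
        "Correct the grammatical errors in the following Bengali sentence. "
        "Return only the corrected sentence.\n\n" + sentence
    )
    return model(prompt).strip()


def word_edits(source: str, target: str) -> List[Tuple[str, str]]:
    """Return (erroneous, corrected) word spans that differ between two sentences."""
    src, tgt = source.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


def explain_corrections(model: ModelFn, source: str, corrected: str) -> str:
    """Step 2 (assumed prompt): ask the model to explain each edit."""
    edit_lines = "\n".join(f"- '{a}' -> '{b}'" for a, b in word_edits(source, corrected))
    prompt = (
        "A learner wrote a Bengali sentence that was corrected as follows.\n"
        f"Original: {source}\nCorrected: {corrected}\nEdits:\n{edit_lines}\n\n"
        "For each edit, explain in one or two sentences which grammatical rule "
        "was violated."
    )
    return model(prompt).strip()


def gee_pipeline(model: ModelFn, sentence: str) -> Tuple[str, str]:
    """Run correction, then explanation; skip step 2 if nothing changed."""
    corrected = correct_sentence(model, sentence)
    if corrected == sentence.strip():
        return corrected, "No errors detected."
    return corrected, explain_corrections(model, sentence, corrected)


if __name__ == "__main__":
    # Stub standing in for a real LLM call; it returns canned text.
    def stub_model(prompt: str) -> str:
        if prompt.startswith("Correct"):
            return "সে বাজারে যায়"
        return "'যাই' is the first-person verb form; the third-person subject 'সে' requires 'যায়'."

    print(gee_pipeline(stub_model, "সে বাজারে যাই"))
```

The stub lets the sketch run offline; in practice it would be replaced by a real API or local inference call. Passing the explicit word-level edits into the second step is one way to keep explanations tied to the actual corrections; the source does not say how the original pipeline structures its prompts.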

Implementation Barriers

Technical limitations

Current LLMs are limited in their ability to generate accurate grammatical explanations, particularly for complex structures in Bengali.

Proposed Solutions: Incorporating human intervention and manual checks for feedback quality to improve the performance of GEC tools.
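One way to incorporate the proposed human checks is to hold every generated explanation until a reviewer either approves it or replaces it with a corrected one. The sketch below is a hypothetical design, not something described in the source: the class names, the status values, and the rule that a rejected explanation must be rewritten by a human are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"    # generated, not yet seen by a reviewer
    APPROVED = "approved"  # reviewer accepted the model's explanation
    REVISED = "revised"    # reviewer replaced it with their own explanation


@dataclass
class Feedback:
    sentence: str
    corrected: str
    explanation: str
    status: Status = Status.PENDING
    reviewer_note: Optional[str] = None


@dataclass
class ReviewQueue:
    """Hypothetical human-in-the-loop gate for model-generated feedback."""
    items: List[Feedback] = field(default_factory=list)

    def submit(self, fb: Feedback) -> int:
        """Add generated feedback; returns its index for later review."""
        self.items.append(fb)
        return len(self.items) - 1

    def review(self, idx: int, approved: bool,
               revised_explanation: Optional[str] = None,
               note: Optional[str] = None) -> None:
        fb = self.items[idx]
        if approved:
            fb.status = Status.APPROVED
        else:
            # Assumed policy: a rejected explanation is replaced, never dropped.
            if revised_explanation is None:
                raise ValueError("A rejected explanation needs a human-written revision.")
            fb.explanation = revised_explanation
            fb.status = Status.REVISED
        fb.reviewer_note = note

    def released(self) -> List[Feedback]:
        """Only reviewed feedback is shown to learners."""
        return [fb for fb in self.items if fb.status is not Status.PENDING]
```

Keeping unreviewed items out of released() means learners only ever see explanations a human has checked, at the cost of reviewer time for every item; a lighter variant could route only flagged items to review.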

Data scarcity

High-quality datasets for grammatical error correction are scarce in low-resource languages such as Bengali.

Proposed Solutions: The creation of real-world multi-domain datasets to serve as evaluation benchmarks for GEE systems.
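A multi-domain benchmark of this kind might store, for each item, its domain, the erroneous sentence, an expert correction, and an expert-written reference explanation, so that model explanations can be compared with human ones domain by domain. The field names and the per-domain accuracy helper below are illustrative assumptions; the source does not specify a schema or metric, and the correctness judgments are assumed to come from human raters.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class GEEExample:
    domain: str                 # hypothetical labels, e.g. "essay", "news"
    erroneous: str              # learner-written sentence
    corrected: str              # expert correction
    reference_explanation: str  # expert-written explanation


def by_domain(examples: Iterable[GEEExample]) -> Dict[str, List[GEEExample]]:
    """Group benchmark items by domain."""
    groups: Dict[str, List[GEEExample]] = defaultdict(list)
    for ex in examples:
        groups[ex.domain].append(ex)
    return dict(groups)


def per_domain_accuracy(examples: List[GEEExample],
                        judgments: List[bool]) -> Dict[str, float]:
    """Fraction of model explanations judged correct, per domain.

    judgments[i] is assumed to be a human rater's verdict on the model's
    explanation for examples[i].
    """
    if len(examples) != len(judgments):
        raise ValueError("Need exactly one judgment per example.")
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for ex, ok in zip(examples, judgments):
        totals[ex.domain] += 1
        correct[ex.domain] += int(ok)
    return {d: correct[d] / totals[d] for d in totals}
```

Reporting scores per domain rather than as one pooled number makes it visible when a model handles one kind of text well but fails on another, which is the point of a multi-domain benchmark.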

Project Team

Subhankar Maity

Researcher

Aniket Deroy

Researcher

Sudeshna Sarkar

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Subhankar Maity, Aniket Deroy, Sudeshna Sarkar

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
