Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A Case Study at HCMUT
Project Overview
This case study from Ho Chi Minh City University of Technology (HCMUT) examines the use of generative AI in education through the construction of a Knowledge Graph (KG) for an educational Question-Answering (QA) system built on Large Language Models (LLMs). It addresses the challenge of integrating heterogeneous educational data sources and presents a framework for open intent and relation discovery based on embedding techniques. The study reports that the KG-augmented LLM approach improves the accuracy and relevance of responses in educational settings, while noting obstacles such as data complexity and language-specific nuances. Overall, it highlights the potential of generative AI in education, particularly for improving interactive learning and access to information.
Key Applications
Knowledge Graph-based Educational Question-Answering System
Context: Higher education at HCMUT, targeting students and academic staff seeking information via a QA system.
Implementation: Developed a framework for constructing a KG from various educational data sources and implemented it in conjunction with LLMs for question-answering tasks.
Outcomes: Enhanced accuracy and relevance of responses in educational queries through KG integration with LLMs. Demonstrated successful extraction of intents and relationships from data.
Challenges: Complexity of data integration from multiple sources, issues with open intent recognition, and language-specific challenges in processing Vietnamese.
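The general idea of pairing a KG with an LLM can be sketched as follows. This is a minimal illustration, not the system built at HCMUT: the triples, the relation names, and the helper functions `build_index`, `retrieve_facts`, and `build_prompt` are all hypothetical, and the final call to an LLM is left out.

```python
from collections import defaultdict

# Hypothetical triples; entity and relation names are illustrative only.
TRIPLES = [
    ("Course A", "has_prerequisite", "Course B"),
    ("Course A", "taught_in", "Semester 1"),
    ("Course B", "has_credits", "3"),
]

def build_index(triples):
    """Index triples by lowercased subject so facts about an entity can be looked up."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject.lower()].append((subject, relation, obj))
    return index

def retrieve_facts(question, index):
    """Return all triples whose subject is mentioned verbatim in the question."""
    q = question.lower()
    facts = []
    for subject_key, items in index.items():
        if subject_key in q:
            facts.extend(items)
    return facts

def build_prompt(question, facts):
    """Prepend the retrieved facts to the question as context for an LLM."""
    lines = [f"- {s} {r.replace('_', ' ')} {o}" for s, r, o in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        f"Known facts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the facts above."
    )

index = build_index(TRIPLES)
question = "What is the prerequisite for Course A?"
facts = retrieve_facts(question, index)
prompt = build_prompt(question, facts)
print(prompt)
```

Substring matching on entity names is the simplest possible retrieval step. A production system would more plausibly rely on entity linking or embedding similarity, which the source does not detail.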
Implementation Barriers
Technical
Educational data sources differ in structure and format, which makes it difficult to build a cohesive Knowledge Graph.
Proposed Solutions: The E-OED Framework, which supports open intent discovery and relation discovery across multiple educational data sources.
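To make "open intent discovery" concrete, here is a minimal sketch of grouping utterances by embedding similarity without fixing the number of intents in advance. It is not the E-OED Framework itself: the bag-of-words `embed` function is a toy stand-in for a real sentence embedding model, and the greedy threshold clustering, the `threshold` value, and the sample utterances are assumptions made for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def discover_intents(utterances, threshold=0.5):
    """Greedy clustering: join the most similar cluster, or open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for text in utterances:
        vec = embed(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No existing cluster is similar enough: treat this as a new intent.
            clusters.append({"centroid": Counter(vec), "members": [text]})
        else:
            best["centroid"].update(vec)  # summed vector; cosine ignores scale
            best["members"].append(text)
    return [c["members"] for c in clusters]

utterances = [
    "how do I register for a course",
    "how can I register for a course",
    "when is the tuition fee deadline",
    "what is the tuition fee deadline",
]
groups = discover_intents(utterances)
for members in groups:
    print(members)
```

Because a new cluster is opened whenever no existing one is similar enough, previously unseen intents surface as their own groups rather than being forced into a predefined label set.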
Linguistic
Vietnamese is a low-resource language, which degrades the performance of NLP tools.
Proposed Solutions: Use of specialized preprocessing techniques and embedding models tailored for Vietnamese to improve performance in intent discovery.
Data Quality
Overlapping and repetitive clusters in the data, together with the inherent difficulty of intent discovery, complicate the overall process.
Proposed Solutions: Implementing an automatic cluster labeling method and refining clustering algorithms to enhance accuracy.
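One simple way to label clusters automatically is to pick the terms that are frequent inside a cluster and rare in the others. The sketch below illustrates that heuristic only; the source does not describe the labeling method actually used, and the `STOPWORDS` set, the scoring formula, and the sample clusters are assumptions.

```python
from collections import Counter

# Illustrative English stopword list; a Vietnamese pipeline would need its own.
STOPWORDS = {"how", "do", "i", "can", "for", "a", "the", "is", "what", "when"}

def label_clusters(clusters, top_k=2):
    """Label each cluster with terms that are frequent inside it and rare elsewhere."""
    counts = [
        Counter(
            w
            for text in members
            for w in text.lower().split()
            if w not in STOPWORDS
        )
        for members in clusters
    ]
    total = sum(counts, Counter())  # term counts over all clusters
    labels = []
    for c in counts:
        # Score = in-cluster count times the share of the term's occurrences
        # that fall in this cluster. sorted() is stable, so ties keep the
        # order in which terms first appeared.
        ranked = sorted(c, key=lambda w: -(c[w] * c[w] / total[w]))
        labels.append(" ".join(ranked[:top_k]))
    return labels

clusters = [
    ["how do I register for a course", "how can I register for a course"],
    ["when is the tuition fee deadline", "what is the tuition fee deadline"],
]
labels = label_clusters(clusters)
print(labels)
```

Under this heuristic, clusters that end up with the same label are natural candidates for review as overlapping or repetitive.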
Project Team
Tuan Bui
Researcher
Oanh Tran
Researcher
Phuong Nguyen
Researcher
Bao Ho
Researcher
Long Nguyen
Researcher
Thang Bui
Researcher
Tho Quan
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Tuan Bui, Oanh Tran, Phuong Nguyen, Bao Ho, Long Nguyen, Thang Bui, Tho Quan
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI