
Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A Case Study at HCMUT

Project Overview

This case study from Ho Chi Minh City University of Technology (HCMUT) examines generative AI in education through the construction of a Knowledge Graph (KG) for an educational Question-Answering (QA) system built on Large Language Models (LLMs). It addresses the difficulty of integrating heterogeneous educational data sources and presents a framework for open intent discovery and relation discovery based on embedding techniques. The study reports that augmenting the LLM with the KG improves the accuracy and relevance of responses in educational settings, while noting obstacles such as data complexity and the language-specific difficulties of Vietnamese. The authors conclude that generative AI can improve interactive learning experiences and make information easier to access in educational environments.

Key Applications

Knowledge Graph-based Educational Question-Answering System

Context: Higher education at HCMUT, targeting students and academic staff seeking information via a QA system.

Implementation: Developed a framework for constructing a KG from various educational data sources and implemented it in conjunction with LLMs for question-answering tasks.

Outcomes: Enhanced accuracy and relevance of responses in educational queries through KG integration with LLMs. Demonstrated successful extraction of intents and relationships from data.

Challenges: Complexity of data integration from multiple sources, issues with open intent recognition, and language-specific challenges in processing Vietnamese.
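To make the KG-plus-LLM idea concrete, the following is a minimal sketch of how facts retrieved from a knowledge graph might be placed into an LLM prompt. It is not the authors' implementation: the triples, the names `KG`, `retrieve_triples`, and `build_prompt`, and the substring-matching retrieval are all illustrative assumptions, and the call to an actual LLM is left out.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy triples for illustration; not data from the study.
KG: List[Triple] = [
    ("Algorithms", "has_prerequisite", "Data Structures"),
    ("Algorithms", "offered_in", "Semester 2"),
    ("Merit Scholarship", "requires_min_gpa", "3.2"),
]


def retrieve_triples(question: str, kg: List[Triple]) -> List[Triple]:
    """Return triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in kg if t[0].lower() in q or t[2].lower() in q]


def build_prompt(question: str, kg: List[Triple]) -> str:
    """Assemble a prompt that grounds the question in retrieved KG facts."""
    facts = retrieve_triples(question, kg)
    lines = [f"- {s} {p} {o}" for s, p, o in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        f"Facts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the facts above."
    )


if __name__ == "__main__":
    print(build_prompt("What is the prerequisite for Algorithms?", KG))
```

Plain substring matching keeps the sketch dependency-free. A real system would more plausibly link entities and query a graph store, but the prompt-assembly step would look similar.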

Implementation Barriers

Technical

The data sources differ in structure and format, which makes it difficult to integrate them into a cohesive Knowledge Graph.

Proposed Solutions: The E-OED Framework, which supports open intent discovery and relation discovery across multiple educational data sources.
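The general idea behind embedding-based open intent discovery can be sketched as follows: utterances are embedded, and each one either joins the most similar existing cluster or opens a new one, so the number of intents is not fixed in advance. This is not the E-OED Framework itself. The bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, and the greedy threshold clustering and the `threshold` value are assumptions chosen for brevity.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def discover_intents(utterances: List[str], threshold: float = 0.4) -> List[List[str]]:
    """Greedy clustering: join the closest cluster or open a new intent."""
    clusters: List[List[str]] = []
    centroids: List[Counter] = []
    for u in utterances:
        v = embed(u)
        scores = [cosine(v, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(u)
            centroids[best] = centroids[best] + v  # running sum acts as centroid
        else:
            clusters.append([u])  # no cluster is close enough: new intent
            centroids.append(v)
    return clusters


if __name__ == "__main__":
    for group in discover_intents([
        "how do i register for a course",
        "how can i register for a course online",
        "when is the tuition fee deadline",
        "what is the tuition fee deadline this semester",
    ]):
        print(group)
```

Because a new cluster is opened whenever no existing one is similar enough, previously unseen intents surface on their own, which is the property that "open" discovery refers to.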

Linguistic

Vietnamese is a low-resource language, which limits the performance of available NLP tools.

Proposed Solutions: Use of specialized preprocessing techniques and embedding models tailored for Vietnamese to improve performance in intent discovery.
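One preprocessing step that commonly matters for Vietnamese text is Unicode normalization, because the same accented letter can be stored either as a single code point or as a base letter followed by combining marks. The sketch below is an assumption about what such preprocessing might include, not a description of the study's pipeline. The function name `normalize_vi` is hypothetical, and word segmentation, which Vietnamese pipelines often also need, is omitted.

```python
import re
import unicodedata


def normalize_vi(text: str) -> str:
    """Normalize Vietnamese text before embedding (illustrative only)."""
    # Compose base letters and combining diacritics into single code points,
    # so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    decomposed = "Tie\u0302\u0301ng  Vie\u0323\u0302t "  # combining-mark form
    composed = "Ti\u1ebfng Vi\u1ec7t"                    # precomposed form
    print(normalize_vi(decomposed) == normalize_vi(composed))
```

Without this step, two spellings that look identical on screen can produce different tokens and therefore different embeddings.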

Data Quality

Overlapping and repetitive clusters in the data, together with the inherent difficulty of intent discovery, complicate the construction process.

Proposed Solutions: Implementing an automatic cluster labeling method and refining clustering algorithms to enhance accuracy.
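One simple way to label clusters automatically is to name each cluster after the words that are frequent inside it but rare in other clusters. The sketch below uses a TF-IDF-style score for this. It is an illustrative assumption rather than the labeling method used in the study, and `label_clusters` and `top_k` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def label_clusters(clusters: List[List[str]], top_k: int = 2) -> List[str]:
    """Label each cluster with its top_k most distinctive words."""
    term_counts = [
        Counter(w for u in c for w in u.lower().split()) for c in clusters
    ]
    n = len(clusters)
    # Number of clusters in which each word occurs at least once.
    cluster_freq = Counter(w for tc in term_counts for w in tc)
    labels = []
    for tc in term_counts:
        def score(w: str) -> float:
            # Frequent within this cluster, rare across clusters -> high score.
            return tc[w] * math.log((1 + n) / (1 + cluster_freq[w]))
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(tc, key=lambda w: (-score(w), w))
        labels.append("_".join(ranked[:top_k]))
    return labels


if __name__ == "__main__":
    print(label_clusters([
        ["register for courses", "how to register for courses online"],
        ["tuition fee deadline", "pay tuition fee for semester"],
    ]))
```

Words shared by every cluster score zero and so never dominate a label, which helps keep labels distinct when clusters overlap in vocabulary.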

Project Team

Tuan Bui

Researcher

Oanh Tran

Researcher

Phuong Nguyen

Researcher

Bao Ho

Researcher

Long Nguyen

Researcher

Thang Bui

Researcher

Tho Quan

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Tuan Bui, Oanh Tran, Phuong Nguyen, Bao Ho, Long Nguyen, Thang Bui, Tho Quan

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
