Enhancing Knowledge Retrieval with In-Context Learning and Semantic Search through Generative AI
Project Overview
The document explores the integration of generative AI in education, focusing on overcoming knowledge retrieval challenges through innovative retrieval systems that merge Large Language Models (LLMs) with vector databases. It introduces methodologies such as Generative Text Retrieval (GTR) and Generative Tabular Text Retrieval (GTR-T), which enhance the accuracy and efficiency of information retrieval for academic inquiries without requiring extensive fine-tuning. The findings demonstrate the effectiveness of these approaches in processing both unstructured and structured data, significantly improving user interactions with AI technologies. Furthermore, the document emphasizes the critical need to democratize access to advanced AI capabilities in educational settings, ensuring that a broader range of learners and educators can benefit from these advancements. Overall, the research illustrates a promising direction for leveraging generative AI to facilitate enhanced learning experiences and support academic success.
Key Applications
Generative Text Retrieval
Context: Educational researchers, students, and other users seeking relevant insights and specific information from large datasets, academic literature, and structured databases.
Implementation: Combines the generative capabilities of large language models (LLMs) with vector databases to build an advanced retrieval system. The system processes user queries and retrieves relevant information, and is optimized for querying structured data by serializing database tables into CSV format (a minimal pipeline sketch follows the Challenges item below).
Outcomes: Achieved over 90% accuracy in information retrieval and improved efficiency in knowledge discovery; for structured data, reached 82% Execution Accuracy (EX) and 60% Exact-Set-Match (EM) when generating SQL queries from natural language.
Challenges: Existing retrieval systems often rely on general-purpose LLMs, which struggle with domain-specific context. Further challenges include the high cost of fine-tuning, schema matching, and translating natural language into SQL queries.
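The retrieval pipeline described above can be illustrated with a minimal sketch: embed a small corpus, run a semantic (cosine-similarity) search, and pass the top passages to an LLM as in-context grounding. The embedding model, the chat model, the prompt wording, and the example documents are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal GTR-style sketch: embed a small corpus, retrieve the most relevant
# passages with semantic (vector) search, and ground an LLM answer on them.
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

documents = [
    "Office hours for the database systems course are Tuesdays 2-4 pm.",
    "The registrar requires thesis proposals to be filed by March 1.",
    "Graduate assistantship stipends are disbursed on a biweekly schedule.",
]

# 1. Embed the corpus once; normalized vectors make dot product = cosine similarity.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def answer(query: str) -> str:
    """Ground the LLM on retrieved context instead of fine-tuning it."""
    context = "\n".join(retrieve(query))
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content

print(answer("When are office hours for the database course?"))
```

For a corpus this small, a plain numpy similarity search stands in for a dedicated vector database; at scale, the same retrieve-then-ground pattern would sit on top of a proper vector store.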
Implementation Barriers
Technical Barrier
The high cost and resource-intensive nature of fine-tuning LLMs limit their adoption in specific academic domains. Existing retrieval systems often fail to answer domain-specific inquiries accurately because they rely on general-purpose LLMs.
Proposed Solutions: Develop retrieval systems that do not require extensive fine-tuning, instead augmenting pre-trained models with domain-specific context and knowledge at query time, as illustrated in the sketch below.
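A minimal sketch of this fine-tuning-free approach for structured data, in the spirit of GTR-T: serialize a table's header and a few rows as CSV, place them in the prompt, and ask a general-purpose LLM to produce the SQL. The toy table, prompt wording, and model name are assumptions for illustration, not the authors' benchmark setup.

```python
# Sketch of natural-language-to-SQL without fine-tuning: the domain context
# (a CSV snapshot of the table) travels in the prompt, not in the model weights.
import csv
import io
import sqlite3
from openai import OpenAI

# Toy table standing in for a real academic database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (code TEXT, title TEXT, credits INTEGER)")
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("SSIE 500", "Systems Science I", 3), ("SSIE 641", "Deep Learning", 3)],
)

def table_as_csv(limit: int = 5) -> str:
    """Dump the table header and a few sample rows as CSV for the prompt."""
    rows = conn.execute(f"SELECT * FROM courses LIMIT {limit}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in rows.description])
    writer.writerows(rows)
    return buf.getvalue()

def nl_to_sql(question: str) -> str:
    """Ask the LLM for a single SQLite query grounded in the CSV snapshot."""
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompt = (
        "Table `courses` (CSV sample):\n"
        f"{table_as_csv()}\n"
        f"Write one SQLite query answering: {question}\n"
        "Return only the SQL, no explanation."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content.strip()
    # Strip an optional markdown code fence around the model's answer.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("sql").strip()
    return text

sql = nl_to_sql("How many 3-credit courses are offered?")
print(sql)
print(conn.execute(sql).fetchall())  # in practice, validate generated SQL first
```

Because the domain context is supplied at query time, the same pre-trained model can serve many databases without per-domain fine-tuning; generated SQL should still be validated before execution.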
Project Team
Mohammed-Khalil Ghali
Researcher
Abdelrahman Farrag
Researcher
Daehan Won
Researcher
Yu Jin
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Mohammed-Khalil Ghali, Abdelrahman Farrag, Daehan Won, Yu Jin
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI