Chatting with Papers: A Hybrid Approach Using LLMs and Knowledge Graphs
Project Overview
The paper examines generative AI in education through the GhostWriter workflow, which combines Large Language Models (LLMs) with knowledge graphs to improve academic information retrieval. Users query a large collection of research papers in natural language, which helps them understand complex concepts and refine their research questions. The workflow is aimed at researchers in the social sciences and shows how much precise retrieval depends on structured knowledge. According to the findings, the approach streamlines the research process and encourages deeper engagement with academic material, which in turn supports better educational outcomes and easier navigation of complex academic resources.
Key Applications
GhostWriter workflow
Context: Navigating a collection of social science articles for research purposes, targeting researchers and students in the social sciences.
Implementation: The workflow uses a combination of LLMs and knowledge graphs to extract and enrich data from research papers, providing a user-friendly interface for querying and chatting with the document collection.
Outcomes: The workflow improves information retrieval, reduces cognitive overload, and supports iterative question refinement. It lets researchers engage deeply with the material and discover related works.
Challenges: Managing contradictory information from different sources, ensuring logical consistency, and maintaining accurate context in responses.
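The idea of pairing a knowledge graph with text retrieval before handing context to an LLM can be illustrated with a minimal sketch. It is not GhostWriter's implementation: the toy graph, the `Fragment` type, the keyword-overlap scoring, and the prompt wording are all assumptions made for illustration, and no real LLM is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    doc_id: str  # identifier of the paper the fragment comes from
    text: str    # fragment text


# Hypothetical toy knowledge graph: concept -> directly related concepts.
GRAPH = {
    "migration": {"labour market", "integration"},
    "integration": {"education"},
}


def expand_query(terms, graph):
    """Return the query terms plus their one-hop neighbours in the graph."""
    expanded = set(terms)
    for term in terms:
        expanded |= graph.get(term, set())
    return expanded


def score(fragment, terms):
    """Count how many of the terms occur in the fragment (case-insensitive)."""
    text = fragment.text.lower()
    return sum(1 for term in terms if term in text)


def retrieve(query_terms, fragments, graph, k=2):
    """Rank fragments against the graph-expanded query and keep the top k hits."""
    terms = expand_query(query_terms, graph)
    ranked = sorted(fragments, key=lambda f: score(f, terms), reverse=True)
    return [f for f in ranked[:k] if score(f, terms) > 0]


def build_prompt(question, fragments):
    """Assemble the retrieved fragments into a context block for an LLM."""
    context = "\n".join(f"[{f.doc_id}] {f.text}" for f in fragments)
    return f"Answer using only the context.\n{context}\nQuestion: {question}"
```

In this sketch, a query for "migration" also surfaces a fragment that mentions only "integration", because the graph links the two concepts. A keyword match on the original term alone would miss it.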
Implementation Barriers
Technical barrier
Integrating LLMs with knowledge graphs can produce responses that lose coherence or context, particularly when sources contradict one another.
Proposed Solutions: Revising ranking algorithms to cluster and rank document fragments at the document level to preserve contextual coherence.
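A minimal sketch of what document-level clustering and ranking of fragments could look like follows. The tuple layout, the choice of the best fragment score as the document score, and the `top_docs` parameter are assumptions for illustration; the source does not specify the aggregation.

```python
from collections import defaultdict


def rank_by_document(scored_fragments, top_docs=2):
    """Group scored fragments by document and rank the documents.

    scored_fragments: iterable of (doc_id, position, text, score) tuples,
    where position is the fragment's order inside its document.
    Returns a list of (doc_id, [texts in reading order]) for the top documents.
    """
    by_doc = defaultdict(list)
    for doc_id, position, text, score in scored_fragments:
        by_doc[doc_id].append((position, text, score))

    # Assumption: a document is as relevant as its best fragment.
    # Ties are broken by doc_id so the ordering is deterministic.
    def doc_key(doc_id):
        best = max(score for _, _, score in by_doc[doc_id])
        return (-best, doc_id)

    ranked_ids = sorted(by_doc, key=doc_key)[:top_docs]

    # Within each document, restore reading order so context stays coherent.
    return [
        (doc_id, [text for _, text, _ in sorted(by_doc[doc_id], key=lambda item: item[0])])
        for doc_id in ranked_ids
    ]
```

Keeping fragments from the same paper together, in their original order, avoids interleaving passages from sources that may contradict each other.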
Data quality barrier
The success of the workflow relies on well-curated metadata and digitized full text.
Proposed Solutions: Implementing standards such as the Croissant Standard for datasets and ensuring robust API integrations for data harvesting.
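The dependence on curated metadata and digitized full text could be checked before ingestion with a simple completeness filter, sketched below. The field names in `REQUIRED_FIELDS` are hypothetical and do not come from the Croissant specification or from the paper.

```python
# Hypothetical set of fields a record needs before it enters the collection.
REQUIRED_FIELDS = ("title", "authors", "year", "full_text")


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def split_by_quality(records):
    """Separate records that are ready for ingestion from incomplete ones."""
    ready, incomplete = [], []
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            incomplete.append((record, gaps))
        else:
            ready.append(record)
    return ready, incomplete
```

Records listed as incomplete can be sent back for curation or digitization instead of degrading retrieval quality downstream.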
Project Team
Vyacheslav Tykhonov
Researcher
Han Yang
Researcher
Philipp Mayr
Researcher
Jetze Touber
Researcher
Andrea Scharnhorst
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Vyacheslav Tykhonov, Han Yang, Philipp Mayr, Jetze Touber, Andrea Scharnhorst
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI