
Automating Chapter-Level Classification for Electronic Theses and Dissertations

Project Overview

This paper examines how generative AI, specifically machine learning and language models, can automate chapter-level classification of electronic theses and dissertations (ETDs). Traditional high-level metadata does not accurately represent the complexity of ETDs, which hampers their discoverability. The research uses an AI-driven approach to generate detailed chapter-level metadata, with the aim of enhancing accessibility, improving information retrieval, and facilitating interdisciplinary research. The paper evaluates several classification strategies to identify the most effective methods for categorizing ETD chapters, and shows how generative AI can improve the organization and accessibility of academic resources in support of educational outcomes and research collaboration.

Key Applications

Automated chapter-level classification for ETDs

Context: Used in academic libraries for improving discoverability of electronic theses and dissertations

Implementation: Utilized machine learning and AI to segment ETDs into chapters and classify them with detailed metadata

Outcomes: Improved access to specific sections of ETDs, enhanced discoverability for interdisciplinary research, and better support for academic inquiries.

Challenges: Complexity of accurately segmenting chapters, variations in formatting across disciplines, and the need for subject matter expertise to validate AI-generated classifications.
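
As an illustration of the segment-then-classify idea described above, the following is a minimal sketch and not the authors' method. It assumes chapters begin with a line such as "Chapter 1: Introduction", and it uses a keyword-overlap scorer as a stand-in for the language-model classifier. The names `split_into_chapters`, `classify`, `chapter_metadata`, and the `KEYWORDS` table are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed heading convention: "Chapter 1", "CHAPTER 2: Methods", and so on.
# Real ETDs vary widely across disciplines, so this pattern is only a heuristic.
HEADING = re.compile(r"^\s*chapter\s+(\d+)\b[:.\s-]*(.*)$", re.IGNORECASE)


@dataclass
class Chapter:
    number: int
    title: str
    text: str


def split_into_chapters(full_text):
    """Split plain text into chapters at lines that match HEADING."""
    chapters = []
    current = None  # (number, title) of the chapter being collected
    body = []
    for line in full_text.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None:
                chapters.append(Chapter(current[0], current[1], "\n".join(body).strip()))
            current = (int(match.group(1)), match.group(2).strip())
            body = []
        elif current is not None:
            body.append(line)
    if current is not None:
        chapters.append(Chapter(current[0], current[1], "\n".join(body).strip()))
    return chapters


# Hypothetical subject labels and keywords, standing in for a trained model.
KEYWORDS = {
    "Computer Science": {"algorithm", "neural", "software"},
    "Biology": {"cell", "protein", "species"},
}


def classify(chapter, keywords=KEYWORDS):
    """Return the label whose keyword set overlaps most with the chapter text."""
    words = set(re.findall(r"[a-z]+", chapter.text.lower()))
    scores = {label: len(words & kw) for label, kw in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unclassified"


def chapter_metadata(full_text):
    """Produce one chapter-level metadata record per detected chapter."""
    return [
        {"chapter": c.number, "title": c.title, "subject": classify(c)}
        for c in split_into_chapters(full_text)
    ]
```

In a real pipeline, `classify` would be replaced by a call to whichever classification strategy is being evaluated. The record structure stays the same, which makes it straightforward to compare strategies on the same segmented chapters.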

Implementation Barriers

Technical Barrier

Segmenting ETDs into individual chapters is difficult because formats vary and PDFs offer no built-in support for chapter segmentation.

Proposed Solutions: Combining AWS Textract with object detection techniques to improve text extraction and chapter segmentation.
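
The summary does not say how the two components are combined, so the following is only one plausible sketch. It assumes text lines arrive as AWS Textract `LINE` blocks (with `Text`, `Page`, and a `Geometry.BoundingBox` expressed as ratios of page size) and that an object detector returns chapter-heading regions in the same normalized coordinates. The detector output format and the function names are hypothetical. No AWS call is made here; the code works on an already-retrieved list of blocks.

```python
def box_center(bbox):
    """Center point (x, y) of a Textract-style bounding box."""
    return (bbox["Left"] + bbox["Width"] / 2, bbox["Top"] + bbox["Height"] / 2)


def contains(region, point):
    """True if the point lies inside the region (same normalized coordinates)."""
    x, y = point
    return (
        region["Left"] <= x <= region["Left"] + region["Width"]
        and region["Top"] <= y <= region["Top"] + region["Height"]
    )


def heading_lines(blocks, heading_regions):
    """Return (page, text) for LINE blocks centered inside a detected heading region.

    blocks: list of Textract-style block dicts.
    heading_regions: hypothetical detector output, a dict mapping page number
    to a list of boxes with Left, Top, Width, and Height keys.
    """
    found = []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        page = block.get("Page", 1)
        center = box_center(block["Geometry"]["BoundingBox"])
        if any(contains(region, center) for region in heading_regions.get(page, [])):
            found.append((page, block["Text"]))
    return found
```

The lines returned by `heading_lines` mark candidate chapter boundaries; the text between two consecutive headings can then be treated as one chapter.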

Resource Barrier

Evaluating the accuracy of AI-generated classifications requires subject matter expertise, which can be costly and time-consuming. The effort also includes implementing robust prompting strategies and continuous feedback mechanisms to refine AI outputs.

Proposed Solutions: Developing strategies to lower the costs and time associated with obtaining subject matter expertise.

Project Team

Bipasha Banerjee

Researcher

William A. Ingram

Researcher

Edward A. Fox

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Bipasha Banerjee, William A. Ingram, Edward A. Fox

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
