LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding
Project Overview
The document explores the application of generative AI, specifically large language models (LLMs) such as GPT-3.5, in education through a technique called LLM-assisted content analysis (LACA). The method improves the efficiency of qualitative research by assisting with deductive coding, substantially reducing the time researchers spend analyzing text data. A case study coding Trump's tweets demonstrates the approach, and benchmarking against human coders across four datasets indicates that LLMs can achieve agreement levels comparable to those of human coders in a variety of contexts, suggesting their potential to streamline qualitative analysis in educational research. The document also notes challenges, including the need for precise prompt design and the possibility that the LLM misinterprets coding instructions. Overall, integrating generative AI into education offers promising advances in research methodology while requiring attention to its limitations and appropriate use.
Key Applications
LLM-assisted content analysis (LACA)
Context: Qualitative research for coding text documents, targeted at researchers and academics.
Implementation: Integration of LLMs into the deductive coding process to assist with coding text, developing codebooks, and evaluating coding decisions (a minimal prompting sketch follows this list).
Outcomes: Reduced coding time, accuracy comparable to that of human coders, and support for qualitative researchers managing large text datasets.
Challenges: Potential for the LLM to misinterpret coding instructions, the need for human oversight of coding decisions, and reliance on well-structured prompts.
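As a concrete illustration of the Implementation point above, the sketch below codes a single document against a codebook via the OpenAI chat completions API. It is a minimal sketch only: the codebook text, category labels, and model name are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of LLM-assisted deductive coding.
# Assumes the OpenAI Python client; codebook, labels, and model are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CODEBOOK = """\
Code each document with exactly one of the following categories:
- POLICY: the text discusses a policy position or proposal.
- ATTACK: the text criticises a person, group, or institution.
- OTHER: none of the above applies.
Respond with the category label only."""

def code_document(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the LLM to apply the codebook to a single document."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep coding decisions as stable as possible
        messages=[
            {"role": "system", "content": CODEBOOK},
            {"role": "user", "content": f"Document:\n{text}"},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(code_document("We will cut taxes for working families next year."))
```

In practice the same call would be looped over every document in a dataset, with the returned labels stored alongside the human codes for comparison.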
Implementation Barriers
Technical Barrier
LLMs may fail to understand specific coding tasks or categories, leading to inaccurate coding.
Proposed Solutions: Conduct hypothesis tests to assess model understanding, manually review model-generated codes and reasoning.
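One way to operationalise this review is to pilot the LLM on a small human-coded sample and compute a chance-corrected agreement statistic before coding at scale. The sketch below uses Cohen's kappa from scikit-learn as one such statistic (the paper may report a different agreement measure); the labels shown are made up for illustration.

```python
# Illustrative agreement check between LLM and human codes on a pilot sample.
from sklearn.metrics import cohen_kappa_score

human_codes = ["POLICY", "ATTACK", "OTHER", "ATTACK", "POLICY"]
llm_codes   = ["POLICY", "ATTACK", "OTHER", "OTHER",  "POLICY"]

kappa = cohen_kappa_score(human_codes, llm_codes)
print(f"Cohen's kappa (LLM vs. human): {kappa:.2f}")
# Low agreement on a pilot sample suggests the model misunderstands a category,
# prompting manual review of its reasoning and revision of the codebook or prompt.
```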
Operational Barrier
Researchers may be reluctant to trust LLM coding due to the lack of transparency in model decision-making.
Proposed Solutions: Incorporate model-generated reasoning to enhance transparency and facilitate human review of coding decisions.
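A hedged sketch of this idea: prompt the model to return its code together with a short rationale in a structured form, so that a human reviewer can audit individual decisions. The JSON format, prompt wording, and model name here are assumptions rather than the authors' exact protocol.

```python
# Sketch of eliciting a rationale alongside each coding decision for human review.
# The prompt, schema, and model name are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = """\
Apply the codebook category (POLICY, ATTACK, or OTHER) to the document and
explain your decision. Reply as JSON: {"code": "...", "reasoning": "..."}"""

def code_with_reasoning(text: str, model: str = "gpt-3.5-turbo") -> dict:
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": text},
        ],
    )
    # A production version would guard against replies that are not valid JSON.
    return json.loads(response.choices[0].message.content)

record = code_with_reasoning("The senator's plan would bankrupt the country.")
print(record["code"], "-", record["reasoning"])  # reviewer audits the rationale
```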
Project Team
Robert Chew
Researcher
John Bollenbacher
Researcher
Michael Wenger
Researcher
Jessica Speer
Researcher
Annice Kim
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Robert Chew, John Bollenbacher, Michael Wenger, Jessica Speer, Annice Kim
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI