LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding
Project Overview
The document explores the application of generative AI, specifically large language models (LLMs) such as GPT-3.5, in education through a technique called LLM-assisted content analysis (LACA). The method improves the efficiency of qualitative research by assisting with deductive coding, substantially reducing the time researchers spend analyzing text data. A case study coding Trump's tweets demonstrates the approach, and benchmarking against human coders across four datasets indicates that LLMs can achieve agreement levels comparable to those of human coders in a variety of contexts, suggesting their potential to streamline qualitative analysis in educational research. The document also notes challenges, including the need for precise prompt design and the possibility that the LLM misinterprets coding instructions. Overall, integrating generative AI into education offers promising advances in research methodology while requiring attention to its limitations and appropriate use.
Key Applications
LLM-assisted content analysis (LACA)
Context: Qualitative research for coding text documents, targeted at researchers and academics.
Implementation: Integration of LLMs into the deductive coding process to assist with coding text, developing codebooks, and evaluating coding decisions (a minimal prompting sketch follows this list).
Outcomes: Reduced coding time, accuracy comparable to that of human coders, and support for qualitative researchers managing large text datasets.
Challenges: Potential for the LLM to misinterpret coding instructions, the need for human oversight of coding decisions, and reliance on well-structured prompts.
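As a concrete illustration of the Implementation point above, the sketch below codes a single document against a codebook via the OpenAI chat completions API. It is a minimal sketch only: the codebook text, category labels, and model name are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of LLM-assisted deductive coding.
# Assumes the OpenAI Python client; codebook, labels, and model are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CODEBOOK = """\
Code each document with exactly one of the following categories:
- POLICY: the text discusses a policy position or proposal.
- ATTACK: the text criticises a person, group, or institution.
- OTHER: none of the above applies.
Respond with the category label only."""

def code_document(text: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the LLM to apply the codebook to a single document."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep coding decisions as stable as possible
        messages=[
            {"role": "system", "content": CODEBOOK},
            {"role": "user", "content": f"Document:\n{text}"},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(code_document("We will cut taxes for working families next year."))
```

In practice the same call would be looped over every document in a dataset, with the returned labels stored alongside the human codes for comparison.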
Implementation Barriers
Technical Barrier
LLMs may fail to understand specific coding tasks or categories, leading to inaccurate coding.
Proposed Solutions: Conduct hypothesis tests to assess model understanding, manually review model-generated codes and reasoning.
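One way to operationalise this review is to pilot the LLM on a small human-coded sample and compute a chance-corrected agreement statistic before coding at scale. The sketch below uses Cohen's kappa from scikit-learn as one such statistic (the paper may report a different agreement measure); the labels shown are made up for illustration.

```python
# Illustrative agreement check between LLM and human codes on a pilot sample.
from sklearn.metrics import cohen_kappa_score

human_codes = ["POLICY", "ATTACK", "OTHER", "ATTACK", "POLICY"]
llm_codes   = ["POLICY", "ATTACK", "OTHER", "OTHER",  "POLICY"]

kappa = cohen_kappa_score(human_codes, llm_codes)
print(f"Cohen's kappa (LLM vs. human): {kappa:.2f}")
# Low agreement on a pilot sample suggests the model misunderstands a category,
# prompting manual review of its reasoning and revision of the codebook or prompt.
```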
Operational Barrier
Researchers may be reluctant to trust LLM coding due to the lack of transparency in model decision-making.
Proposed Solutions: Incorporate model-generated reasoning to enhance transparency and facilitate human review of coding decisions.
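A hedged sketch of this idea: prompt the model to return its code together with a short rationale in a structured form, so that a human reviewer can audit individual decisions. The JSON format, prompt wording, and model name here are assumptions rather than the authors' exact protocol.

```python
# Sketch of eliciting a rationale alongside each coding decision for human review.
# The prompt, schema, and model name are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = """\
Apply the codebook category (POLICY, ATTACK, or OTHER) to the document and
explain your decision. Reply as JSON: {"code": "...", "reasoning": "..."}"""

def code_with_reasoning(text: str, model: str = "gpt-3.5-turbo") -> dict:
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": text},
        ],
    )
    # A production version would guard against replies that are not valid JSON.
    return json.loads(response.choices[0].message.content)

record = code_with_reasoning("The senator's plan would bankrupt the country.")
print(record["code"], "-", record["reasoning"])  # reviewer audits the rationale
```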
Project Team
Robert Chew
Researcher
John Bollenbacher
Researcher
Michael Wenger
Researcher
Jessica Speer
Researcher
Annice Kim
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Robert Chew, John Bollenbacher, Michael Wenger, Jessica Speer, Annice Kim
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI