Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets
Project Overview
This project explores the integration of machine learning (ML) and generative AI (GAI) into educational research, focusing on the qualitative coding of online discourse in computer-supported collaborative learning (CSCL). The paper shows that AI approaches can efficiently identify content-related codes in discourse datasets, demonstrating their potential to speed up the coding process. It also underscores their limitations, particularly in interpreting the nuances of conversational dynamics, a strength of human coders. The findings suggest that, rather than replacing human analysis, AI should be employed as a parallel co-coder that supports and enriches qualitative research. This collaborative approach combines the efficiency of AI with the interpretive skills of human researchers to yield more comprehensive insights into online learning interactions.
Key Applications
Open qualitative coding using ML/GAI approaches
Context: Analysis of online chat messages from a mobile learning software environment, aimed at researchers conducting qualitative studies.
Implementation: Five ML/GAI approaches were compared against four human coders on a dataset of online chat messages from the Physics Lab, a mobile learning environment (a sketch of one such GAI coding pass appears after this list).
Outcomes: The AI approaches recovered a majority of the human-generated codes and surfaced additional unique codes; they were particularly effective at identifying broader themes and finer-grained codes.
Challenges: AI struggled with codes grounded in conversational dynamics and often produced overly broad themes.
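To make the coding pass concrete, below is a minimal sketch of how a GAI approach might generate open codes for a single chat message. The prompt wording, model name, and JSON reply format are illustrative assumptions, not the authors' exact pipeline.

```python
# Illustrative sketch of one GAI open-coding pass over a chat message.
# Prompt wording, model choice, and reply format are assumptions, not
# the paper's exact pipeline.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def open_code_message(message: str, context: list[str]) -> list[str]:
    """Ask the model for short open codes grounded in one chat message."""
    history = "\n".join(context)
    prompt = (
        "You are a qualitative researcher performing open coding.\n"
        f"Conversation so far:\n{history}\n\n"
        f"Message to code: {message}\n"
        "Reply with only a JSON array of short descriptive codes."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute as needed
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model complies and returns a bare JSON array.
    return json.loads(resp.choices[0].message.content)
```

Human coders would code the same messages independently, so the machine- and human-generated code sets can then be compared downstream.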
Implementation Barriers
Technological
Generative AI models may miss conversational nuances and produce vague themes or codes that are not grounded in the data.
Proposed Solutions: Improved prompt design and integration of human coding processes into AI methodologies (an illustrative prompt sketch follows below).
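One way to realize both ideas at once is to fold steps of the human open-coding process, such as contextual reading and grounding every code in a quoted phrase, directly into the prompt. The wording below is an illustrative assumption, not the authors' actual prompt.

```python
# An illustrative prompt template that folds human open-coding steps
# (contextual reading, grounding in quotes) into the instruction.
# The wording is an assumption, not the paper's actual prompt.
OPEN_CODING_PROMPT = """\
You are assisting with open coding of online chat messages.
Follow these steps for the message below:
1. Read the message within its surrounding conversation.
2. Propose 1-3 short codes that stay close to the participants' wording.
3. For each code, quote the exact phrase that grounds it.
4. Discard any code you cannot tie to a specific quote.

Conversation context:
{context}

Message to code:
{message}
"""

# Usage: OPEN_CODING_PROMPT.format(context=..., message=...)
```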
Methodological
Evaluating open coding results from AI is challenging due to the lack of a 'ground truth' reference.
Proposed Solutions: Development of metrics such as 'Coverage' that measure the semantic similarity between machine-generated and human-generated codes (a sketch follows below).
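Below is a minimal sketch of such a Coverage metric, assuming codes are compared by embedding them and taking each human code's best cosine similarity against the machine codes; the encoder choice and threshold are assumptions, not the paper's exact definition.

```python
# Sketch of a 'Coverage'-style metric: the fraction of human codes that
# are semantically matched by at least one machine-generated code.
# The embedding model and threshold are assumptions for illustration.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def coverage(human_codes: list[str], machine_codes: list[str],
             threshold: float = 0.7) -> float:
    """Fraction of human codes whose best machine match clears the threshold."""
    h = encoder.encode(human_codes)
    m = encoder.encode(machine_codes)
    sims = cosine_similarity(h, m)  # shape: (len(human), len(machine))
    return float((sims.max(axis=1) >= threshold).mean())
```

Under this sketch, a human code such as "asks peer for help" counts as covered if any machine code, say "requesting help from classmates", embeds within the similarity threshold.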
Project Team
John Chen, Researcher
Alexandros Lotsos, Researcher
Grace Wang, Researcher
Lexie Zhao, Researcher
Bruce Sherin, Researcher
Uri Wilensky, Researcher
Michael Horn, Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, Michael Horn
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI