
Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets

Project Overview

This project explores the integration of machine learning (ML) and generative AI (GAI) in educational settings, specifically the qualitative coding of online discourse in computer-supported collaborative learning (CSCL). It highlights the capability of AI approaches to efficiently identify content-related codes within discourse datasets, but it also underscores their limitations, particularly in interpreting the nuances of conversational dynamics, a strength of human coders. The findings suggest that rather than viewing AI as a replacement for human analysis, it should be employed as a parallel co-coder that supports and enriches qualitative research. This collaborative approach combines the efficiency of AI with the interpretative skills of human researchers to achieve more comprehensive insights into online learning interactions.

Key Applications

Open qualitative coding using ML/GAI approaches

Context: Analysis of online chat messages in a mobile learning software environment, targeting researchers in qualitative studies.

Implementation: Five ML/GAI approaches were compared against four human coders using a dataset of online chat messages from the Physics Lab.

Outcomes: The AI approaches identified a majority of the human-generated codes plus additional unique codes, and proved particularly effective at identifying both broader themes and finer-grained codes.

Challenges: AI struggled with codes grounded in conversational dynamics and often produced overly broad themes.

Implementation Barriers

Technological

Generative AI models may miss nuances and produce vague themes or non-grounded results.

Proposed Solutions: Improved prompt design and integration of human coding processes into AI methodologies.
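One way to integrate human coding processes into a GAI workflow is to build grounding requirements directly into the prompt. The sketch below assembles such a prompt for open coding of chat messages; the wording, function name, and structure are hypothetical illustrations, not the prompts actually used in the paper.

```python
def build_open_coding_prompt(messages, max_codes=5):
    """Assemble a prompt asking a GAI model to open-code chat messages.

    Requiring a quoted message for each code is one hedge against the
    vague or non-grounded themes noted above. This is a hypothetical
    design sketch, not the paper's actual prompt.
    """
    # Number the messages so the model can cite them precisely.
    transcript = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(messages))
    return (
        "You are a qualitative researcher performing open coding on "
        "online chat messages from a collaborative learning platform.\n"
        f"Propose at most {max_codes} codes. For each code, give a short "
        "label, a one-sentence definition, and quote the exact message "
        "that grounds it. Do not propose themes unsupported by the data.\n\n"
        f"Messages:\n{transcript}"
    )
```

The returned string can be sent to any chat-completion model (the page lists gpt-4o-mini-2024-07-18 as the model used for this analysis); the key design choice is forcing each code to cite its evidence.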

Methodological

Evaluating open coding results from AI is challenging due to the lack of a 'ground truth' reference.

Proposed Solutions: Development of metrics like 'Coverage' to measure semantic similarity between machine-generated and human-generated codes.
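A 'Coverage'-style metric can be sketched as the fraction of human codes that have at least one sufficiently similar machine code. The version below is a minimal illustration that stands in bag-of-words cosine similarity for the semantic similarity the metric requires; the function names and the 0.5 threshold are assumptions, and a real implementation would likely use sentence embeddings instead.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two code labels.
    A crude stand-in for semantic similarity, used here only to
    keep the sketch self-contained."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def coverage(human_codes, machine_codes, threshold=0.5):
    """Fraction of human codes matched by at least one machine code
    whose similarity meets the threshold (a hypothetical cutoff)."""
    if not human_codes:
        return 0.0
    hits = sum(
        1 for h in human_codes
        if any(cosine(h, m) >= threshold for m in machine_codes)
    )
    return hits / len(human_codes)
```

For example, `coverage(["sharing experiment results"], ["sharing results of experiments"])` counts the pair as a match, while semantically related but lexically disjoint labels would require an embedding-based similarity to be recognized.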

Project Team

John Chen

Researcher

Alexandros Lotsos

Researcher

Grace Wang

Researcher

Lexie Zhao

Researcher

Bruce Sherin

Researcher

Uri Wilensky

Researcher

Michael Horn

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, Michael Horn

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
