Detecting LLM-Generated Short Answers and Effects on Learner Performance
Project Overview
This document examines the implications of generative AI, particularly large language models (LLMs), in education, focusing on the difficulty of detecting AI-generated text and the potential for misuse in online learning environments. It highlights the risk that students submit LLM-generated content to boost assessment performance, raising serious concerns about academic integrity and the authenticity of learning. The study compares a fine-tuned GPT-4 model against established detection tools such as GPTZero, finding that although these tools aim to safeguard educational integrity, the sophistication of modern LLMs poses a substantial challenge. The findings underscore the need for reliable methods to discern AI involvement in student submissions and the ongoing dilemma of balancing technological advances with ethical educational practice. Overall, the document offers a critical examination of the double-edged nature of generative AI in education, where its applications can both enhance learning and undermine the integrity of academic assessment.
Key Applications
Fine-tuned GPT-4 model for detecting LLM-generated responses
Context: Online learning environments, specifically for college students in tutor roles
Implementation: Fine-tuning was conducted using a labeled dataset from tutor responses to distinguish between LLM-generated and human-authored content.
Outcomes: The fine-tuned model achieved an accuracy of 80%, outperforming existing detection tools and highlighting the importance of human-verified training data.
Challenges: Existing detection tools like GPTZero have high false positive rates, and there is a risk of misclassifying human-authored responses.
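The outcomes and challenges above hinge on two metrics: overall accuracy and the false positive rate (human-authored responses flagged as LLM-generated). A minimal sketch of how those metrics are computed is shown below; the labels and predictions are illustrative assumptions, not the study's data.

```python
# Evaluate a detector on labeled responses.
# Label convention (assumed here): 1 = LLM-generated, 0 = human-authored.

def evaluate(labels, preds):
    """Return (accuracy, false_positive_rate) for binary predictions."""
    assert len(labels) == len(preds)
    correct = sum(l == p for l, p in zip(labels, preds))
    # False positive: a human-authored response classified as LLM-generated.
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    humans = sum(1 for l in labels if l == 0)
    accuracy = correct / len(labels)
    fpr = fp / humans if humans else 0.0
    return accuracy, fpr

# Hypothetical example: 10 responses, 5 human (0) and 5 LLM-generated (1).
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
preds  = [0, 1, 0, 0, 0, 1, 1, 0, 1, 1]  # one false positive, one miss
acc, fpr = evaluate(labels, preds)
print(f"accuracy={acc:.0%}, false positive rate={fpr:.0%}")  # accuracy=80%, false positive rate=20%
```

The false positive rate is the quantity at issue in the GPTZero critique: a tool can report respectable accuracy while still misclassifying an unacceptable share of genuinely human-authored work.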
Implementation Barriers
Technical and Pedagogical Barriers
Existing detection tools struggle to reliably distinguish human-authored from LLM-generated text, leading to inaccuracies. In addition, LLM misuse can foster 'metacognitive laziness', in which learners bypass genuine engagement with the material.
Proposed Solutions: Develop fine-tuned models on domain-specific datasets to improve detection accuracy, and incorporate pedagogical strategies and interventions that discourage overreliance on LLM-generated responses.
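The proposed solution starts from a labeled, domain-specific dataset. A minimal sketch of preparing such data in the JSONL chat format accepted by OpenAI's fine-tuning API is shown below; the example responses, label wording, and file name are hypothetical, not taken from the study.

```python
# Minimal sketch: convert labeled tutor responses into JSONL training
# records for a chat-model fine-tune. The two examples and the 'human'/'llm'
# label scheme are illustrative assumptions, not the study's actual data.
import json

# Hypothetical labeled examples: (tutor_response, label)
examples = [
    ("Great effort! Let's look at where the subtraction went wrong.", "human"),
    ("Certainly! Here is a detailed, structured explanation of the error.", "llm"),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for text, label in examples:
        record = {
            "messages": [
                {"role": "system",
                 "content": "Classify the tutor response as 'human' or 'llm'."},
                {"role": "user", "content": text},
                {"role": "assistant", "content": label},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

Each line is one training example: the system message frames the classification task, the user message carries the tutor response, and the assistant message supplies the human-verified label, which is why label quality matters so much for the resulting detector.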
Project Team
Shambhavi Bhushan
Researcher
Danielle R Thomas
Researcher
Conrad Borchers
Researcher
Isha Raghuvanshi
Researcher
Ralph Abboud
Researcher
Erin Gatz
Researcher
Shivang Gupta
Researcher
Kenneth Koedinger
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Shambhavi Bhushan, Danielle R Thomas, Conrad Borchers, Isha Raghuvanshi, Ralph Abboud, Erin Gatz, Shivang Gupta, Kenneth Koedinger
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI