
Detecting LLM-Generated Short Answers and Effects on Learner Performance

Project Overview

This project examines the implications of generative AI, particularly large language models (LLMs), in education, focusing on the difficulty of detecting AI-generated text and its potential for misuse in online learning environments. When students submit LLM-generated content on assessments, it raises serious concerns about academic integrity and the authenticity of learning. The study compares a fine-tuned GPT-4 model against established detection tools such as GPTZero, finding that while these tools aim to safeguard educational integrity, the sophistication of modern LLMs poses a substantial challenge even for purpose-built detectors. The findings underscore the need for reliable methods to discern AI involvement in student submissions, and the broader dilemma of balancing technological advancement with ethical educational practice: generative AI can both enhance learning and undermine the integrity of academic assessment.

Key Applications

Fine-tuned GPT-4 model for detecting LLM-generated responses

Context: Online learning environments, specifically for college students in tutor roles

Implementation: Fine-tuning was conducted using a labeled dataset from tutor responses to distinguish between LLM-generated and human-authored content.
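
The paper's exact data format is not given here, but as an illustration, a labeled dataset for OpenAI-style chat fine-tuning is typically prepared as JSONL, with each tutor response paired with its label. A minimal sketch (the example texts, labels, field wording, and system prompt are assumptions, not the project's actual data):

```python
import json

# Hypothetical labeled examples: (tutor response text, label) pairs.
# Both the texts and the "human"/"llm" labels are illustrative only.
examples = [
    ("Nice work breaking the problem into steps! Next, try the second part.", "human"),
    ("Certainly! Here is a detailed explanation of the underlying concept.", "llm"),
]

def to_finetune_record(text, label):
    """Wrap one labeled response in OpenAI chat fine-tuning JSONL format."""
    return {
        "messages": [
            {"role": "system",
             "content": "Classify the tutor response as 'human' or 'llm'."},
            {"role": "user", "content": text},
            {"role": "assistant", "content": label},
        ]
    }

# One JSON object per line, as the fine-tuning endpoint expects.
records = [json.dumps(to_finetune_record(t, l)) for t, l in examples]
jsonl = "\n".join(records)
```

The resulting `jsonl` string would be uploaded as a training file; the target label appears as the assistant turn so the fine-tuned model learns to emit it.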

Outcomes: The fine-tuned model achieved an accuracy of 80%, outperforming existing detection tools and highlighting the importance of human-verified training data.

Challenges: Existing detection tools like GPTZero exhibit high false positive rates, risking the misclassification of human-authored responses as LLM-generated.
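
Accuracy alone can mask the false-positive problem noted above. A minimal sketch of how the two metrics diverge on an imbalanced evaluation set (the gold labels and detector predictions are made up for illustration, not the study's results):

```python
# Hypothetical gold labels and detector predictions (1 = LLM-generated).
gold = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # mostly human-authored responses
pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # detector wrongly flags two humans

def accuracy(gold, pred):
    """Fraction of responses classified correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def false_positive_rate(gold, pred):
    """Share of human-authored (gold=0) responses wrongly flagged as LLM."""
    humans = [(g, p) for g, p in zip(gold, pred) if g == 0]
    return sum(p for _, p in humans) / len(humans)

print(accuracy(gold, pred))             # 0.8
print(false_positive_rate(gold, pred))  # 0.25
```

Here an 80%-accurate detector still flags a quarter of genuine human responses, which is exactly the misclassification risk that matters for academic-integrity decisions.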

Implementation Barriers

Technical and pedagogical barriers

Existing detection tools struggle to reliably distinguish human-authored from LLM-generated text, producing frequent misclassifications. Additionally, LLM misuse can foster 'metacognitive laziness', where learners bypass genuine engagement with the material.

Proposed Solutions: Develop fine-tuned models on domain-specific, human-verified datasets to improve detection accuracy, and incorporate pedagogical strategies and interventions that discourage overreliance on LLM-generated responses.

Project Team

Shambhavi Bhushan

Researcher

Danielle R Thomas

Researcher

Conrad Borchers

Researcher

Isha Raghuvanshi

Researcher

Ralph Abboud

Researcher

Erin Gatz

Researcher

Shivang Gupta

Researcher

Kenneth Koedinger

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Shambhavi Bhushan, Danielle R Thomas, Conrad Borchers, Isha Raghuvanshi, Ralph Abboud, Erin Gatz, Shivang Gupta, Kenneth Koedinger

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
