Evaluating Software Plagiarism Detection in the Age of AI: Automated Obfuscation and Lessons for Academic Integrity
Project Overview
This document surveys the role of generative AI in education, particularly in programming and computer science, where tools such as ChatGPT both enhance learning and challenge academic integrity. The availability of these tools has changed student behavior around plagiarism: AI-driven obfuscation makes academic dishonesty harder to detect, and current plagiarism detection systems are shown to be vulnerable to such sophisticated attacks, revealing a pressing need for more robust and adaptive defenses. The findings indicate that no single countermeasure suffices; combining different defense strategies is essential against the evolving threats posed by AI-generated content. The document also stresses the importance of open ethical discourse in education, calling for a balanced approach that safeguards academic integrity while leveraging generative AI to improve learning outcomes and student productivity.
Key Applications
AI-assisted plagiarism detection and code generation analysis
Context: Educational settings, particularly in programming courses (CS1 and beyond), where students utilize AI tools (e.g., ChatGPT, GPT-4) to assist with programming assignments and generate or obfuscate code.
Implementation: Utilization of AI tools for automated code generation, obfuscation, and analysis coupled with AI-assisted plagiarism detection systems. This includes using automated software plagiarism detection systems (e.g., JPlag, MOSS) to identify suspicious similarities in code submissions, and developing lightweight AI-assisted detectors that focus on anomaly detection in code submissions, assessing originality and identifying unusual patterns.
Outcomes: Mixed results regarding academic integrity; some studies report increased instances of plagiarism, while others find no significant difference in learning outcomes. Detection rates for fully AI-generated programs have improved, and anomaly-based detectors show promise, but recognizing partially AI-generated or collaboratively produced code remains difficult.
Challenges: Current detectors struggle against automated obfuscation attacks, and AI-generated code does not consistently fulfill assignment requirements. Ethical concerns about AI use in academia persist, alongside potential student frustration with how effectively AI tools support their learning.
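The core idea behind token-based detectors such as JPlag and MOSS is that superficial edits (renaming variables, changing literals) should not change what is compared. The sketch below is a minimal illustration of that normalization step plus a k-gram overlap score; it is not either tool's actual algorithm (JPlag matches token streams with greedy string tiling, MOSS uses winnowed fingerprints), and the function names and keyword set are hypothetical.

```python
import re

# Toy keyword set; a real detector would use the language's full grammar.
KEYWORDS = {"def", "return", "if", "else", "for", "while", "in"}

def normalize(code: str) -> list[str]:
    """Tokenize code and collapse identifiers/literals to placeholders,
    so a pure renaming attack yields an identical token stream."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)
    result = []
    for t in tokens:
        if t in KEYWORDS:
            result.append(t)          # keywords kept verbatim
        elif t[0].isdigit():
            result.append("NUM")      # numeric literals collapsed
        elif t[0].isalpha() or t[0] == "_":
            result.append("ID")       # identifiers collapsed
        else:
            result.append(t)          # operators/punctuation kept
    return result

def kgrams(tokens: list[str], k: int = 4) -> set[tuple]:
    """Sliding window of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard overlap of normalized token k-grams."""
    ga, gb = kgrams(normalize(a), k), kgrams(normalize(b), k)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

A submission with every identifier renamed produces the same token stream as the original, so `similarity` returns 1.0 for it while unrelated programs score near zero; this is exactly why obfuscation attacks must go beyond renaming to fool such detectors.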
Implementation Barriers
Technical barrier
Existing plagiarism detection systems are vulnerable to advanced obfuscation techniques, particularly those using generative AI. Current detection tools struggle to reliably distinguish AI-generated code from human-written code, especially when advanced obfuscation techniques are employed.
Proposed Solutions: Implement targeted defense mechanisms and combine attack-specific with attack-independent methods to bolster detection resilience; improve detection accuracy through fine-tuned LLMs, machine-learning classifiers, and enhanced detection techniques.
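To make the attack-specific/attack-independent distinction concrete: one common obfuscation is inserting no-op statements to inflate differences between programs, and an attack-specific defense strips them before comparison. The pass below is a toy sketch using Python's ast module, not a technique from the paper; the class and function names are hypothetical, and a production pass would handle many more no-op patterns and guard against emptying a block entirely.

```python
import ast

class DeadStatementStripper(ast.NodeTransformer):
    """Drop statements with no effect (bare literals, `pass`) that an
    obfuscator might inject to distort similarity scores."""

    def visit_Expr(self, node: ast.Expr):
        # A bare constant expression (e.g. a stray string or number)
        # does nothing at runtime, so remove it.
        if isinstance(node.value, ast.Constant):
            return None
        return node

def strip_noise(source: str) -> str:
    """Parse, strip no-op statements, and re-emit canonical source."""
    tree = ast.parse(source)
    tree = DeadStatementStripper().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

Running a token-based comparison on `strip_noise(...)` output instead of raw submissions neutralizes this particular attack; because each such pass targets one obfuscation pattern, attack-specific defenses are typically layered with attack-independent ones.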
Ethical barrier
The definition of plagiarism is blurred with AI-generated content, raising questions about fairness in detection. Concerns about academic integrity and the ethical implications of using AI assistance in education, particularly regarding students' views on plagiarism.
Proposed Solutions: Develop clear institutional guidelines on what constitutes plagiarism in the context of AI-generated work, ensure human oversight in detection processes, and foster open educational discussions about AI use and academic integrity.
Project Team
Timur Sağlam
Researcher
Larissa Schmid
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Timur Sağlam, Larissa Schmid
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI