Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level
Project Overview
The paper examines how generative AI, particularly large language models (LLMs) such as ChatGPT, affects education, focusing on the detection of AI-generated text in academic writing. It addresses growing concern over plagiarism enabled by these tools and criticizes current AI-generated text classifiers for limited accuracy and a high rate of false positives. In response, it proposes an approach built on natural language processing (NLP) techniques that aims to make plagiarism detection both more accurate and more transparent. The proposed method reaches up to 94% accuracy and is designed to adapt as LLM technology evolves. The paper also stresses the importance of maintaining academic integrity as generative AI is integrated into educational practice.
Key Applications
NLP techniques for plagiarism detection
Context: Academic settings for evaluating student writing
Implementation: Generates multiple paraphrased versions of a question, submits them to an LLM to produce reference answers, matches those answers against the student's response using cosine similarity, and uses LDA for classification
Outcomes: Achieves up to 94% accuracy in detecting AI-generated text, makes evaluation more transparent, and reduces false positives
Challenges: Existing classifiers quickly become outdated, may not work across different LLMs, and produce black-box predictions that are hard to interpret
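The matching step described under Implementation can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the bag-of-words vectors stand in for whatever sentence representation the paper actually uses, and the function names and the 0.8 threshold are hypothetical choices made for this example. Each student sentence receives a sentence-level score (its best cosine match among the LLM-generated reference sentences), and those scores are aggregated into one document-level figure that could then be passed to a classifier.

```python
import math
import re
from collections import Counter


def bow_vector(sentence):
    """Lowercased bag-of-words counts; an assumed stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_scores(student_sentences, reference_sentences):
    """Sentence level: best similarity of each student sentence to any reference."""
    refs = [bow_vector(r) for r in reference_sentences]
    return [
        max((cosine_similarity(bow_vector(s), r) for r in refs), default=0.0)
        for s in student_sentences
    ]


def document_score(scores, threshold=0.8):
    """Document level: fraction of sentences whose best match meets the threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


# Hypothetical data: references would come from LLM answers to paraphrased questions.
references = [
    "Photosynthesis converts light energy into chemical energy.",
    "Plants use sunlight to make glucose.",
]
student = [
    "Photosynthesis converts light energy into chemical energy.",
    "My grandmother grows tomatoes.",
]
scores = sentence_scores(student, references)
flagged_fraction = document_score(scores)
```

Because every sentence carries its own score, a human evaluator can see which sentences drove the document-level result, which is the transparency benefit the summary describes.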
Implementation Barriers
Technical Barrier
Existing AI text classifiers rapidly become outdated with new LLM versions and may require retraining.
Proposed Solutions: Develop adaptable models that can integrate advancements in LLM technology without needing retraining.
Accuracy Barrier
Current classifiers produce a high rate of false positives and offer no explanation for their predictions.
Proposed Solutions: Implement quantifiable metrics at sentence and document levels to improve interpretability for human evaluators.
Operational Barrier
Difficulty in distinguishing between human-generated and machine-generated text due to similarities in writing styles.
Proposed Solutions: Use a contrastive loss function based on cosine similarity to better separate human-written from machine-generated text.
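For readers unfamiliar with the idea, a loss of this kind built on cosine similarity can be sketched as below. The summary does not give the paper's exact formulation, so this uses the common pairwise form with cosine distance as an assumption; the function names and the 0.5 margin are hypothetical. Matching pairs are penalized for being far apart, and non-matching pairs are penalized only when they sit closer than the margin.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two dense vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def contrastive_loss(u, v, is_match, margin=0.5):
    """Pairwise contrastive loss on cosine distance (1 - cosine similarity).

    Matching pairs pay the squared distance; non-matching pairs pay only
    when their distance falls inside the margin.
    """
    distance = 1.0 - cosine_similarity(u, v)
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

With this form, an identical pair labeled as a match costs nothing, while the same identical pair labeled as a non-match costs the squared margin, so training pushes human and machine sentence representations apart.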
Project Team
Mujahid Ali Quidwai
Researcher
Chunhui Li
Researcher
Parijat Dube
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Mujahid Ali Quidwai, Chunhui Li, Parijat Dube
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI