Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level

Project Overview

This paper examines the implications of generative AI, particularly large language models (LLMs) such as ChatGPT, for education, focusing on the challenge of detecting AI-generated text in academic writing. It addresses growing concern over plagiarism facilitated by these tools and critiques current AI-generated text classifiers for their limited accuracy and high rate of false positives. In response, it proposes an approach based on natural language processing (NLP) techniques that aims to make plagiarism detection both more accurate and more transparent. The proposed method reports accuracy of up to 94% and is designed to keep pace with new LLM versions rather than fall behind them. The paper stresses the importance of maintaining academic integrity as generative AI is integrated into educational practice.

Key Applications

NLP techniques for plagiarism detection

Context: Academic settings for evaluating student writing

Implementation: Uses NLP to generate paraphrased versions of a question, compares the student's response with LLM-generated answers to those questions using cosine similarity, and applies LDA for classification

Outcomes: Achieves up to 94% accuracy in detecting AI-generated text, makes evaluation more transparent, and reduces false positives

Challenges: Existing classifiers quickly become outdated, may not generalize across different LLMs, and make black-box predictions that are hard to interpret
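The comparison step in the implementation above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes simple bag-of-words count vectors in place of whatever sentence representation the paper uses, and the names bag_of_words, cosine_similarity, and best_matches are hypothetical. The LDA classification step is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased token counts; a crude stand-in for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(student_sentences, generated_sentences):
    """For each student sentence, return (sentence, closest generated sentence, score)."""
    generated_vectors = [(g, bag_of_words(g)) for g in generated_sentences]
    results = []
    for sentence in student_sentences:
        vector = bag_of_words(sentence)
        # Pick the generated sentence with the highest similarity to this one.
        match, score = max(
            ((g, cosine_similarity(vector, gv)) for g, gv in generated_vectors),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        results.append((sentence, match, score))
    return results
```

Here generated_sentences would hold the sentences from the LLM's answers to the paraphrased questions. Returning the matched sentence alongside its score lets a human evaluator see which generated sentence a student sentence resembles, rather than only a verdict.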

Implementation Barriers

Technical Barrier

Existing AI text classifiers rapidly become outdated with new LLM versions and may require retraining.

Proposed Solutions: Develop adaptable models that can integrate advancements in LLM technology without needing retraining.

Accuracy Barrier

Current classifiers produce high false positives and lack explainability in their predictions.

Proposed Solutions: Implement quantifiable metrics at sentence and document levels to improve interpretability for human evaluators.
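One way such metrics might be organized is sketched below. The source does not specify how sentence-level scores are combined, so the mean, the flagged fraction, and the 0.8 threshold are all illustrative assumptions, and document_metrics is a hypothetical name.

```python
from statistics import mean


def document_metrics(sentence_scores, threshold=0.8):
    """Summarize per-sentence similarity scores for a whole document.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    if not sentence_scores:
        return {"mean_similarity": 0.0, "flagged_fraction": 0.0, "flagged_indices": []}
    # Sentences whose best similarity to generated text meets the threshold.
    flagged = [i for i, s in enumerate(sentence_scores) if s >= threshold]
    return {
        "mean_similarity": mean(sentence_scores),
        "flagged_fraction": len(flagged) / len(sentence_scores),
        "flagged_indices": flagged,
    }
```

Reporting the indices of flagged sentences together with the document-level summary keeps the result inspectable: an evaluator can go from the overall figure to the specific sentences behind it.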

Operational Barrier

Human-written and machine-generated text are difficult to distinguish because their writing styles are similar.

Proposed Solutions: Use a contrastive loss function based on cosine similarity to better separate human-written from machine-generated text.
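The loss-based idea above could take the form of a standard pairwise contrastive loss computed on cosine distance. The sketch below is an assumption about the general shape of such a loss, not the exact formulation in the paper; the margin value and the function names are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity for two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        # Treat a zero vector as maximally dissimilar (illustrative choice).
        return 1.0
    return 1.0 - dot / norm


def contrastive_loss(u, v, is_match, margin=0.5):
    """Pairwise contrastive loss on cosine distance.

    Matching pairs are penalized by their squared distance; non-matching
    pairs are penalized only when they are closer than `margin`.
    """
    d = cosine_distance(u, v)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

Under this formulation, minimizing the loss pulls matching sentence pairs together and pushes non-matching pairs at least margin apart, which is what makes the resulting similarity scores easier to separate.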

Project Team

Mujahid Ali Quidwai

Researcher

Chunhui Li

Researcher

Parijat Dube

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Mujahid Ali Quidwai, Chunhui Li, Parijat Dube

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
