LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?

Project Overview

The paper examines how Generative AI, particularly Large Language Models (LLMs), is used in educational contexts. Its focus is 'mixtext': text that combines AI-generated content with human-written text. It introduces the MIXSET dataset, built to study such blends, and addresses the difficulty of detecting mixed texts in educational settings, where they raise concerns about plagiarism and academic integrity. The discussion covers Machine-Generated Text (MGT) detection and text modification techniques, reviews progress in building datasets for MGT detection, and evaluates how closely different AI models can produce human-like text. It also acknowledges open problems in the quality and effectiveness of AI-generated text in academic environments. The implications of AI for peer review are examined as well, underscoring the need for robust methods to distinguish human-written from machine-generated content. Overall, the findings indicate that generative AI can substantially enhance educational practice, but its successful adoption depends on careful attention to academic integrity and on effective detection mechanisms.

Key Applications

AI-assisted writing and text detection tools

Context: Used in educational settings where students and researchers rely on AI to draft and revise text and to detect machine-generated content, with consequences for both academic integrity and the quality of work.

Implementation: Large language models (LLMs) such as GPT-4 and Llama 2 are used both to improve the quality of human-written texts and to build datasets covering mixed-text scenarios. Combining AI-assisted writing tools with methods for identifying machine-generated text makes it possible to analyze detection methods and improve them.

Outcomes: Increases the clarity and quality of written work, improves academic performance, enhances the accuracy of detecting machine-generated texts, and aligns AI-generated content more closely with human writing styles.

Challenges: Concerns about academic dishonesty, since students may present AI-revised work as their own; difficulty in accurately identifying mixed texts; variable text quality; limited dataset diversity; and the need for robust evaluation metrics.

Implementation Barriers

Technological

Current detection systems have difficulty distinguishing between human-written and AI-generated content, especially in mixed scenarios.

Proposed Solutions: Development of more advanced detection algorithms tailored to mixed texts, in particular fine-grained detection methods.
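To illustrate why mixed texts call for finer-grained detection, the following toy Python sketch contrasts a single document-level decision with a per-sentence one. It is not the paper's method: the per-sentence scores are assumed to come from some hypothetical sentence-level detector (each in [0, 1], higher meaning more likely machine-generated), and the function names and the 0.5 threshold are illustrative assumptions.

```python
from typing import List


def document_level_label(sentence_scores: List[float], threshold: float = 0.5) -> str:
    """Coarse decision: average all sentence scores into one document verdict."""
    if not sentence_scores:
        raise ValueError("need at least one sentence score")
    mean_score = sum(sentence_scores) / len(sentence_scores)
    return "machine" if mean_score >= threshold else "human"


def fine_grained_label(sentence_scores: List[float], threshold: float = 0.5) -> str:
    """Finer decision: flag each sentence, then report 'mixed' when flags disagree.

    Scores are assumed to come from a hypothetical sentence-level detector;
    the 0.5 threshold is an illustrative choice, not a tuned value.
    """
    if not sentence_scores:
        raise ValueError("need at least one sentence score")
    flags = [score >= threshold for score in sentence_scores]
    if all(flags):
        return "machine"
    if not any(flags):
        return "human"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical scores: one machine-polished sentence inside human writing.
    scores = [0.10, 0.20, 0.90, 0.15]
    print(document_level_label(scores))  # human (mean 0.3375 < 0.5)
    print(fine_grained_label(scores))    # mixed
```

Averaging hides the single machine-like sentence, while the per-sentence view surfaces the blend. A real fine-grained detector would also need calibrated scores and a principled threshold.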

Ethical

Misuse of AI tools can lead to plagiarism and undermine academic integrity, particularly in writing, peer review, and assessment.

Proposed Solutions: Educational institutions need to establish clear guidelines on the ethical use of AI in writing and assessments, promoting transparency in AI-generated content.

Technical

Challenges in ensuring the quality and consistency of datasets used for training AI models.

Proposed Solutions: Implementing rigorous evaluation methods and improving data preprocessing techniques.

Project Team

Qihui Zhang

Researcher

Chujie Gao

Researcher

Dongping Chen

Researcher

Yue Huang

Researcher

Yixin Huang

Researcher

Zhenyang Sun

Researcher

Shilin Zhang

Researcher

Weiye Li

Researcher

Zhengyan Fu

Researcher

Yao Wan

Researcher

Lichao Sun

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Qihui Zhang, Chujie Gao, Dongping Chen, Yue Huang, Yixin Huang, Zhenyang Sun, Shilin Zhang, Weiye Li, Zhengyan Fu, Yao Wan, Lichao Sun

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
