
GPT detectors are biased against non-native English writers

Project Overview

The document examines the role of generative AI in education, focusing on the challenges of using AI detectors to assess student writing. It highlights a significant bias in these detectors: they tend to misclassify non-native English speakers' writing as AI-generated, raising ethical concerns about their use in educational assessment. The findings indicate that while current detection methods perform well on native English writing, they frequently flag non-native writing as machine-generated, risking the marginalization of non-native writers in academic environments. This underscores the urgent need for more equitable detection tools that account for linguistic diversity and ensure fair evaluation practices. Overall, the document advocates advances in AI detection technology that foster inclusivity and improve educational outcomes for all students.

Key Applications

ChatGPT for essay generation and enhancement

Context: Improving the quality of writing samples for college applications and educational assessments, including TOEFL and other standardized essays, by enhancing linguistic diversity and generating original content.

Implementation: Utilized ChatGPT to generate essays based on prompts from the Common App and enhance writing samples to resemble native speaker quality. The process included prompting for initial essay generation followed by a self-editing phase to improve the text further.

Outcomes: Significant reduction in misclassification rates of writing samples identified as AI-generated; detection rates for generated essays dropped dramatically after self-editing, demonstrating the effectiveness of enhancing linguistic diversity and quality.

Challenges: High false positive rates for non-native English writing samples, and the ease with which simple prompt changes can circumvent AI detection mechanisms.

Implementation Barriers

Bias in detection

GPT detectors misclassify non-native English writing as AI-generated due to lower linguistic variability.

Proposed Solutions: Enhancing non-native writing using AI tools to increase linguistic complexity and diversity.
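Many off-the-shelf detectors score text largely by perplexity, flagging low-perplexity (highly predictable, less varied) prose as AI-generated; this is the mechanism that disadvantages non-native writers, whose word choice tends to be more constrained. As a rough illustration only (not the paper's method, and `unigram_perplexity` is a hypothetical helper), the sketch below estimates a toy "perplexity" from the word-frequency entropy of a text and shows that more repetitive wording scores lower, which a naive threshold-based detector would flag:

```python
import math
from collections import Counter

def unigram_perplexity(text: str) -> float:
    """Toy perplexity: exponentiated word-level entropy of the text's own
    unigram distribution. Lower values mean more repetitive word choice."""
    words = text.lower().split()
    counts = Counter(words)
    n = len(words)
    # Average negative log-probability of each token under the empirical model
    avg_nll = -sum(math.log(counts[w] / n) for w in words) / n
    return math.exp(avg_nll)

# More varied wording yields higher toy perplexity than repetitive wording,
# so a detector that flags low-perplexity text penalizes constrained vocabulary.
varied = "the quick brown fox jumps over the lazy dog near the old riverbank"
repetitive = "the test is good and the test is easy and the test is fine"
print(unigram_perplexity(varied) > unigram_perplexity(repetitive))  # True
```

Real detectors use a language model's token probabilities rather than a text's own unigram counts, but the direction of the effect is the same: limited lexical variability lowers the score and pushes the text toward the "AI-generated" side of the threshold.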

Detection reliability

Current detection methods are easily manipulated through prompt design.

Proposed Solutions: Developing more robust detection techniques that go beyond perplexity measures.

Project Team

Weixin Liang

Researcher

Mert Yuksekgonul

Researcher

Yining Mao

Researcher

Eric Wu

Researcher

James Zou

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
