
Fake News Detection: Comparative Evaluation of BERT-like Models and Large Language Models with Generative AI-Annotated Data

Project Overview

The document examines the role of generative AI, particularly large language models (LLMs), in improving educational outcomes through applications such as fake news detection. It presents a comparative analysis of LLMs and BERT-like models, finding that while BERT-like models typically outperform LLMs in classification accuracy, LLMs are more robust to variations in text, a property that matters for reliably identifying misinformation. The findings also underscore the need to pair AI-driven annotation with human oversight to improve data labeling accuracy in fake news detection. This combination of generative AI and human input is positioned as a pivotal strategy for fostering critical thinking skills in students and strengthening media literacy, ultimately preparing learners to navigate the complexities of information in the digital age. The document concludes that generative AI holds significant potential to transform educational practice, enabling personalized learning experiences and improving student engagement while addressing pressing issues such as misinformation.

Key Applications

Fake news detection using generative AI and LLMs.

Context: Research study focused on developing and evaluating models for detecting fake news, targeting researchers and practitioners in AI and data science.

Implementation: A dataset of news articles was labeled for fake news classification using GPT-4; the labeled data was then used to fine-tune both BERT-like models and LLMs.
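
As a rough illustration of this pipeline, the sketch below labels articles with a GPT-4-class model and then fine-tunes a BERT-like classifier on the resulting labels. It assumes the OpenAI Python SDK and Hugging Face transformers/datasets; the model names, prompt wording, and label schema are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical sketch: label articles with a GPT-4-class model, then fine-tune a BERT-like classifier.
# Assumes the OpenAI Python SDK and Hugging Face transformers/datasets are installed.
from openai import OpenAI
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gpt_label(article_text: str) -> int:
    """Ask the model to classify an article; returns 1 for fake, 0 for real."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; the study used GPT-4 for labeling
        messages=[
            {"role": "system",
             "content": "You are a fact-checking assistant. Answer with exactly one word: FAKE or REAL."},
            {"role": "user", "content": article_text[:4000]},
        ],
        temperature=0,
    )
    return 1 if "FAKE" in resp.choices[0].message.content.upper() else 0

articles = ["Example news article text ...", "Another article ..."]  # placeholder corpus
labels = [gpt_label(a) for a in articles]

# Fine-tune a BERT-like model on the AI-labeled data.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": articles, "label": labels})
ds = ds.map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=256))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fake-news-bert", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds,
)
trainer.train()
```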

Outcomes: BERT-like models outperformed LLMs in classification accuracy, while LLMs were more robust to adversarial text perturbations. Combining AI-generated labels with human review produced more accurate results.

Challenges: LLMs can struggle with fact verification, and their performance varies with the specific model and dataset used. AI-generated labels are also difficult to scale reliably without human oversight.

Implementation Barriers

Technical Barrier

The performance of LLMs can be inconsistent, especially in tasks like fact verification.

Proposed Solutions: Enhancing prompting techniques and integrating multimodal data could improve the performance of LLMs.
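
One way to make prompting more structured is sketched below, purely as an illustrative assumption (the prompt wording and model choice are not taken from the paper): the model is asked to enumerate and assess claims before giving a verdict, rather than labeling in a single step.

```python
# Hypothetical sketch of a more structured fact-verification prompt.
from openai import OpenAI

client = OpenAI()

FACT_CHECK_PROMPT = """You are a careful fact-checking assistant.
1. List the main factual claims in the article.
2. For each claim, note whether it is verifiable and any red flags
   (unnamed sources, implausible statistics, sensational framing).
3. Conclude with a single line: VERDICT: FAKE or VERDICT: REAL."""

def fact_check(article_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": FACT_CHECK_PROMPT},
            {"role": "user", "content": article_text},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content
```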

Scalability Barrier

Human annotation is costly and not scalable for large datasets.

Proposed Solutions: Using AI-assisted labeling methods along with human review can help scale the annotation process.
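
A minimal sketch of such a workflow, assuming a self-reported confidence score and an arbitrary review threshold (neither taken from the paper), might route low-confidence AI labels to human annotators while accepting confident labels automatically:

```python
# Hypothetical sketch of AI-assisted labeling with a human-review queue.
from openai import OpenAI
import json

client = OpenAI()
REVIEW_THRESHOLD = 0.8  # labels below this self-reported confidence go to human annotators

def ai_label_with_review(article_text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[
            {"role": "system",
             "content": 'Classify the article as fake or real news. Reply with JSON: '
                        '{"label": "FAKE" or "REAL", "confidence": 0.0-1.0}'},
            {"role": "user", "content": article_text},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )
    result = json.loads(resp.choices[0].message.content)
    result["needs_human_review"] = result.get("confidence", 0.0) < REVIEW_THRESHOLD
    return result

# Confident labels are accepted as-is, so human effort concentrates on the ambiguous cases.
```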

Project Team

Shaina Raza

Researcher

Drai Paulen-Patterson

Researcher

Chen Ding

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Shaina Raza, Drai Paulen-Patterson, Chen Ding

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
