Skip to main content Skip to navigation

Enhancing the De-identification of Personally Identifiable Information in Educational Data

Project Overview

The document examines the role of generative AI, particularly the GPT-4o-mini model, in enhancing educational data security by effectively detecting Personally Identifiable Information (PII). It compares the model's performance against established frameworks such as Microsoft Presidio and Azure AI Language, revealing that the fine-tuned GPT-4o-mini outperforms these alternatives in both precision and recall, while also being more cost-efficient. The findings underscore the necessity of safeguarding privacy within educational environments without compromising data utility, demonstrating the potential of advanced AI technologies to achieve a delicate balance between these two critical aspects. Overall, the document highlights how generative AI can serve as a powerful tool in education, not only for enhancing data protection but also for fostering innovative applications that improve learning outcomes and operational efficiency.

Key Applications

GPT-4o-mini for PII detection

Context: Educational data privacy protection

Implementation: Fine-tuned GPT-4o-mini model was trained on two public datasets to detect PII.

Outcomes: Achieved high recall (0.9589) and improved precision (threefold increase) compared to other models, while being cost-effective.

Challenges: Low precision in initial models led to false positives; reliance on Named Entity Recognition (NER) accuracy for effectiveness.

Implementation Barriers

Technical Barrier

The accuracy of PII detection relies heavily on the effectiveness of the Named Entity Recognition (NER) process.

Proposed Solutions: Utilizing advanced AI models for enhancing the performance of NER.

Financial Barrier

Existing models like Azure AI Language incur high costs for usage, making them less accessible.

Proposed Solutions: Implementing fine-tuned GPT models which offer lower operational costs.

Project Team

Y. Shen

Researcher

Z. Ji

Researcher

J. Lin

Researcher

K. R. Koedginer

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Y. Shen, Z. Ji, J. Lin, K. R. Koedginer

Source Publication: View Original PaperLink opens in a new window

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: Openai

Let us know you agree to cookies