Enhancing the De-identification of Personally Identifiable Information in Educational Data
Project Overview
The paper examines how generative AI, specifically the GPT-4o-mini model, can improve the security of educational data by detecting Personally Identifiable Information (PII). It compares the model against two established tools, Microsoft Presidio and Azure AI Language, and reports that the fine-tuned GPT-4o-mini achieves higher precision and recall than both at lower cost. The findings indicate that privacy in educational settings can be protected without sacrificing data utility. More broadly, the paper presents generative AI as a practical tool in education, both for data protection and for applications that improve learning outcomes and operational efficiency.
Key Applications
GPT-4o-mini for PII detection
Context: Educational data privacy protection
Implementation: A GPT-4o-mini model was fine-tuned on two public datasets to detect PII.
Outcomes: Achieved high recall (0.9589) and a threefold increase in precision compared with the other models, at lower cost.
Challenges: The initial models had low precision, which produced many false positives; effectiveness also depends on the accuracy of Named Entity Recognition (NER).
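The precision and recall figures above can be made concrete with a small calculation. The sketch below is illustrative only: the function name `precision_recall` and the toy token indices are assumptions, not the paper's evaluation code, and the paper may score at a different granularity (for example, entity spans rather than tokens).

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for PII detection.

    gold and predicted are sets of item identifiers (here, token
    indices) that are labelled as PII by annotators and by the model.
    """
    true_pos = len(gold & predicted)    # PII items the model found
    false_pos = len(predicted - gold)   # non-PII items flagged by mistake
    false_neg = len(gold - predicted)   # PII items the model missed
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    return precision, recall


# Hypothetical example: 4 real PII tokens, 6 tokens flagged by a model.
gold_tokens = {1, 2, 3, 4}
flagged_tokens = {2, 3, 4, 5, 6, 7}
p, r = precision_recall(gold_tokens, flagged_tokens)
print(p, r)  # 0.5 0.75
```

In this toy case the model finds most of the real PII (recall 0.75) but half of what it flags is not PII (precision 0.5). This is the pattern described under Challenges: a detector can reach high recall while still over-redacting useful text.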
Implementation Barriers
Technical Barrier
The accuracy of PII detection relies heavily on the effectiveness of the Named Entity Recognition (NER) process.
Proposed Solutions: Use advanced AI models to improve NER performance.
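To illustrate why de-identification quality depends on the NER step, the following minimal sketch masks text using entity spans. It assumes the spans have already been produced by some detector; the function name `redact`, the span format `(start, end, label)`, and the sample sentence are hypothetical and do not come from the paper or from any specific library.

```python
def redact(text, spans):
    """Replace each detected span with a bracketed label placeholder.

    spans is a list of (start, end, label) tuples with end exclusive.
    Spans are applied right to left so earlier offsets stay valid.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hypothetical detector output, built with str.index for clarity.
sample = "Contact Jane Doe at jane@example.edu."
name, email = "Jane Doe", "jane@example.edu"
detected = [
    (sample.index(name), sample.index(name) + len(name), "NAME"),
    (sample.index(email), sample.index(email) + len(email), "EMAIL"),
]
print(redact(sample, detected))  # Contact [NAME] at [EMAIL].
```

The masking itself is mechanical, so any span the detector misses stays in the output, and any span it flags wrongly removes useful text. The accuracy of the detection step therefore determines the quality of the de-identified data.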
Financial Barrier
Existing services such as Azure AI Language are costly to use, which makes them less accessible.
Proposed Solutions: Use fine-tuned GPT models, which have lower operating costs.
Project Team
Y. Shen
Researcher
Z. Ji
Researcher
J. Lin
Researcher
K. R. Koedinger
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Y. Shen, Z. Ji, J. Lin, K. R. Koedinger
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI