EdNet: A Large-Scale Hierarchical Dataset in Education
Project Overview
This summary covers EdNet, a dataset collected from Santa, a multi-platform AI tutoring system. With more than 131 million student interactions, it gives research in Artificial Intelligence in Education (AIEd) a large-scale foundation. Its hierarchical structure and the variety of recorded student actions support applications such as knowledge tracing, learning path recommendation, and dropout prediction. It also supports building student simulators for Reinforcement Learning (RL), so researchers can study AIEd tasks at different levels of granularity. Taken together, the findings point to the potential of AI-driven tutoring systems to personalize learning, predict student behavior, and improve engagement.
Key Applications
Predictive Modeling for Student Engagement and Performance
Context: AIEd research and mobile learning environments, with educators and researchers as the audience. The tasks include predicting student responses, dropout, and performance from interaction data and assessments.
Implementation: Transformer-based models, including SAINT and DAS, are applied to predictive tasks on datasets such as EdNet-KT1 and EdNet-KT4. The models are trained on student interaction data to predict outcomes such as response correctness and dropout likelihood, and they depend on large volumes of interaction data.
Outcomes: The models reportedly achieved state-of-the-art performance on response prediction and dropout prediction, outperforming earlier models in educational settings. The results support individualized recommendations and a better understanding of student engagement.
Challenges: Training requires substantial interaction data and is affected by data sparsity. Defining educational sessions accurately and relying on external data for assessments are further hurdles.
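The session-definition challenge can be made concrete with a minimal sketch. It assumes that a session ends when the gap between two consecutive interactions exceeds a fixed threshold, and that the dropout target marks the last interaction of each session. The one-hour threshold, the `Interaction` record, and both function names are illustrative assumptions, not the definitions used by the EdNet authors.

```python
from dataclasses import dataclass
from typing import List

# Assumption: a gap longer than this between consecutive interactions ends a session.
SESSION_GAP_MS = 60 * 60 * 1000  # one hour in milliseconds (illustrative value)


@dataclass
class Interaction:
    timestamp_ms: int  # time of the interaction, in milliseconds
    item_id: str       # hypothetical identifier of the question or lecture


def split_sessions(interactions: List[Interaction],
                   gap_ms: int = SESSION_GAP_MS) -> List[List[Interaction]]:
    """Group one student's interactions into sessions by inactivity gap."""
    ordered = sorted(interactions, key=lambda x: x.timestamp_ms)
    sessions: List[List[Interaction]] = []
    for inter in ordered:
        previous = sessions[-1][-1] if sessions else None
        if previous is not None and inter.timestamp_ms - previous.timestamp_ms <= gap_ms:
            sessions[-1].append(inter)   # still inside the current session
        else:
            sessions.append([inter])     # gap too long (or first record): new session
    return sessions


def dropout_labels(session: List[Interaction]) -> List[int]:
    """Label each interaction 1 if it is the last one in its session, else 0."""
    last_index = len(session) - 1
    return [1 if i == last_index else 0 for i in range(len(session))]


if __name__ == "__main__":
    minute = 60 * 1000
    log = [
        Interaction(0, "q1"),
        Interaction(10 * minute, "q2"),
        Interaction(180 * minute, "q3"),   # almost three hours later: new session
        Interaction(185 * minute, "q4"),
    ]
    for session in split_sessions(log):
        print([i.item_id for i in session], dropout_labels(session))
```

Changing `gap_ms` changes both the number of sessions and the positive-label rate, which is one reason the choice of session definition matters for dropout prediction.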
Implementation Barriers
Data Availability and Sparsity
AIEd lacks large-scale datasets that reflect diverse student behaviors, and records of certain behaviors, such as course dropout, are limited.
Proposed Solutions: Introduce EdNet as a comprehensive research dataset, and use modeling techniques such as Assessment Modeling to address label scarcity.
Implementation Complexity
Raw interaction data is difficult to preprocess into a form that models can use effectively.
Proposed Solutions: Structure EdNet hierarchically so that different AIEd tasks can work at different levels of abstraction.
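To illustrate what moving between levels of abstraction can look like, the sketch below collapses a fine-grained action log into question-level response records by keeping each question's last recorded answer. The record fields, the action names ("enter", "respond", "submit"), and the `answer_key` mapping are assumptions made for this example; the actual EdNet file layouts may differ and should be checked against the dataset's own documentation.

```python
from typing import Dict, List, NamedTuple, Optional


class Action(NamedTuple):
    timestamp_ms: int
    action_type: str            # assumed values: "enter", "respond", "submit"
    item_id: str                # hypothetical question identifier, e.g. "q1"
    user_answer: Optional[str]  # set only for "respond" actions


class Response(NamedTuple):
    timestamp_ms: int
    question_id: str
    user_answer: str
    correct: bool


def to_question_level(actions: List[Action],
                      answer_key: Dict[str, str]) -> List[Response]:
    """Collapse action-level records into one response record per question."""
    last_response: Dict[str, Action] = {}
    for action in sorted(actions, key=lambda a: a.timestamp_ms):
        if action.action_type == "respond" and action.user_answer is not None:
            # A later answer to the same question overwrites an earlier one.
            last_response[action.item_id] = action
    rows = [
        Response(a.timestamp_ms, a.item_id, a.user_answer,
                 a.user_answer == answer_key.get(a.item_id))
        for a in last_response.values()
    ]
    return sorted(rows, key=lambda r: r.timestamp_ms)


if __name__ == "__main__":
    log = [
        Action(0, "enter", "q1", None),
        Action(1000, "respond", "q1", "a"),
        Action(2000, "respond", "q1", "b"),   # the student changes the answer
        Action(3000, "submit", "q1", None),
        Action(4000, "respond", "q2", "c"),
    ]
    key = {"q1": "b", "q2": "d"}              # hypothetical answer key
    for row in to_question_level(log, key):
        print(row)
```

The coarse output is enough for response prediction, while the original action log keeps details, such as answer changes, that finer-grained tasks may need.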
Project Team
Youngduck Choi, Researcher
Youngnam Lee, Researcher
Dongmin Shin, Researcher
Junghyun Cho, Researcher
Seoyon Park, Researcher
Seewoo Lee, Researcher
Jineon Baek, Researcher
Chan Bae, Researcher
Byungsoo Kim, Researcher
Jaewe Heo, Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Youngduck Choi, Youngnam Lee, Dongmin Shin, Junghyun Cho, Seoyon Park, Seewoo Lee, Jineon Baek, Chan Bae, Byungsoo Kim, Jaewe Heo
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI