
EdNet: A Large-Scale Hierarchical Dataset in Education

Project Overview

This summary discusses the role of generative AI in education through EdNet, a dataset collected from Santa, a multi-platform AI tutoring system. With over 131 million interactions, EdNet provides a large-scale foundation for research in Artificial Intelligence in Education (AIEd). Its hierarchical structure and variety of recorded student actions support applications such as knowledge tracing, learning path recommendation, and dropout prediction. It also supports building student simulators with Reinforcement Learning (RL), letting researchers study AIEd tasks at different levels of granularity. Overall, the findings point to the potential of generative AI to improve educational outcomes by personalizing learning experiences, predicting student behavior, and increasing engagement through intelligent tutoring systems.

Key Applications

Predictive Modeling for Student Engagement and Performance

Context: AIEd research and mobile learning environments targeting educators and researchers. This includes predicting student responses, dropout rates, and performance metrics based on interaction data and assessments.

Implementation: Transformer-based models, including SAINT and DAS, are applied to several predictive tasks on datasets such as EdNet-KT1 and EdNet-KT4. The models are trained on student interaction data to predict outcomes such as response accuracy and dropout likelihood, and they depend on large volumes of interaction data to train well.
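As a rough illustration of how interaction data can be framed for response prediction, the sketch below turns one student's ordered interaction history into (past interactions, next question, correctness label) examples. It is not the SAINT or DAS pipeline; the tuple layout, the function name make_next_response_examples, and the max_len parameter are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical record layout: (question_id, correct) with correct as 1 or 0.
Interaction = Tuple[str, int]


def make_next_response_examples(
    history: List[Interaction], max_len: int = 100
) -> List[Tuple[List[Interaction], str, int]]:
    """Build next-response prediction examples from one student's history.

    Each example pairs the most recent past interactions (at most max_len)
    with the next question and the label to predict: whether the student
    answered that next question correctly.
    """
    examples = []
    for t in range(1, len(history)):
        past = history[max(0, t - max_len):t]
        next_question, label = history[t]
        examples.append((past, next_question, label))
    return examples


# Toy history, not real EdNet data.
student = [("q1", 1), ("q7", 0), ("q3", 1)]
examples = make_next_response_examples(student)
```

A sequence model would consume the past interactions and the next question as input and be trained against the label; the truncation to max_len stands in for the fixed context window such models typically use.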

Outcomes: The models achieved state-of-the-art performance in response prediction and dropout prediction, outperforming existing models in educational contexts. These predictions support individualized recommendations and a better understanding of student engagement.

Challenges: The models require substantial interaction data for training and are affected by data sparsity. Defining educational sessions accurately and depending on external data for assessments are further hurdles.
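To make the session-definition problem concrete, here is a minimal sketch that splits a student's interaction timestamps into sessions by an inactivity gap and marks the last interaction of each session as a dropout point. The one-hour default threshold, the millisecond units, and both function names are assumptions for illustration; the choice of threshold is exactly the kind of decision the challenge above refers to.

```python
from typing import List


def split_sessions(timestamps_ms: List[int], gap_ms: int = 3_600_000) -> List[List[int]]:
    """Group sorted timestamps into sessions separated by gaps larger than gap_ms."""
    sessions: List[List[int]] = []
    for ts in timestamps_ms:
        if sessions and ts - sessions[-1][-1] <= gap_ms:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # inactivity gap exceeded: start a new session
    return sessions


def dropout_labels(timestamps_ms: List[int], gap_ms: int = 3_600_000) -> List[int]:
    """Label each interaction 1 if it is the last one in its session, else 0.

    Note: the final interaction in the log is also labeled 1, although the
    student may simply not have returned yet when the log was cut off.
    """
    labels: List[int] = []
    for session in split_sessions(timestamps_ms, gap_ms):
        labels.extend([0] * (len(session) - 1) + [1])
    return labels


# Toy timestamps in milliseconds, not real EdNet data.
sessions = split_sessions([0, 1_000, 4_000_000, 4_001_000, 9_000_000])
labels = dropout_labels([0, 1_000, 4_000_000, 4_001_000, 9_000_000])
```

Changing gap_ms changes both the number of sessions and the share of positive labels, which is one reason dropout records can be sparse or inconsistent across studies.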

Implementation Barriers

Data Availability and Sparsity

AIEd lacks large-scale datasets that reflect diverse student behaviors, and records of certain behaviors, such as course dropout, are limited.

Proposed Solutions: EdNet is introduced as a comprehensive dataset for research, and modeling techniques such as Assessment Modeling are used to address label scarcity.

Implementation Complexity

Raw interaction data is difficult to preprocess into a form that AIEd tasks can use effectively.

Proposed Solutions: EdNet is structured hierarchically, offering several levels of abstraction for different AIEd tasks.
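One way to picture the levels of abstraction is to derive a coarse, question-level view from a finer-grained action log. The sketch below does this for a toy log; the record layout, the action names "enter", "respond", and "submit", and the function to_question_level are hypothetical and are not taken from the EdNet file format.

```python
from typing import Dict, List, Tuple

# Hypothetical fine-grained record: (timestamp, action_type, item_id, answer).
Action = Tuple[int, str, str, str]


def to_question_level(actions: List[Action]) -> List[Tuple[int, str, str]]:
    """Collapse an action log into one (submit_time, item_id, final_answer) row per submission."""
    latest: Dict[str, str] = {}  # most recent answer chosen for each item
    rows: List[Tuple[int, str, str]] = []
    for ts, kind, item, answer in actions:
        if kind == "respond":
            latest[item] = answer                     # later responses overwrite earlier ones
        elif kind == "submit" and item in latest:
            rows.append((ts, item, latest.pop(item)))  # keep only the final answer
    return rows


# Toy log, not real EdNet data.
log = [
    (1, "enter", "q1", ""),
    (2, "respond", "q1", "a"),
    (3, "respond", "q1", "b"),   # the student changes their answer
    (4, "submit", "q1", ""),
    (5, "enter", "q2", ""),
    (6, "respond", "q2", "c"),
    (7, "submit", "q2", ""),
]
question_rows = to_question_level(log)
```

A task such as knowledge tracing can work from the collapsed rows, while a task that studies answer changes or time spent needs the finer-grained log.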

Project Team

Youngduck Choi

Researcher

Youngnam Lee

Researcher

Dongmin Shin

Researcher

Junghyun Cho

Researcher

Seoyon Park

Researcher

Seewoo Lee

Researcher

Jineon Baek

Researcher

Chan Bae

Researcher

Byungsoo Kim

Researcher

Jaewe Heo

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Youngduck Choi, Youngnam Lee, Dongmin Shin, Junghyun Cho, Seoyon Park, Seewoo Lee, Jineon Baek, Chan Bae, Byungsoo Kim, Jaewe Heo

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
