A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant
Project Overview
This document summarizes the implementation and evaluation of a Large Language Model (LLM)-based Virtual Teaching Assistant (VTA) in a graduate-level AI programming course with 477 students, focusing on its impact on student learning and engagement. The VTA provided instant feedback and facilitated course interactions; over the semester, student perceptions of its helpfulness, trustworthiness, and comfort improved, especially among students initially reluctant to approach human instructors. The study also identifies challenges: students found the VTA less reliable and shallower in its support than human instructors, and noted occasional slow response times and inaccurate answers. Overall, the findings highlight the potential of generative AI to enhance educational experiences while underscoring the need for continued improvement to address these limitations.
Key Applications
Large Language Model-based Virtual Teaching Assistant (VTA)
Context: Graduate-level introductory AI programming course
Implementation: The VTA was integrated into the course to provide instant feedback and answer student inquiries through a web interface. It utilized LLMs to generate contextually relevant responses based on course materials.
Outcomes: Improved student engagement, increased comfort in asking questions, and growth over the semester in the VTA's perceived helpfulness and trustworthiness.
Challenges: Limitations in reliability and depth of support compared to human instructors, slow response times, and occasional inaccuracies in answers.
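The retrieval-grounded question answering described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the course documents are invented examples.

```python
# Minimal retrieval-augmented prompting sketch (illustrative only).
# A production VTA would use a real embedding model and an LLM API.

def embed(text):
    """Toy embedding: bag-of-words counts (stands in for a real embedding model)."""
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / ((norm(a) * norm(b)) or 1.0)

def retrieve(question, course_docs, k=2):
    """Return the k course documents most similar to the question."""
    q = embed(question)
    ranked = sorted(course_docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, course_docs):
    """Assemble the context-grounded prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(question, course_docs))
    return f"Answer using only the course materials below.\n{context}\n\nQ: {question}"

docs = [
    "Gradient descent updates parameters against the gradient of the loss.",
    "The course late policy deducts 10% per day.",
    "Backpropagation computes gradients layer by layer via the chain rule.",
]
prompt = build_prompt("How does gradient descent update parameters?", docs)
```

Grounding the prompt in retrieved course materials is what makes the responses "contextually relevant"; the quality of this retrieval step is revisited under Content-related barriers below.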
Implementation Barriers
Technological
Students perceived response times as slow, likely because answers were delivered only after generation completed rather than streamed incrementally.
Proposed Solutions: Incorporating streaming functionality to provide partial responses as they are generated to improve user experience.
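The proposed streaming fix can be sketched with a generator that forwards partial chunks as they become available. The token source here is simulated, since the paper does not describe the actual serving stack; in a real deployment the chunks would arrive from the model API and be pushed to the browser (e.g. via WebSocket or server-sent events).

```python
def generate_tokens(answer):
    """Simulated LLM token stream (a real system would stream from the model API)."""
    for token in answer.split(" "):
        # In production, each chunk arrives as the model generates it.
        yield token + " "

def stream_response(answer, on_chunk):
    """Forward each chunk to the UI callback as soon as it is available."""
    full = []
    for chunk in generate_tokens(answer):
        on_chunk(chunk)  # e.g. push to the browser over WebSocket / SSE
        full.append(chunk)
    return "".join(full)

received = []
final = stream_response("Gradient descent minimizes the loss.", received.append)
```

The user sees text appear word by word instead of waiting for the full answer, which addresses the perceived latency without changing total generation time.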
Perceptual
Students found the VTA less reliable than human instructors, impacting their trust in the system.
Proposed Solutions: Improving the underlying LLM architecture and enhancing instruction-following capabilities to boost trustworthiness.
Content-related
Difficulty in retrieving course-related content and issues with hallucinated or incorrect answers.
Proposed Solutions: Adopting hybrid retrieval strategies and expanding the document candidate pool to improve content retrieval accuracy.
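Hybrid retrieval combines a sparse (keyword) signal with a dense (embedding) signal so that documents missed by one scorer can still be found by the other. The sketch below is an assumption-laden illustration: word overlap stands in for BM25, character-bigram Jaccard stands in for embedding similarity, and `alpha` weights the two.

```python
def lexical_score(query, doc):
    """Keyword overlap (a stand-in for BM25-style sparse retrieval)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def dense_score(query, doc):
    """Toy 'dense' similarity: character-bigram Jaccard standing in for embeddings."""
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    q, d = bigrams(query.lower()), bigrams(doc.lower())
    return len(q & d) / len(q | d) if q | d else 0.0

def hybrid_retrieve(query, docs, alpha=0.5, k=3):
    """Rank documents by a weighted sum of the sparse and dense scores."""
    scored = [
        (alpha * lexical_score(query, d) + (1 - alpha) * dense_score(query, d), d)
        for d in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

docs = [
    "Assignments lose 10% per late day.",
    "Use vectorized NumPy operations.",
    "Office hours are on Fridays.",
]
top = hybrid_retrieve("late submission penalty", docs, k=1)[0]
```

Expanding the candidate pool (a larger `k` at the first stage, reranked afterwards) follows the same idea: retrieve broadly, then let the combined score decide what reaches the prompt.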
Project Team
Sunjun Kweon
Researcher
Sooyohn Nam
Researcher
Hyunseung Lim
Researcher
Hwajung Hong
Researcher
Edward Choi
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Sunjun Kweon, Sooyohn Nam, Hyunseung Lim, Hwajung Hong, Edward Choi
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI