Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests
Project Overview
The document explores the use of generative AI, particularly large language models (LLMs), to automate the creation of test cases for programming education contests, with a focus on improving assessment quality. The research compares AI-generated test cases against traditional expert-crafted assessments and finds that while LLMs can significantly improve error detection, they fall short of fully replacing human expertise, especially on complex problems. Integrating AI capabilities with human oversight yields the best results, suggesting a hybrid approach as the most effective strategy for applying generative AI in educational contexts. The findings highlight AI's potential to augment educational practice while underscoring the importance of human judgment in ensuring assessment quality.
Key Applications
Automated test case generation for programming problems
Context: Educational context involving competitive programming contests, targeting educators and contest organizers
Implementation: Leveraging NLP-driven methods with LLMs to create dynamic test cases based on problem statements
Outcomes: Enhanced assessment quality, identified previously undetected errors in 67% of 5th grade programming problems, reduced educator workload
Challenges: Reliability of AI-generated tests for complex problems, potential for missing edge cases, need for human oversight
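The generate-then-validate workflow outlined in this application can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM call is replaced by a deterministic stub, and the example problem (largest sum of two elements), constraints, and all function names are hypothetical. In practice, a trusted reference solution serves as the oracle that assigns expected outputs to each validated input.

```python
import random

def reference_solution(nums):
    """Trusted expert-written oracle: largest sum of two elements."""
    ordered = sorted(nums)
    return ordered[-1] + ordered[-2]

def llm_generate_tests(problem_statement, n_tests=5, seed=0):
    """Stub standing in for an LLM call. A real implementation would
    prompt a model with the problem statement and parse its suggested
    inputs; here we fabricate inputs deterministically for illustration."""
    rng = random.Random(seed)
    tests = [[rng.randint(-100, 100) for _ in range(rng.randint(2, 10))]
             for _ in range(n_tests)]
    # Edge cases that prompts typically ask the model to include.
    tests.append([-100, -100])  # all-minimum values
    tests.append([100, 100])    # all-maximum values
    return tests

def validate_tests(tests, min_len=2, lo=-100, hi=100):
    """Discard generated inputs that violate the stated constraints,
    so a malformed AI suggestion never reaches the contest judge."""
    return [t for t in tests
            if len(t) >= min_len and all(lo <= x <= hi for x in t)]

def build_test_suite(problem_statement):
    """Generate, validate, and attach oracle outputs to each input."""
    tests = validate_tests(llm_generate_tests(problem_statement))
    return [(t, reference_solution(t)) for t in tests]

suite = build_test_suite(
    "Given a list of integers, output the largest sum of two elements.")
```

The key design point is that the LLM only proposes inputs; expected outputs always come from the reference solution, which keeps a hallucinated answer from ever becoming ground truth.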
Implementation Barriers
Technical Barrier
Current LLMs may not consistently replace human-authored tests in high-stakes or complex scenarios.
Proposed Solutions: Using a hybrid model that combines AI-generated tests with human expert review and augmentation.
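One minimal way to structure such a hybrid model is to triage AI-generated tests, auto-accepting routine ones and queuing the rest for expert sign-off. The sketch below is an illustrative assumption, not the paper's actual pipeline; the names and the size-based complexity heuristic are hypothetical.

```python
def triage_tests(generated_tests, is_complex):
    """Split AI-generated tests into an auto-accepted list and a queue
    that a human expert must review before the tests are used."""
    auto_accepted, needs_review = [], []
    for test in generated_tests:
        (needs_review if is_complex(test) else auto_accepted).append(test)
    return auto_accepted, needs_review

# Example: treat large inputs as "complex" and route them to a human.
tests = [{"name": "small", "n": 10}, {"name": "stress", "n": 1_000_000}]
auto, review = triage_tests(tests, lambda t: t["n"] > 10_000)
```

Any heuristic can stand in for `is_complex` (input size, problem difficulty rating, disagreement between candidate solutions); the point is that human effort is spent only where the AI is least reliable.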
Generalizability Barrier
The effectiveness of LLM-generated tests varies by context and may not be optimal for all programming contest formats.
Proposed Solutions: Expanding the methodology to different platforms and competition styles for better validation.
Cost Barrier
Although per-request costs are currently low, reliance on proprietary LLM APIs could raise operational costs as usage scales.
Proposed Solutions: Maintaining efficiency in API usage and considering open-source alternatives.
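One simple efficiency measure consistent with the proposed solution is caching LLM responses so identical prompts are never paid for twice. The sketch below is a hypothetical illustration, with the API call replaced by a stub.

```python
import hashlib

_cache = {}  # hypothetical in-memory prompt cache

def cached_llm_call(prompt, llm_call):
    """Return a cached response when the same prompt was seen before,
    invoking the (paid) LLM API only on a cache miss."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = llm_call(prompt)
    return _cache[key]

# Stub tracking how many real API calls would have been made.
calls = []
def fake_llm(prompt):
    calls.append(prompt)
    return f"tests for: {prompt}"

first = cached_llm_call("problem A", fake_llm)
second = cached_llm_call("problem A", fake_llm)  # served from cache
```

In production the cache would live in persistent storage and could be combined with batching or a switch to open-source models, as the barrier's proposed solutions suggest.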
Project Team
Stefan Dascalescu
Researcher
Adrian Marius Dumitran
Researcher
Mihai Alexandru Vasiluta
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Stefan Dascalescu, Adrian Marius Dumitran, Mihai Alexandru Vasiluta
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI