
Leveraging Generative AI for Enhancing Automated Assessment in Programming Education Contests

Project Overview

This project explores the use of generative AI, particularly large language models (LLMs), to automate the creation of test cases for programming education contests, with the aim of improving assessment quality. The research compares AI-generated test cases with traditional expert-crafted assessments and finds that while LLMs can substantially improve error detection, they fall short of fully replacing human expertise, especially in complex scenarios. The findings indicate that combining AI generation with human oversight yields the best results, suggesting a hybrid approach as the most effective strategy for applying generative AI in educational contexts. This underscores the potential of AI to augment educational practice while preserving the role of human judgment in ensuring assessment quality.

Key Applications

Automated test case generation for programming problems

Context: Educational context involving competitive programming contests, targeting educators and contest organizers

Implementation: Leveraging NLP-driven methods with LLMs to create dynamic test cases based on problem statements

Outcomes: Enhanced assessment quality, identified previously undetected errors in 67% of 5th grade programming problems, reduced educator workload

Challenges: Reliability of AI-generated tests for complex problems, potential for missing edge cases, need for human oversight
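The generation step described above can be sketched as a small pipeline: build a prompt from the problem statement, ask an LLM for test inputs, and parse the reply into structured tests. In this sketch, `call_llm` is a hypothetical stand-in for a real LLM API call (the paper's actual prompts and API usage are not reproduced here); it returns canned output so the pipeline runs end to end.

```python
# Sketch: generating candidate test cases for a programming problem with an LLM.
# `call_llm` is a hypothetical placeholder for a chat-completion API call;
# here it returns fixed output so the surrounding pipeline is runnable.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to an LLM API
    # and return the model's text response.
    return "3 5\n10 20\n0 0"

def build_prompt(statement: str) -> str:
    # Ask for test inputs only, one test per line, in the stated format.
    return (
        "Problem statement:\n" + statement +
        "\nGenerate test inputs, one per line, as 'a b' integer pairs."
    )

def parse_tests(raw: str) -> list[tuple[int, int]]:
    # Turn the model's free-text reply into structured (a, b) test inputs.
    tests = []
    for line in raw.strip().splitlines():
        a, b = map(int, line.split())
        tests.append((a, b))
    return tests

statement = "Read two integers a and b and print their sum."
tests = parse_tests(call_llm(build_prompt(statement)))
print(tests)  # with the canned reply above: [(3, 5), (10, 20), (0, 0)]
```

In practice the parsing step is where unreliable model output surfaces: malformed lines raise an error here rather than silently producing a bad test, which is one reason human oversight remains necessary.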

Implementation Barriers

Technical Barrier

Current LLMs may not consistently replace human-authored tests in high-stakes or complex scenarios.

Proposed Solutions: Using a hybrid model that combines AI-generated tests with human expert review and augmentation.
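One way to realize such a hybrid model is to validate every AI-generated input automatically before it enters the test suite: run it against a trusted reference solution to produce the expected output, and route anything that violates the stated constraints to a human reviewer. The following is a minimal sketch under assumed constraints for a toy "sum of two integers" problem; the constraint limit and the sample inputs are illustrative, not from the paper.

```python
# Sketch of the hybrid idea: AI-generated inputs are checked against the
# problem's constraints, answered by a trusted reference solution when valid,
# and flagged for human review otherwise.

def reference_solution(a: int, b: int) -> int:
    # Trusted expert-written solution used to label valid tests.
    return a + b

def within_constraints(a: int, b: int, limit: int = 10**9) -> bool:
    # Assumed constraint: 0 <= a, b <= 10^9 (illustrative).
    return 0 <= a <= limit and 0 <= b <= limit

ai_tests = [(3, 5), (10**12, 1), (0, 0)]  # hypothetical AI-generated inputs
accepted, needs_review = [], []
for a, b in ai_tests:
    if within_constraints(a, b):
        accepted.append(((a, b), reference_solution(a, b)))
    else:
        needs_review.append((a, b))

print(accepted)      # [((3, 5), 8), ((0, 0), 0)]
print(needs_review)  # [(1000000000000, 1)] -- out of range, goes to a human
```

The division of labor mirrors the proposed solution: the machine handles volume and labeling, while humans adjudicate only the inputs the automated checks cannot certify.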

Generalizability Barrier

The effectiveness of LLM-generated tests varies by context and may not be optimal for all programming contest formats.

Proposed Solutions: Expanding the methodology to different platforms and competition styles for better validation.

Cost Barrier

Although current costs are low, reliance on proprietary LLM APIs could raise operational costs as usage scales.

Proposed Solutions: Maintaining efficiency in API usage and considering open-source alternatives.

Project Team

Stefan Dascalescu

Researcher

Adrian Marius Dumitran

Researcher

Mihai Alexandru Vasiluta

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Stefan Dascalescu, Adrian Marius Dumitran, Mihai Alexandru Vasiluta

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
