CSEPrompts: A Benchmark of Introductory Computer Science Prompts
Project Overview
The document examines the role of generative AI, specifically large language models (LLMs), in education, with an emphasis on computer science (CS). It introduces CSEPrompts, a framework for assessing LLM performance on introductory CS assignments, comprising programming prompts and multiple-choice questions (MCQs) drawn from coding platforms and MOOCs. The findings show that LLMs are strong code generators, which raises concerns that students may misuse these tools to automate assignment completion. The document also highlights the difficulty of evaluating LLM outputs, underscoring the need for a thorough, structured assessment approach that safeguards academic integrity and supports effective learning.
Key Applications
CSEPrompts framework for evaluating LLM performance on CS assignments
Context: Introductory Computer Science courses, targeting students and educators in programming education
Implementation: CSEPrompts was created by collecting programming exercise prompts and MCQs from various coding websites and academic MOOCs. Several LLMs were then evaluated against this framework (a minimal sketch of such an evaluation loop follows this list).
Outcomes: Demonstrated high performance of LLMs in generating code and answering questions, with GPT-3.5 outperforming the other models evaluated.
Challenges: Potential student misuse for generating complete assignments; limitations in reasoning and integration capabilities of LLMs.
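To make the evaluation workflow concrete, here is a minimal sketch of how one prompt from such a benchmark might be scored: the LLM is asked for a Python solution, and its output is run against assert-based tests in a subprocess. This is an illustration under assumptions, not the authors' harness: it assumes the official openai Python client, and the prompt, tests, and helper names (generate_solution, passes_tests) are invented for this example.

```python
# Sketch of a CSEPrompts-style evaluation loop (illustrative only; prompt
# sources, model names, and helper functions are assumptions, not the
# authors' actual harness).
import subprocess
import sys
import tempfile

from openai import OpenAI  # assumes the official openai client package

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_solution(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask the model to solve one introductory programming prompt."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Write a Python solution.\n\n{prompt}"}],
    )
    # A real harness would strip markdown code fences here; omitted for brevity.
    return response.choices[0].message.content


def passes_tests(code: str, test_code: str, timeout: int = 10) -> bool:
    """Run the generated code plus assert-based tests in a subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Example with one hypothetical prompt/test pair.
prompt = "Write a function add(a, b) that returns the sum of two integers."
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
solution = generate_solution(prompt)
print("pass" if passes_tests(solution, tests) else "fail")
```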
Implementation Barriers
Educational Integrity
Concerns about students using LLMs to complete assignments, leading to artificially high grades without true understanding.
Proposed Solutions: Implementing academic integrity policies and educational programs to teach responsible use of AI.
Technical Limitations
LLMs show limitations in reasoning and in integrating multiple concepts, which can degrade the quality of generated code.
Proposed Solutions: Ongoing refinement of LLMs and the development of better metrics for evaluating their outputs; one representative correctness metric is sketched below.
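For context, a common functional-correctness metric in code-generation research is pass@k (Chen et al., 2021). Whether CSEPrompts adopts this exact estimator is not stated here; the sketch below shows the standard unbiased formulation as one representative option.

```python
# Unbiased pass@k estimator from Chen et al. (2021), "Evaluating Large
# Language Models Trained on Code" -- shown as one representative metric,
# not necessarily the one used in CSEPrompts.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled solutions passes,
    given n total samples of which c passed the tests."""
    if n - c < k:
        return 1.0  # too few failures to fill k samples: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per problem, 3 of which passed the tests.
print(round(pass_at_k(n=10, c=3, k=1), 3))  # 0.3
print(round(pass_at_k(n=10, c=3, k=5), 3))  # 0.917
```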
Project Team
Nishat Raihan, Researcher
Dhiman Goswami, Researcher
Sadiya Sayara Chowdhury Puspo, Researcher
Christian Newman, Researcher
Tharindu Ranasinghe, Researcher
Marcos Zampieri, Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI