CSEPrompts: A Benchmark of Introductory Computer Science Prompts

Project Overview

The document examines the role of generative AI, specifically Large Language Models (LLMs), in education, with an emphasis on Computer Science (CS). It introduces CSEPrompts, a framework for assessing the performance of LLMs on introductory CS assignments, comprising programming prompts and multiple-choice questions (MCQs) drawn from coding websites and academic MOOCs. The findings show that LLMs are highly capable at code generation, which raises concerns that students could misuse these tools to automate assignment completion. The document also highlights the difficulty of evaluating LLM-generated output, underscoring the need for a thorough, structured assessment approach that safeguards academic integrity and effective learning outcomes.

Key Applications

CSEPrompts framework for evaluating LLM performance on CS assignments

Context: Introductory Computer Science courses, targeting students and educators in programming education

Implementation: CSEPrompts was created by collecting programming exercise prompts and MCQs from coding websites and academic MOOCs; the performance of several LLMs was then evaluated against this collection (a minimal scoring sketch follows this list).

Outcomes: Demonstrated high performance of LLMs in generating code and answering questions, with GPT-3.5 outperforming others.

Challenges: Potential student misuse for generating complete assignments; limitations in reasoning and integration capabilities of LLMs.
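The paper's own evaluation harness is not reproduced on this page. Purely as an illustration of the general approach such a benchmark takes, the Python sketch below scores one programming prompt by executing the model's generated code against reference test cases and recording pass/fail. The prompt, the stand-in model output, and the test cases are hypothetical placeholders, not items from CSEPrompts.

# Minimal sketch of scoring one programming prompt. The prompt, the
# "model output", and the tests below are hypothetical placeholders.

PROMPT = "Write a function add(a, b) that returns the sum of two integers."

# Stand-in for the text an LLM would return for PROMPT.
GENERATED_CODE = """
def add(a, b):
    return a + b
"""

# Reference test cases: (arguments, expected result).
TEST_CASES = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]

def passes_all_tests(code, func_name, tests):
    """Run candidate code in an isolated namespace and check every test.

    A production harness would sandbox execution and enforce timeouts;
    this sketch only shows the pass/fail bookkeeping.
    """
    namespace = {}
    try:
        exec(code, namespace)  # define the candidate function
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False  # any crash counts as a failure

if __name__ == "__main__":
    print("Prompt solved:", passes_all_tests(GENERATED_CODE, "add", TEST_CASES))

A real harness aggregates this pass/fail signal over the whole prompt set and runs untrusted model code in an isolated process rather than via exec in the host interpreter.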

Implementation Barriers

Educational Integrity

Concerns about students using LLMs to complete assignments, leading to artificially high grades without true understanding.

Proposed Solutions: Implementing academic integrity policies and educational programs to teach responsible use of AI.

Technical Limitations

LLMs exhibit limitations in reasoning and integration, which can affect the quality of the code they generate.

Proposed Solutions: Ongoing refinement of LLMs and the development of better evaluation metrics for assessing their outputs; one illustrative metric is sketched below.
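The paper itself is the authority on which metrics it uses; as one representative example of an evaluation metric for code generation (not taken from this paper), the pass@k estimator popularized by benchmarks such as HumanEval is sketched below. It estimates the probability that at least one of k samples, drawn from n generations of which c passed the tests, is correct.

from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: total generations sampled per prompt
    c: how many of those generations passed the reference tests
    k: budget of samples the metric assumes a user would try
    """
    if n - c < k:
        return 1.0  # fewer than k failures, so any k-sample must contain a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations per prompt, 3 of which passed.
print(pass_at_k(10, 3, 1))  # ~0.3
print(pass_at_k(10, 3, 5))  # ~0.917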

Project Team

Nishat Raihan, Researcher
Dhiman Goswami, Researcher
Sadiya Sayara Chowdhury Puspo, Researcher
Christian Newman, Researcher
Tharindu Ranasinghe, Researcher
Marcos Zampieri, Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
