
More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems

Project Overview

Generative AI, particularly large multimodal models such as GPT-4V and Bard, is transforming the landscape of computing education. These models excel at solving complex visual programming tasks such as Parsons problems: GPT-4V achieved a success rate of 96.7%, while Bard reached 69.2%. This advancement highlights both the potential of AI as a powerful educational tool and the challenge it poses to traditional assessment methods. As educators grapple with the implications of AI-assisted learning, they must rethink their approaches to academic integrity and student evaluation, and critically examine how assessments can adapt so that they accurately reflect student learning in an era where AI can both aid and complicate the educational process.

Key Applications

Large Multimodal Models (GPT-4V and Bard)

Context: Computing education, focusing on introductory programming courses.

Implementation: The models were tested on solving Parsons problems presented in various visual formats.

Outcomes: GPT-4V solved 96.7% of the problems, while Bard solved 69.2%. The study highlighted the capabilities of these models in analyzing and solving visual programming tasks.

Challenges: Bard struggled with visual extraction and had a refusal rate of 7.5%, while both models faced issues with hallucinations and generating incorrect code blocks.
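To make the task concrete, the sketch below shows what a Parsons problem looks like in code form: the learner receives shuffled code fragments, sometimes mixed with "distractor" lines, and must reorder them into a working program. This is a minimal, hypothetical illustration of the problem type, not the paper's actual test harness; all names and fragments here are invented for the example.

```python
import random

# Correct solution, in order: a small function summing a list.
SOLUTION = [
    "def total(numbers):",
    "    result = 0",
    "    for n in numbers:",
    "        result += n",
    "    return result",
]

# A plausible-but-wrong fragment, as used in distractor-style variants.
DISTRACTORS = [
    "        result -= n",
]

def make_problem(solution, distractors, seed=0):
    """Shuffle the solution lines together with distractors to
    produce the jumbled blocks a learner (or model) must reorder."""
    blocks = solution + distractors
    rng = random.Random(seed)  # seeded for reproducibility
    rng.shuffle(blocks)
    return blocks

def is_correct(attempt, solution):
    """An attempt solves the problem if it reproduces the solution
    exactly, with every distractor left out."""
    return attempt == solution

problem = make_problem(SOLUTION, DISTRACTORS)
print(is_correct(SOLUTION, SOLUTION))  # a correct reordering passes
```

In the study's setup, problems like this were rendered as images in varying visual formats, so a model must first read the fragments from the picture before reasoning about their order, which is where Bard's visual-extraction difficulties arose.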

Implementation Barriers

Technical and Ethical Barriers

The emergence of new multimodal models challenges traditional assessment methods, as they can solve visual programming problems. There are also concerns over academic integrity, as students may misuse AI tools to complete assignments.

Proposed Solutions: Adopting adversarial methods in assessments, rethinking evaluation strategies in light of AI capabilities, and introducing more complex visual problems to mitigate the risk of cheating while exploring alternative assessment methods.

Project Team

Irene Hou

Researcher

Owen Man

Researcher

Sophie Mettille

Researcher

Sebastian Gutierrez

Researcher

Kenneth Angelikas

Researcher

Stephen MacNeil

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Irene Hou, Owen Man, Sophie Mettille, Sebastian Gutierrez, Kenneth Angelikas, Stephen MacNeil

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
