Towards Robust Evaluation of STEM Education: Leveraging MLLMs in Project-Based Learning
Project Overview
The document explores the incorporation of Generative AI, particularly Multimodal Large Language Models (MLLMs), into STEM education through Project-Based Learning (PBL). It introduces PBLBench, a benchmark for evaluating how well MLLMs assess PBL outcomes spanning multiple modalities, including text, images, code, and video. To support this evaluation, the authors developed the PBL-STEM dataset. While initial findings show that MLLMs hold promise for enhancing educational assessment, significant challenges remain, notably low ranking accuracy and considerable instability in the evaluation process. These issues highlight the need for further advances before MLLMs can be reliably integrated into educational frameworks and meaningfully improve learning experiences and outcomes in STEM disciplines.
Key Applications
PBLBench and PBL-STEM dataset
Context: Educational settings focused on STEM disciplines, targeting students engaged in Project-Based Learning activities.
Implementation: The PBL-STEM dataset was created to include diverse project outcomes with various modalities. PBLBench was developed to evaluate MLLMs on these outcomes using structured evaluation criteria.
Outcomes: The benchmark aims to provide a reliable assessment framework, enhance teacher workload management, and improve feedback to students.
Challenges: Current models exhibit low ranking accuracy and hallucinations, making them unreliable for comprehensive PBL evaluations.
Implementation Barriers
Technical Limitations
Models often produce hallucinations and unstable outputs, leading to unreliable assessments.
Proposed Solutions: Developing self-verification mechanisms for models to enhance scoring stability.
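One minimal sketch of such a self-verification mechanism, assuming a hypothetical `score_fn` callable that wraps an MLLM judge (the function name, the number of trials, and the spread threshold are illustrative assumptions, not details from the paper): query the scorer several times and accept the result only when repeated scores agree.

```python
import statistics

def stable_score(score_fn, artifact, trials=5, max_spread=1.0):
    """Query a scorer repeatedly and accept only stable results.

    score_fn is a hypothetical callable wrapping an MLLM judge.
    If repeated scores disagree by more than max_spread, return
    None so the caller can fall back to human review.
    """
    scores = [score_fn(artifact) for _ in range(trials)]
    if max(scores) - min(scores) > max_spread:
        return None  # unstable output: defer to a human grader
    return statistics.median(scores)
```

Returning the median rather than the mean makes the accepted score robust to a single mildly deviant sample while the spread check rejects genuinely unstable runs.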
Evaluation Challenges
Existing benchmarks lack support for free-form outputs and rigorous validation processes.
Proposed Solutions: Implementing expert-driven evaluation methods like the Analytic Hierarchy Process (AHP) to derive structured criteria.
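As a rough illustration of how AHP derives structured weights, the sketch below computes criterion weights from a pairwise comparison matrix via the principal eigenvector and checks Saaty's consistency ratio. The three criteria and the judgment matrix are invented for illustration; the paper's actual criteria and expert judgments are not reproduced here.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrix sizes 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(pairwise):
    """Derive criterion weights from a pairwise comparison matrix.

    Returns (weights, consistency_ratio). Weights come from the
    principal eigenvector; CR < 0.1 is conventionally acceptable.
    """
    n = pairwise.shape[0]
    eigvals, eigvecs = np.linalg.eig(pairwise)
    k = int(np.argmax(eigvals.real))          # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                           # normalize to sum to 1
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1)           # consistency index
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] else 0.0
    return w, cr

# Hypothetical expert judgments over three example PBL criteria:
# creativity vs. technical rigor vs. presentation.
M = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])
weights, cr = ahp_weights(M)
```

A consistency ratio below 0.1 indicates the expert judgments are coherent enough to trust the derived weights; otherwise the comparisons should be revisited.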
Project Team
Yanhao Jia
Researcher
Xinyi Wu
Researcher
Qinglin Zhang
Researcher
Yiran Qin
Researcher
Luwei Xiao
Researcher
Shuai Zhao
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Yanhao Jia, Xinyi Wu, Qinglin Zhang, Yiran Qin, Luwei Xiao, Shuai Zhao
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI