Multimodal large language models and physics visual tasks: comparative analysis of performance and costs

Project Overview

This document examines the use of multimodal large language models (MLLMs) in physics education, focusing on tasks that require visual interpretation. It evaluates multiple models from leading providers, comparing their accuracy and cost and considering potential applications in tutoring, assessment, and feedback generation. The findings show significant disparities in model performance, which matter for educational institutions that need models that are both capable and cost-effective. The study highlights the potential of MLLMs as learning tools in physics education, but also their limitations, particularly on complex visual tasks. Overall, MLLMs can enhance learning experiences, but their adoption and implementation require careful consideration.

Key Applications

Multimodal large language models (MLLMs) for physics education

Context: Physics education, targeting instructors and students in undergraduate physics courses

Implementation: Evaluating models on standardized, image-based physics assessments to determine their effectiveness in tutoring and grading

Outcomes: Models showed varying performance, with some achieving accuracy comparable to high-performing students on conceptual physics tasks

Challenges: Limitations in handling complex visual representations and providing nuanced feedback; high variability in performance and cost.
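As a rough illustration of the evaluation step described above, the sketch below scores one model's answers against an answer key. It assumes the assessment items have a single correct option each; the function name, item identifiers, and answer format are hypothetical and are not taken from the paper.

```python
def accuracy(responses: dict[str, str], answer_key: dict[str, str]) -> float:
    """Return the fraction of items answered correctly.

    Assumption: each item has exactly one correct option label (e.g. "A").
    Items missing from `responses` count as incorrect.
    """
    if not answer_key:
        raise ValueError("answer_key must contain at least one item")
    correct = sum(
        1
        for item_id, expected in answer_key.items()
        # Compare case-insensitively and ignore surrounding whitespace.
        if responses.get(item_id, "").strip().upper() == expected.strip().upper()
    )
    return correct / len(answer_key)
```

Counting unanswered items as incorrect is one possible design choice; a study could equally report them separately.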

Implementation Barriers

Technical Barrier

MLLMs often struggle with interpreting visual inputs, which are critical in physics education.

Proposed Solutions: Carefully designed prompting strategies to enhance model reliability and tailored evaluations to identify model weaknesses.

Financial and Equity Barrier

The cost of using high-performing MLLMs can be prohibitive for under-resourced institutions, potentially reinforcing existing technological divides in education.

Proposed Solutions: Evaluating cost-performance ratios to identify affordable models that still deliver adequate educational value, and ensuring that effective, lower-cost models are accessible to a broader range of learners and institutions.
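One simple way to express a cost-performance ratio is cost per correct answer, filtered by a minimum acceptable accuracy. The sketch below is only an illustration under that assumption; the metric, the class and function names, and the threshold parameter are hypothetical and are not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str             # hypothetical model label
    accuracy: float       # fraction of items answered correctly, 0..1
    cost_per_item: float  # cost of answering one item, in any fixed currency


def cost_per_correct(result: ModelResult) -> float:
    """Cost per correctly answered item (lower is better)."""
    if result.accuracy <= 0:
        return float("inf")  # a model that is never correct has no finite ratio
    return result.cost_per_item / result.accuracy


def rank_affordable(results: list[ModelResult], min_accuracy: float) -> list[ModelResult]:
    """Keep models meeting the accuracy threshold, cheapest per correct answer first."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    return sorted(eligible, key=cost_per_correct)
```

An institution would supply its own measured accuracies and prices; the threshold encodes what it considers adequate educational value.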

Project Team

Giulia Polverini

Researcher

Bor Gregorcic

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Giulia Polverini, Bor Gregorcic

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
