Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

Project Overview

The paper examines the role of generative AI, particularly large vision-and-language models (LVLMs), in education by assessing their ability to solve problems from the Mathematical Kangaroo Olympiad. It compares the models' reasoning skills with those of children and finds that, although some LVLMs perform better on the more complex tasks, they generally fall short of children's performance, especially on problems aimed at younger age groups. The analysis indicates that current AI models do not replicate human reasoning and perceive problem difficulty differently from children. The evaluation highlights both the potential and the limitations of AI in educational contexts: generative AI can aid learning, but it may not yet fully align with the cognitive processes of young learners.

Key Applications

SMART-840 dataset for benchmarking LVLMs

Context: Comparing the performance of LVLMs with that of children on Mathematical Olympiad problems.

Implementation: Built a dataset from Mathematical Kangaroo Olympiad problems and evaluated LVLMs such as GPT-4o and Gemini-Pro on their ability to solve them.

Outcomes: Found that LVLMs perform poorly compared with children, particularly on problems aimed at younger students, revealing a significant performance gap.

Challenges: LVLMs struggle with problems involving basic reasoning and show variability in their responses.
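As an illustration of the kind of comparison described above, the following minimal sketch computes per-grade accuracy for a model and for children, along with the gap between them. It is not the authors' evaluation code: the record fields (`grade`, `model_correct`, `child_rate`), the function name `accuracy_by_grade`, and the toy values are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def accuracy_by_grade(records):
    """Aggregate model and child accuracy per grade.

    Each record is a dict with hypothetical fields:
      grade         -- grade level the problem targets
      model_correct -- True if the model answered the problem correctly
      child_rate    -- fraction of children who answered it correctly (0 to 1)
    """
    totals = defaultdict(lambda: {"n": 0, "model": 0, "children": 0.0})
    for rec in records:
        bucket = totals[rec["grade"]]
        bucket["n"] += 1
        bucket["model"] += int(rec["model_correct"])
        bucket["children"] += rec["child_rate"]

    summary = {}
    for grade in sorted(totals):
        bucket = totals[grade]
        model_acc = bucket["model"] / bucket["n"]
        child_acc = bucket["children"] / bucket["n"]
        # A positive gap means children outperform the model at this grade.
        summary[grade] = {
            "model": model_acc,
            "children": child_acc,
            "gap": child_acc - model_acc,
        }
    return summary


# Toy records with made-up values, for illustration only.
records = [
    {"grade": 1, "model_correct": False, "child_rate": 0.75},
    {"grade": 1, "model_correct": True, "child_rate": 0.25},
    {"grade": 12, "model_correct": True, "child_rate": 0.5},
]
summary = accuracy_by_grade(records)
for grade, stats in summary.items():
    print(grade, stats)
```

Grouping by grade keeps the comparison aligned with the question the summary raises, namely whether the gap between children and models is larger on problems aimed at younger students.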

Implementation Barriers

Technical Limitations

LVLMs struggle with reasoning tasks that require cumulative knowledge and problem-solving skills.

Proposed Solutions: Enhancing training datasets to include a more diverse range of mathematical problems and improving the models' capacity to understand basic reasoning.

Project Team

Anoop Cherian, Researcher

Kuan-Chuan Peng, Researcher

Suhas Lohit, Researcher

Joanna Matthiesen, Researcher

Kevin Smith, Researcher

Joshua B. Tenenbaum, Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
