Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
Project Overview
This paper examines the role of generative AI in education, focusing on large vision-and-language models (LVLMs), by assessing how well they solve problems from the Mathematical Kangaroo Olympiad. It compares the models' reasoning skills with those of children and finds that, although some LVLMs perform better on more complex tasks, they generally fall short of children's performance, especially on problems aimed at younger age groups. The analysis indicates that current AI models do not replicate human reasoning and that what they find difficult differs from what children find difficult. The evaluation points to both the potential and the limitations of AI in educational contexts: generative AI can aid learning, but it may not yet align with the cognitive processes of young learners.
Key Applications
SMART-840 dataset for benchmarking LVLMs
Context: Analyzing LVLMs' performance against children's reasoning skills in mathematical Olympiad problems.
Implementation: Created a dataset from the Mathematical Kangaroo Olympiad problems, evaluating LVLMs like GPT-4o and Gemini-Pro on their ability to solve these problems.
Outcomes: Identified that LVLMs perform poorly compared to children, particularly on problems aimed at younger students, revealing a significant performance gap.
Challenges: LVLMs struggle with problems involving basic reasoning and show variability in their responses.
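The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the record fields (grade_group, correct_answer, model_answer, child_correct_rate), the grade-group labels, and every value in the toy records are hypothetical placeholders, not data from SMART-840. It assumes each problem comes with the share of children who answered it correctly, and it measures response variability as agreement across repeated runs of the same model.

```python
from collections import Counter, defaultdict


def accuracy_by_group(records):
    """Return {grade_group: (model_accuracy, mean_child_rate, gap)}.

    gap = model_accuracy - mean_child_rate, so a negative gap means
    the model does worse than the children in that group.
    """
    # Per group: [number of problems, model correct count, sum of child rates]
    totals = defaultdict(lambda: [0, 0, 0.0])
    for r in records:
        t = totals[r["grade_group"]]
        t[0] += 1
        t[1] += int(r["model_answer"] == r["correct_answer"])
        t[2] += r["child_correct_rate"]
    return {g: (c / n, s / n, c / n - s / n) for g, (n, c, s) in totals.items()}


def answer_consistency(answers):
    """Fraction of repeated runs that agree with the most common answer."""
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


# Hypothetical toy records, invented purely to exercise the functions.
records = [
    {"grade_group": "1-2", "correct_answer": "B", "model_answer": "C", "child_correct_rate": 0.5},
    {"grade_group": "1-2", "correct_answer": "A", "model_answer": "A", "child_correct_rate": 0.75},
    {"grade_group": "11-12", "correct_answer": "D", "model_answer": "D", "child_correct_rate": 0.25},
    {"grade_group": "11-12", "correct_answer": "E", "model_answer": "E", "child_correct_rate": 0.25},
]

summary = accuracy_by_group(records)
for group, (model_acc, child_rate, gap) in summary.items():
    print(f"{group}: model={model_acc:.2f} children={child_rate:.2f} gap={gap:+.2f}")

# Four hypothetical repeated runs on one problem.
print(answer_consistency(["A", "A", "B", "A"]))
```

Reporting the gap per grade group, rather than one overall accuracy, is what would make a pattern such as weaker results on problems for younger students visible.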
Implementation Barriers
Technical Limitations
LVLMs perform poorly on reasoning tasks that require cumulative knowledge and problem-solving skills.
Proposed Solutions: Broaden training datasets to cover a more diverse range of mathematical problems, and improve the models' capacity for basic reasoning.
Project Team
Anoop Cherian
Researcher
Kuan-Chuan Peng
Researcher
Suhas Lohit
Researcher
Joanna Matthiesen
Researcher
Kevin Smith
Researcher
Joshua B. Tenenbaum
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI