
Benchmarking the Pedagogical Knowledge of Large Language Models

Project Overview

The document explores the integration of generative AI, particularly large language models (LLMs), into education by proposing specialized benchmarks that assess their pedagogical knowledge alongside their content knowledge. It argues that current evaluations are inadequate because they measure what models know rather than whether they can apply effective teaching strategies or understand learner needs. To address these gaps, the paper introduces two distinct benchmarks: the Cross-Domain Pedagogical Knowledge (CDPK) benchmark and the Special Educational Needs and Disabilities (SEND) benchmark. These benchmarks evaluate the capabilities of LLMs across diverse educational contexts and subjects, supporting a more holistic view of teaching and learning processes. The findings underscore the potential of generative AI to enhance educational outcomes by tailoring responses to specific learner requirements, fostering an inclusive and effective learning environment. The document concludes that rigorous assessment with these benchmarks can help educators leverage AI more effectively to support and improve pedagogical practice.

Key Applications

Pedagogical Knowledge Benchmarking

Context: Evaluating the capabilities of LLMs in various educational contexts, including general pedagogy and special educational needs and disabilities (SEND). This involves assessing responses to pedagogical scenarios and multiple-choice questions sourced from educational exams.

Implementation: Developed using multiple-choice questions and pedagogical scenarios that reflect both general and SEND-specific educational practices. The benchmarks evaluate LLMs against pedagogical knowledge and practices, utilizing a variety of methodologies to assess their performance across different educational contexts.
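The multiple-choice evaluation described above can be sketched as a simple accuracy loop. This is a minimal illustration, not the paper's actual harness: the item format, the `query_model` stub, and the sample questions are all assumptions standing in for a real LLM call and the real benchmark data.

```python
def query_model(question: str, options: dict[str, str]) -> str:
    """Placeholder for an LLM call; returns the letter of the chosen option.

    A real implementation would prompt the model with the question and
    options, then parse the answer letter from its reply.
    """
    return "A"  # stub: always picks option A


def score_benchmark(items: list[dict]) -> float:
    """Return the fraction of multiple-choice items answered correctly."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        answer = query_model(item["question"], item["options"])
        if answer == item["answer"]:
            correct += 1
    return correct / len(items)


# Hypothetical items in the assumed format (question, lettered options, key).
items = [
    {"question": "Which strategy best supports retrieval practice?",
     "options": {"A": "Low-stakes quizzing", "B": "Re-reading notes"},
     "answer": "A"},
    {"question": "What does instructional scaffolding involve?",
     "options": {"A": "Removing all support at once",
                 "B": "Gradually reducing support"},
     "answer": "B"},
]

print(score_benchmark(items))  # the stub answers "A" to both, so 0.5
```

Reported accuracies (such as the 28%–89% range below) are simply this fraction computed per model over the full item set.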

Outcomes: Results highlighted variations in LLM performance related to pedagogical knowledge, with accuracies reported across multiple models ranging from 28% to 89%. The evaluations emphasized the challenges of addressing special education needs and provided insights for guiding the deployment of LLMs in educational tools.

Challenges: Assessing pedagogical knowledge can be subjective due to the nature of teaching practices. Additionally, the limited availability of high-quality resources specific to SEND pedagogy poses challenges in creating comprehensive benchmarks.

Implementation Barriers

Resource Limitations

Difficulty in accessing high-quality educational resources and benchmarks specific to pedagogical knowledge.

Proposed Solutions: Encouragement of open-source contributions and collaborations among educational institutions to create comprehensive datasets.

Assessment Challenges

The subjective nature of assessing pedagogical practices and the need for effective evaluation methodologies.

Proposed Solutions: Developing standardized, replicable assessment criteria that can objectively measure pedagogical knowledge and practices.

Project Team

Maxime Lelièvre

Researcher

Amy Waldock

Researcher

Meng Liu

Researcher

Natalia Valdés Aspillaga

Researcher

Alasdair Mackintosh

Researcher

María José Ogando Portela

Researcher

Jared Lee

Researcher

Paul Atherton

Researcher

Robin A. A. Ince

Researcher

Oliver G. B. Garrod

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Maxime Lelièvre, Amy Waldock, Meng Liu, Natalia Valdés Aspillaga, Alasdair Mackintosh, María José Ogando Portela, Jared Lee, Paul Atherton, Robin A. A. Ince, Oliver G. B. Garrod

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
