
A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models

Project Overview

This document explores the integration of generative AI, specifically Large Language Models (LLMs), into education through a psychometrics-based approach to evaluating their effectiveness in pedagogical applications. It underscores the need for robust assessment tools designed for LLMs applied to educational tasks such as personalized tutoring and real-time feedback. Finding existing benchmarks insufficient, the authors propose a new framework grounded in psychometric principles for more accurate evaluation of LLM capabilities. Empirical tests on GPT-4 revealed significant deficiencies, particularly in areas demanding higher cognitive engagement. While LLMs hold promise for educational enhancement, their current limitations must be addressed before that potential can be fully realized in supporting learning processes.

Key Applications

Benchmark development for evaluating LLMs

Context: Educational assessment for teacher assistants and consultants in pedagogy

Implementation: Developed using a psychometrics-based approach guided by Bloom's taxonomy, involving expert input and empirical testing with GPT-4.

Outcomes: Provided a structured assessment tool for evaluating LLM performance in pedagogy, revealing strengths and weaknesses of LLMs in educational contexts.

Challenges: The benchmark currently includes only multiple-choice questions, which may not capture higher-order cognitive processes; LLMs showed limited familiarity with pedagogical theories and methodologies.
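Because the benchmark's items are organized by Bloom's taxonomy levels, per-level performance can be aggregated to reveal where a model falls short on higher-order cognition. The sketch below is a minimal illustration of that kind of scoring; the item records, level names, and function are hypothetical and do not come from the study's actual data or pipeline.

```python
from collections import defaultdict

# Hypothetical benchmark results: each item is tagged with a Bloom's
# taxonomy level and whether the model answered it correctly.
# These records are illustrative only, not data from the paper.
results = [
    {"level": "Remember",   "correct": True},
    {"level": "Remember",   "correct": True},
    {"level": "Understand", "correct": True},
    {"level": "Apply",      "correct": False},
    {"level": "Analyze",    "correct": False},
    {"level": "Analyze",    "correct": True},
]

def accuracy_by_level(records):
    """Aggregate correctness per Bloom's-taxonomy level."""
    totals = defaultdict(lambda: [0, 0])  # level -> [correct, attempted]
    for r in records:
        totals[r["level"]][1] += 1
        if r["correct"]:
            totals[r["level"]][0] += 1
    return {lvl: c / n for lvl, (c, n) in totals.items()}

print(accuracy_by_level(results))
# e.g. lower accuracy on "Apply"/"Analyze" items would mirror the
# paper's finding of weaker performance at higher cognitive levels.
```

A real psychometric analysis would go beyond raw accuracy (e.g., item-response modeling and expert validation of items), but per-level aggregation like this is the simplest way to surface the pattern the study reports.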

Implementation Barriers

Technical Barrier

Current generative AI models struggle with complex educational tasks requiring deeper cognitive engagement. Existing benchmarks may not accurately reflect the competencies needed for LLMs to function effectively as educational tools.

Proposed Solutions: Further validation of recommendations and assessments provided by the model; exploration of more complex item types beyond multiple-choice questions; adoption of a psychometrics-based methodology for benchmark development that aligns assessments with educational outcomes.

Project Team

Elena Kardanova

Researcher

Alina Ivanova

Researcher

Ksenia Tarasova

Researcher

Taras Pashchenko

Researcher

Aleksei Tikhoniuk

Researcher

Elen Yusupova

Researcher

Anatoly Kasprzhak

Researcher

Yaroslav Kuzminov

Researcher

Ekaterina Kruchinskaia

Researcher

Irina Brun

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Elena Kardanova, Alina Ivanova, Ksenia Tarasova, Taras Pashchenko, Aleksei Tikhoniuk, Elen Yusupova, Anatoly Kasprzhak, Yaroslav Kuzminov, Ekaterina Kruchinskaia, Irina Brun

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI