A Novel Psychometrics-Based Approach to Developing a Professional Competency Benchmark for Large Language Models
Project Overview
This project examines the use of generative AI, specifically large language models (LLMs), in education through a psychometrics-based approach to evaluating their effectiveness in pedagogical applications. It argues that robust, purpose-built assessment tools are needed before LLMs can be trusted with educational tasks such as personalized tutoring and real-time feedback. Finding existing benchmarks insufficient for this purpose, the authors propose a new framework grounded in psychometric principles for a more accurate evaluation of LLM capabilities. Empirical tests with GPT-4 revealed significant deficiencies, particularly on tasks demanding higher-order cognitive engagement: while LLMs hold promise for enhancing education, their current limitations must be addressed before they can reliably support learning.
Key Applications
Benchmark development for evaluating LLMs
Context: Educational assessment for teacher assistants and consultants in pedagogy
Implementation: Developed using a psychometrics-based approach guided by Bloom's taxonomy, with expert item authoring and empirical testing on GPT-4 (see the evaluation sketch after this list).
Outcomes: Provided a structured assessment tool for evaluating LLM performance in pedagogy, revealing strengths and weaknesses of LLMs in educational contexts.
Challenges: The benchmark currently includes only multiple-choice questions, which may not capture higher-order cognitive processes; the tested model showed limited familiarity with pedagogical theories and methodologies.
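To make the evaluation setup concrete, here is a minimal sketch, not the authors' code, of how a model can be scored on multiple-choice items tagged with Bloom's-taxonomy levels. The item format, the two sample items, and the `ask_model` callable are all assumptions for illustration; the paper's actual items were authored by pedagogy experts.

```python
# Minimal sketch (not the authors' code): score a model on multiple-choice
# items tagged with Bloom's-taxonomy levels, then aggregate accuracy per level.
# The item schema, sample items, and `ask_model` callable are assumptions.
from collections import defaultdict
from typing import Callable

# Hypothetical benchmark items; real items would come from expert authors.
ITEMS = [
    {"id": "q1",
     "stem": "Which theorist is associated with the zone of proximal development?",
     "options": {"A": "Piaget", "B": "Vygotsky", "C": "Skinner", "D": "Bloom"},
     "answer": "B", "bloom_level": "remember"},
    {"id": "q2",
     "stem": "A student keeps confusing area and perimeter. "
             "Which intervention best targets the misconception?",
     "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
     "answer": "C", "bloom_level": "analyze"},
]

def evaluate(items: list[dict], ask_model: Callable[[str], str]) -> dict[str, float]:
    """Return accuracy per Bloom level; `ask_model` maps a prompt to a letter."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        opts = "\n".join(f"{k}) {v}" for k, v in item["options"].items())
        prompt = f"{item['stem']}\n{opts}\nAnswer with a single letter."
        reply = ask_model(prompt).strip().upper()[:1]  # crude letter extraction
        level = item["bloom_level"]
        total[level] += 1
        correct[level] += int(reply == item["answer"])
    return {lvl: correct[lvl] / total[lvl] for lvl in total}

if __name__ == "__main__":
    # Stub model that always answers "B", standing in for a real LLM call.
    print(evaluate(ITEMS, lambda prompt: "B"))
```

Aggregating accuracy per taxonomy level is what enables the kind of finding reported above: weaker performance on items demanding higher-order cognition.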
Implementation Barriers
Technical Barrier
Current generative AI models struggle with complex educational tasks requiring deeper cognitive engagement. Existing benchmarks may not accurately reflect the competencies needed for LLMs to function effectively as educational tools.
Proposed Solutions: Further validation of the recommendations and assessments the model produces; exploration of more complex item types beyond multiple-choice questions; adoption of a psychometrics-based methodology for benchmark development that aligns assessments with educational outcomes (a toy item-analysis sketch follows).
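As an illustration of the psychometric side of the proposed methodology, the following sketch computes two classical-test-theory statistics for each benchmark item: difficulty (proportion correct) and point-biserial discrimination against the rest score. The response matrix is invented toy data, and the paper's actual analyses may differ; this only shows the kind of item screening a psychometrics-based approach implies.

```python
# Toy classical-test-theory item analysis: difficulty (proportion correct)
# and point-biserial discrimination against the rest score. Data is invented.
import math

# responses[respondent][item] = 1 if answered correctly, else 0 (toy data).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

n_items = len(responses[0])
totals = [sum(row) for row in responses]  # total score per respondent

def item_stats(j: int) -> tuple[float, float]:
    """Difficulty p and point-biserial correlation with the rest score."""
    scores = [row[j] for row in responses]
    rest = [t - s for t, s in zip(totals, scores)]  # rest score avoids inflation
    p = sum(scores) / len(scores)
    mean_rest = sum(rest) / len(rest)
    sd_rest = math.sqrt(sum((r - mean_rest) ** 2 for r in rest) / len(rest))
    if sd_rest == 0 or p in (0.0, 1.0):
        return p, 0.0  # discrimination undefined; report zero
    mean1 = sum(r for r, s in zip(rest, scores) if s == 1) / sum(scores)
    r_pb = (mean1 - mean_rest) / sd_rest * math.sqrt(p / (1 - p))
    return p, r_pb

for j in range(n_items):
    p, r_pb = item_stats(j)
    print(f"item {j}: difficulty={p:.2f}, discrimination={r_pb:.2f}")
```

Items with near-zero or negative discrimination would be flagged for expert review, which is the kind of alignment between assessments and intended outcomes the proposed solution calls for.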
Project Team
Elena Kardanova
Researcher
Alina Ivanova
Researcher
Ksenia Tarasova
Researcher
Taras Pashchenko
Researcher
Aleksei Tikhoniuk
Researcher
Elen Yusupova
Researcher
Anatoly Kasprzhak
Researcher
Yaroslav Kuzminov
Researcher
Ekaterina Kruchinskaia
Researcher
Irina Brun
Researcher
Contact Information
For more information about the paper, please contact the authors.
Authors: Elena Kardanova, Alina Ivanova, Ksenia Tarasova, Taras Pashchenko, Aleksei Tikhoniuk, Elen Yusupova, Anatoly Kasprzhak, Yaroslav Kuzminov, Ekaterina Kruchinskaia, Irina Brun
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI