
The ELEVATE-AI LLMs Framework: An Evaluation Framework for Use of Large Language Models in HEOR: an ISPOR Working Group Report

Project Overview

The document examines the transformative impact of generative AI, specifically large language models (LLMs), on health economics and outcomes research (HEOR), highlighting key applications and outcomes. It introduces the ELEVATE-AI LLMs framework, designed to assess the use of LLMs in HEOR tasks, and underscores the need for standardized guidelines to ensure quality and transparency in AI-assisted research. The framework comprises ten evaluation domains and has been validated through applications including systematic literature reviews and health economic modeling. The document acknowledges challenges such as data inaccuracies and the critical need for human oversight to complement AI tools. Overall, it emphasizes that rigorous evaluation standards are essential to realizing the full potential of generative AI in HEOR.

Key Applications

LLM-assisted Research and Analysis Framework

Context: Applicable to a range of tasks in health economics and biomedical research, including systematic literature reviews, health economic modeling, and health technology assessment. The use cases focus on improving efficiency and accuracy in evaluating health-related data and outcomes.

Implementation: Leveraging large language models (LLMs) such as GPT-4 to automate processes such as title and abstract screening in systematic literature reviews and to replicate health economic analyses. This includes detailed prompts and validation against expert-annotated datasets to ensure accuracy and reliability.
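The validation step described above can be sketched in code. The following is a minimal illustration, not the working group's actual pipeline: the prompt template, the example labels, and the function names are all hypothetical, and the sketch simply compares LLM include/exclude decisions with expert annotations using sensitivity and specificity, two metrics commonly reported for screening accuracy.

```python
# Hypothetical prompt template for title/abstract screening (illustrative only).
SCREENING_PROMPT = (
    "You are screening records for a systematic literature review.\n"
    "Inclusion criteria: {criteria}\n"
    "Title: {title}\nAbstract: {abstract}\n"
    "Answer INCLUDE or EXCLUDE."
)

def validate_screening(llm_decisions, expert_labels):
    """Compare LLM include/exclude calls with expert annotations.

    Both arguments are parallel lists of booleans (True = INCLUDE).
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(llm_decisions, expert_labels))
    tp = sum(1 for p, e in pairs if p and e)        # correctly included
    tn = sum(1 for p, e in pairs if not p and not e)  # correctly excluded
    fp = sum(1 for p, e in pairs if p and not e)    # wrongly included
    fn = sum(1 for p, e in pairs if not p and e)    # wrongly excluded
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Hypothetical screening run against a small expert-annotated set.
llm_calls = [True, True, False, False, True]
expert = [True, False, False, False, True]
sens, spec = validate_screening(llm_calls, expert)
print(round(sens, 2), round(spec, 2))  # 1.0 0.67
```

In practice the `llm_calls` list would be produced by sending each record through the prompt above to the model, and sensitivity is usually the metric of greatest interest, since missed relevant records (false excludes) are the costliest screening error.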

Outcomes: Enhanced efficiency in research processes, accurate replication of health economic models, and improved screening accuracy, promoting transparency and reproducibility in health sciences research.

Challenges: Generalizability across diverse tasks requires further testing; metrics for fairness and bias monitoring are still in development, and some reporting domains were rated as ambiguous or not reported in the validation exercises.

Implementation Barriers

Technical

Challenges in establishing standards for evaluating LLMs in diverse HEOR tasks.

Proposed Solutions: Developing and validating specific metrics tailored to HEOR applications.

Methodological

Complexity in distinguishing between accuracy and comprehensiveness in reporting.

Proposed Solutions: Refining definitions and thresholds for metrics to enhance clarity in evaluation.
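One way to make the accuracy/comprehensiveness distinction concrete is to score them separately against a reporting checklist. The sketch below is a hypothetical illustration (the checklist items and function names are assumptions, not part of the framework): accuracy scores the correctness of what was reported, while comprehensiveness scores how much of the required checklist was reported at all.

```python
def accuracy(reported):
    """Fraction of reported items judged correct.

    `reported` maps checklist item -> bool (True = reported correctly).
    """
    return sum(reported.values()) / len(reported) if reported else float("nan")

def comprehensiveness(reported, required):
    """Fraction of required checklist items that were reported at all."""
    return len(set(reported) & set(required)) / len(required)

# Hypothetical health-economic-model reporting checklist.
reported = {"model_structure": True, "discount_rate": True, "time_horizon": False}
required = ["model_structure", "discount_rate", "time_horizon", "perspective"]

print(round(accuracy(reported), 2))                       # 0.67
print(round(comprehensiveness(reported, required), 2))    # 0.75
```

Separating the two scores makes the trade-off visible: a report can be highly accurate yet incomplete (few items, all correct), or complete yet inaccurate, and a single blended metric hides which problem an evaluator is looking at.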

Ethical

Concerns regarding fairness and bias in AI outputs, especially for underrepresented populations.

Proposed Solutions: Prioritizing the development of fairness metrics and providing guidance on evaluating demographic representation.
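A fairness metric of the kind proposed here could, for example, compare model performance across demographic subgroups. The sketch below is a hypothetical illustration (subgroup names, data, and function names are assumptions): it computes screening sensitivity per subgroup and reports the largest gap, a simple disparity signal an evaluator could monitor.

```python
from collections import defaultdict

def subgroup_sensitivity(records):
    """Per-subgroup sensitivity of LLM include/exclude decisions.

    `records` is a list of (subgroup, llm_include, expert_include) triples.
    Only records the experts marked as includes contribute to sensitivity.
    """
    tp = defaultdict(int)  # expert include, LLM include
    fn = defaultdict(int)  # expert include, LLM exclude
    for group, pred, truth in records:
        if truth:
            if pred:
                tp[group] += 1
            else:
                fn[group] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}

# Hypothetical annotated records split across two demographic subgroups.
records = [
    ("group_a", True, True), ("group_a", True, True), ("group_a", False, True),
    ("group_b", True, True), ("group_b", True, True),
]
sens = subgroup_sensitivity(records)
gap = max(sens.values()) - min(sens.values())
print(round(gap, 2))  # 0.33
```

A large gap would flag that the model misses relevant records more often for one subgroup, which is exactly the kind of demographic-representation signal the proposed guidance would ask evaluators to examine.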

Project Team

Rachael L. Fleurence

Researcher

Dalia Dawoud

Researcher

Jiang Bian

Researcher

Mitchell K. Higashi

Researcher

Xiaoyan Wang

Researcher

Hua Xu

Researcher

Jagpreet Chhatwal

Researcher

Turgay Ayer

Researcher

Contact Information

For more information about this project or to discuss potential collaboration opportunities, please contact:

Rachael L. Fleurence

Source Publication: View Original Paper