The ELEVATE-AI LLMs Framework: An Evaluation Framework for Use of Large Language Models in HEOR: an ISPOR Working Group Report
Project Overview
The document explores the transformative impact of generative AI, specifically large language models (LLMs), on health economics and outcomes research (HEOR), highlighting key applications and outcomes. It introduces the ELEVATE-AI LLMs framework, designed to assess the use of LLMs in HEOR tasks, and underscores the need for standardized guidelines to ensure quality and transparency in AI-assisted research. The framework comprises ten evaluation domains and has been validated through case studies, including systematic literature reviews and health economic modeling. The document acknowledges challenges such as data inaccuracies and the critical need for human oversight to complement AI tools. Overall, it emphasizes the importance of establishing rigorous evaluation standards to harness the full potential of generative AI in HEOR practice.
Key Applications
LLM-assisted Research and Analysis Framework
Context: Applicable to various tasks in health economics and biomedical research, including systematic literature reviews, health economic modeling, and health technology assessment. The use cases demonstrate a focus on improving efficiency and accuracy in evaluating health-related data and outcomes.
Implementation: Leveraging large language models (LLMs) such as GPT-4 to automate tasks like title and abstract screening in systematic literature reviews and the replication of health economic analyses. Implementations rely on carefully designed prompts and are validated against expert-annotated datasets to ensure accuracy and reliability; a minimal screening sketch follows this application summary.
Outcomes: Enhanced efficiency in research processes, accurate replication of health economic models, and improved screening accuracy, promoting transparency and reproducibility in health sciences research.
Challenges: Generalizability across diverse tasks requires further testing; metrics for fairness and bias monitoring are still in development, with some reporting domains rated ambiguous or not reported.
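To make the screening step concrete, the sketch below shows how an LLM could be prompted to return an include/exclude decision for a single record. The model name (gpt-4o), the inclusion criteria, and the prompt wording are illustrative assumptions; the report does not prescribe a specific implementation.

```python
# Minimal sketch of LLM-assisted title/abstract screening (illustrative only).
# Model name, criteria, and prompt wording are assumptions, not taken from the report.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INCLUSION_CRITERIA = """\
Include records that (1) report a randomized controlled trial,
(2) enroll adult patients, and (3) report a health economic outcome."""

def screen_record(title: str, abstract: str) -> str:
    """Ask the model for an INCLUDE / EXCLUDE decision on one record."""
    prompt = (
        f"Screening criteria:\n{INCLUSION_CRITERIA}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )
    response = client.chat.completions.create(
        model="gpt-4o",   # assumed model; the case studies used GPT-4-class models
        temperature=0,    # deterministic output aids reproducibility
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip().upper()

decision = screen_record(
    "A randomized trial of drug X in adults with condition Y",
    "We randomized 400 adults; cost per QALY gained was reported.",
)
print(decision)  # e.g. "INCLUDE"
```

In practice, each decision would be logged alongside the record identifier so that the full screening run can be audited and compared against human reviewers, in line with the framework's emphasis on transparency and reproducibility.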
Implementation Barriers
Technical
Challenges in establishing standards for evaluating LLMs in diverse HEOR tasks.
Proposed Solutions: Developing and validating specific metrics tailored to HEOR applications.
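As one illustration of what a tailored, validated metric could look like in practice, the sketch below compares LLM screening decisions against an expert-annotated gold standard using sensitivity, specificity, and Cohen's kappa. The data and the choice of metrics are assumptions for illustration, not metrics mandated by the framework.

```python
# Minimal sketch of validating LLM screening decisions against expert annotations.
# Labels and data are illustrative; metric choices are assumptions, not prescribed by the framework.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

expert = ["INCLUDE", "EXCLUDE", "EXCLUDE", "INCLUDE", "EXCLUDE"]  # gold standard
llm    = ["INCLUDE", "EXCLUDE", "INCLUDE", "INCLUDE", "EXCLUDE"]  # model output

tn, fp, fn, tp = confusion_matrix(expert, llm, labels=["EXCLUDE", "INCLUDE"]).ravel()
sensitivity = tp / (tp + fn)   # recall of includes: misses are costly in literature reviews
specificity = tn / (tn + fp)   # correct exclusions: drives workload savings
kappa = cohen_kappa_score(expert, llm)  # chance-corrected agreement with expert reviewers

print(f"Sensitivity={sensitivity:.2f}  Specificity={specificity:.2f}  Kappa={kappa:.2f}")
```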
Methodological
Complexity in distinguishing between accuracy and comprehensiveness in reporting.
Proposed Solutions: Refining definitions and thresholds for metrics to enhance clarity in evaluation.
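One way such a refinement could be operationalized is to score accuracy and comprehensiveness separately, as set-based precision and recall over extracted items; the sketch below illustrates this. The definitions and the example items are assumptions, not the framework's official thresholds.

```python
# One possible way to operationalize the accuracy/comprehensiveness distinction:
# set-based precision vs. recall over extracted items (illustrative definitions only).

def accuracy_and_comprehensiveness(reported: set[str], reference: set[str]) -> tuple[float, float]:
    """Accuracy: share of reported items that match the reference (are they right?).
    Comprehensiveness: share of reference items that were reported (is anything missing?)."""
    correct = reported & reference
    accuracy = len(correct) / len(reported) if reported else 0.0
    comprehensiveness = len(correct) / len(reference) if reference else 0.0
    return accuracy, comprehensiveness

reported = {"ICER", "QALYs", "discount rate"}              # items the LLM extracted
reference = {"ICER", "QALYs", "discount rate", "horizon"}  # items an expert expected
print(accuracy_and_comprehensiveness(reported, reference))  # (1.0, 0.75)
```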
Ethical
Concerns regarding fairness and bias in AI outputs, especially for underrepresented populations.
Proposed Solutions: Prioritizing the development of fairness metrics and providing guidance on evaluating demographic representation.
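A simple starting point for such a fairness check is to stratify performance by demographic subgroup and monitor the gap, as in the sketch below. The column names, subgroups, and disparity measure are illustrative assumptions rather than metrics specified by the working group.

```python
# Minimal sketch of a subgroup fairness check: compare screening accuracy across
# demographic strata. Column names and groups are assumptions for illustration.
import pandas as pd

records = pd.DataFrame({
    "subgroup": ["A", "A", "B", "B", "B"],  # e.g. demographic group of the study population
    "expert":   [1, 0, 1, 1, 0],            # 1 = include, 0 = exclude
    "llm":      [1, 0, 0, 1, 0],
})

records["correct"] = (records["expert"] == records["llm"]).astype(int)
by_group = records.groupby("subgroup")["correct"].mean()
print(by_group)                          # per-subgroup accuracy
print(by_group.max() - by_group.min())   # a simple disparity gap to monitor over time
```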
Project Team
Rachael L. Fleurence
Researcher
Dalia Dawoud
Researcher
Jiang Bian
Researcher
Mitchell K. Higashi
Researcher
Xiaoyan Wang
Researcher
Hua Xu
Researcher
Jagpreet Chhatwal
Researcher
Turgay Ayer
Researcher
Contact Information
For more information about this project or to discuss potential collaboration opportunities, please contact:
Rachael L. Fleurence
Source Publication: View Original Paper