An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Project Overview
The document examines generative AI, particularly Large Language Models (LLMs), in education, focusing on its role in enhancing the reproducibility of data science analyses. It introduces a novel analyst-inspector framework for evaluating whether AI-generated data science workflows are reproducible, addresses the broader challenges of computational reproducibility, and emphasizes structured prompting techniques that improve LLM performance. The findings reveal a strong positive correlation between the reproducibility of AI-generated analyses and their accuracy, suggesting that reproducibility-oriented prompting strategies lead to better outcomes. Overall, the document underscores the potential of generative AI in educational contexts, particularly in fostering reliable and accurate data science practice through improved prompting methodologies.
Key Applications
Analyst-Inspector Framework for Reproducibility
Context: Data science education for students and professionals in analytics and statistics.
Implementation: The framework automates the evaluation of LLM-generated data science workflows, ensuring that they are sufficiently detailed and reproducible.
Outcomes: Improved reproducibility rates and accuracy of LLM-generated analyses; enhanced trust in AI-driven data analysis.
Challenges: The stochastic nature of LLM outputs can lead to inconsistent results, and LLM-generated workflows must be detailed enough for a human analyst to reproduce them.
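The evaluation loop described above can be sketched in code. This is a minimal, hypothetical illustration: an "analyst" LLM produces a workflow report for a question, an "inspector" LLM tries to re-derive the answer from that report alone, and a scoring function checks whether the result was reproduced. The function names, prompt wording, and `llm`/`score` callables are illustrative assumptions, not the paper's exact implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    reproduced: bool  # did the inspector recover the analyst's result?
    notes: str        # the inspector's final answer, for auditing


def analyst_inspector(
    question: str,
    data_summary: str,
    llm: Callable[[str], str],          # assumed LLM interface: prompt -> text
    score: Callable[[str, str], bool],  # assumed comparison of report vs. answer
) -> Verdict:
    # Analyst step: answer the question and document every workflow step.
    analyst_prompt = (
        "You are a data analyst. Answer the question below and describe "
        "every step of your workflow in enough detail that another analyst "
        "could reproduce it without seeing your code.\n"
        f"Question: {question}\nData: {data_summary}"
    )
    analyst_report = llm(analyst_prompt)

    # Inspector step: redo the analysis using only the written report.
    inspector_prompt = (
        "You are an inspector. Using only the workflow description below, "
        "redo the analysis and state your final answer.\n"
        f"Report: {analyst_report}\nData: {data_summary}"
    )
    inspector_answer = llm(inspector_prompt)

    # A workflow counts as reproducible if the inspector reaches the
    # same result from the description alone.
    return Verdict(reproduced=score(analyst_report, inspector_answer),
                   notes=inspector_answer)
```

Keeping `llm` and `score` as parameters means the same loop can wrap any model API and any task-specific notion of "same result" (exact match, numeric tolerance, etc.).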
Implementation Barriers
Technical Barrier
Stochastic and opaque outputs from LLMs can lead to inconsistencies in analysis results.
Proposed Solutions: Implementing a structured prompting strategy that emphasizes reproducibility and thoroughness in the generation of workflows and code.
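A structured prompt of this kind might look as follows. The template below is an illustrative assumption about what "emphasizes reproducibility and thoroughness" could mean in practice (explicit seeds, library versions, complete code); it is not the paper's actual prompt.

```python
# Illustrative reproducibility-oriented prompt template (assumed wording,
# not taken from the paper).
REPRODUCIBILITY_PROMPT = """\
You are a data analyst. Solve the task below and follow these rules:
1. Set every random seed explicitly and report its value.
2. State the exact version of each library you use.
3. Describe each preprocessing step in full, including parameter values.
4. Emit complete, runnable code with no placeholders or elided steps.

Task: {task}
Dataset: {dataset}
"""


def build_prompt(task: str, dataset: str) -> str:
    """Fill the structured template for a concrete task and dataset."""
    return REPRODUCIBILITY_PROMPT.format(task=task, dataset=dataset)
```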
Methodological Barrier
Current methods for assessing computational reproducibility are predominantly manual and lack standardization.
Proposed Solutions: Developing automated frameworks that enforce reproducibility principles and provide clear guidelines for LLM-generated analyses.
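One simple automated check that such a framework could enforce is re-executing a generated script in fresh processes and comparing the outputs. The sketch below is an assumption about how a basic determinism check might be implemented, not the framework described in the paper.

```python
import os
import subprocess
import sys
import tempfile


def runs_reproducibly(script: str, runs: int = 2, timeout: int = 60) -> bool:
    """Execute a generated Python script several times in fresh
    subprocesses and return True if every run prints identical output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        outputs = [
            subprocess.run(
                [sys.executable, path],
                capture_output=True, text=True, timeout=timeout,
            ).stdout
            for _ in range(runs)
        ]
        # Reproducible here means: identical stdout on every run.
        return all(out == outputs[0] for out in outputs)
    finally:
        os.unlink(path)
```

A script that seeds its random number generator passes this check, while an unseeded one fails; stricter variants could also compare written files or hash intermediate artifacts.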
Project Team
Qiuhai Zeng
Researcher
Claire Jin
Researcher
Xinyue Wang
Researcher
Yuhan Zheng
Researcher
Qunhua Li
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Qiuhai Zeng, Claire Jin, Xinyue Wang, Yuhan Zheng, Qunhua Li
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI