
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science

Project Overview

This project examines the use of generative AI, particularly Large Language Models (LLMs), in data science education, with a focus on the reproducibility of AI-generated analyses. It introduces a novel analyst-inspector framework for evaluating whether LLM-generated data science workflows are reproducible. The work addresses the challenges that reproducibility poses for data science and emphasizes structured prompting techniques that improve LLM performance. The findings reveal a strong correlation between the reproducibility of AI-generated analyses and their accuracy, suggesting that reproducibility-oriented prompting strategies lead to better outcomes. Overall, the project underscores the potential of generative AI in educational contexts, fostering reliable and accurate data science practice through improved prompting methodologies.

Key Applications

Analyst-Inspector Framework for Reproducibility

Context: Data science education for students and professionals in analytics and statistics.

Implementation: The framework automates the evaluation of LLM-generated data science workflows, ensuring that they are sufficiently detailed and reproducible.

Outcomes: Improved reproducibility rates and accuracy of LLM-generated analyses; enhanced trust in AI-driven data analysis.

Challenges: The stochastic nature of LLM outputs can lead to inconsistent results, and LLM-generated workflows must be detailed enough for a human analyst to reproduce them.
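The analyst-inspector loop described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the "analyst" stands in for an LLM that emits an analysis workflow as runnable code, and the "inspector" re-executes that workflow independently and checks that the result is reproduced.

```python
# Minimal sketch of an analyst-inspector evaluation loop (illustrative;
# function names and the toy workflow are assumptions, not the paper's code).

def analyst_generate_workflow():
    # Stand-in for an LLM "analyst" call: returns a complete,
    # self-contained analysis workflow as executable code.
    return (
        "import statistics\n"
        "data = [2.0, 4.0, 6.0, 8.0]\n"
        "result = statistics.mean(data)\n"
    )

def inspector_reproduce(code):
    # The "inspector" runs the workflow in a fresh namespace, as an
    # independent analyst would, and reports the final result.
    scope = {}
    exec(code, scope)
    return scope["result"]

workflow = analyst_generate_workflow()
run_1 = inspector_reproduce(workflow)
run_2 = inspector_reproduce(workflow)
reproducible = (run_1 == run_2)  # identical results across independent runs
```

In the real framework the inspector also judges whether the workflow is sufficiently detailed; here the check is reduced to result equality across independent executions.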

Implementation Barriers

Technical Barrier

Stochastic and opaque outputs from LLMs can lead to inconsistencies in analysis results.

Proposed Solutions: Implementing a structured prompting strategy that emphasizes reproducibility and thoroughness in the generation of workflows and code.
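A structured prompting strategy of this kind can be sketched as a prompt template that instructs the model to make every step explicit. The wording below is an illustrative assumption, not the paper's verbatim prompt.

```python
# Hypothetical structured prompt emphasizing reproducibility and
# thoroughness; the instruction wording is illustrative only.

def build_reproducible_prompt(task):
    return "\n".join([
        "You are a data analyst. For the task below:",
        "1. State every assumption and data-cleaning step explicitly.",
        "2. Set random seeds and report library versions.",
        "3. Produce complete, runnable code with no omitted steps.",
        f"Task: {task}",
    ])

prompt = build_reproducible_prompt("Fit a linear regression on sales data.")
```

The template front-loads reproducibility requirements so that the generated workflow is explicit enough for an independent analyst to re-run.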

Methodological Barrier

Current methods for assessing computational reproducibility are predominantly manual and lack standardization.

Proposed Solutions: Developing automated frameworks that enforce reproducibility principles and provide clear guidelines for LLM-generated analyses.
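One way such a framework could enforce reproducibility principles automatically is to lint LLM-generated code against a checklist of guidelines before accepting it. The checks below are a simplified assumption for illustration, not the paper's actual criteria.

```python
# Hypothetical automated guideline check for LLM-generated analysis code.
# The specific rules are illustrative assumptions.

def check_guidelines(code):
    issues = []
    if "seed" not in code:
        issues.append("no random seed set")
    if "import" not in code:
        issues.append("no explicit imports")
    return issues  # empty list means the code passes these checks
```

A real checker would parse the code rather than match substrings, but the structure is the same: codify the reproducibility guidelines as automated, standardized checks instead of manual review.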

Project Team

Qiuhai Zeng

Researcher

Claire Jin

Researcher

Xinyue Wang

Researcher

Yuhan Zheng

Researcher

Qunhua Li

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Qiuhai Zeng, Claire Jin, Xinyue Wang, Yuhan Zheng, Qunhua Li

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
