
Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations

Project Overview

The document presents EvaLLM, a conceptual framework for evaluating large language models (LLMs) such as GPT-3.5 and Llama2-70b in their ability to generate visualizations from datasets and natural-language user queries. It addresses both the challenges and opportunities LLMs present in producing meaningful visual output, underscoring the need for a robust evaluation system to assess the effectiveness and accuracy of these visualizations. The findings suggest that while generative AI holds significant potential for data-driven tools and workflows, careful attention to implementation and evaluation is needed to ensure reliable results. The study ultimately emphasizes the value of structured frameworks like EvaLLM in advancing the integration of AI technologies into visualization practice, fostering a deeper understanding and more effective use of data-driven insights.

Key Applications

EvaLLM - a conceptual stack for evaluating LLM-generated visualizations

Context: Used for generating visualizations from datasets and user queries, targeting data visualization practitioners and researchers.

Implementation: Implemented a web-based platform utilizing the EvaLLM stack for automated and manual evaluation.

Outcomes: Enhanced understanding of LLM capabilities in visualization, provided a framework for systematic evaluation, and identified common errors in visualization generation.

Challenges: Limitations in evaluating complex visualizations, reliance on human evaluators for certain aspects, and the need for a comprehensive benchmarking approach.
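The paper's platform is not reproduced here, but the automated side of such an evaluation pipeline can be illustrated with a minimal sketch. It assumes, purely for illustration, that the LLM returns a Vega-Lite-style JSON chart specification; the function and check names are hypothetical, not the paper's actual API:

```python
import json

def check_spec(spec_text, dataset_columns):
    """Run basic automated checks on an LLM-generated chart spec.

    Returns a list of error strings; an empty list means the spec
    passed. Assumes a Vega-Lite-style JSON spec (illustrative only).
    """
    errors = []
    try:
        spec = json.loads(spec_text)
    except json.JSONDecodeError as exc:
        # Malformed output is a common LLM failure mode; report and stop.
        return [f"invalid JSON: {exc}"]
    # A renderable spec needs at least a mark type and encodings.
    for key in ("mark", "encoding"):
        if key not in spec:
            errors.append(f"missing required key: {key!r}")
    # Every encoded field should exist in the source dataset.
    for channel, enc in spec.get("encoding", {}).items():
        field = enc.get("field")
        if field is not None and field not in dataset_columns:
            errors.append(f"channel {channel!r} uses unknown field {field!r}")
    return errors

# Example: a well-formed spec checked against known dataset columns.
spec = '{"mark": "bar", "encoding": {"x": {"field": "year"}, "y": {"field": "sales"}}}'
print(check_spec(spec, {"year", "sales"}))  # passes: empty error list
print(check_spec(spec, {"year"}))           # flags the unknown "sales" field
```

Checks like these cover only the mechanical failures (invalid syntax, hallucinated column names); judgments about whether a chart is appropriate or insightful are the part that, as noted above, still relies on human evaluators.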

Implementation Barriers

Technical Barrier

Generating accurate visualizations is difficult because user queries are often ambiguous and LLMs may misinterpret the underlying data structures.

Proposed Solutions: Clearer prompt engineering and the development of structured datasets to improve model training.
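One concrete form of "clearer prompt engineering" is to embed the dataset schema directly in the prompt, so the model does not have to guess column names or types. The template below is an illustrative sketch, not the paper's actual prompt:

```python
def build_viz_prompt(user_query, columns):
    """Build a structured visualization prompt that embeds the dataset
    schema, reducing ambiguity about available fields.
    (Hypothetical template for illustration.)
    """
    schema = "\n".join(f"- {name}: {dtype}" for name, dtype in columns)
    return (
        "You are a data-visualization assistant.\n"
        "Dataset columns:\n"
        f"{schema}\n\n"
        f"Task: {user_query}\n"
        "Respond with only a JSON Vega-Lite spec; use only the columns above."
    )

prompt = build_viz_prompt(
    "Show total sales per year as a bar chart.",
    [("year", "integer"), ("sales", "float")],
)
print(prompt)
```

Constraining the output format ("only a JSON spec") and the vocabulary ("only the columns above") also makes the response easier to validate automatically.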

Resource Barrier

Limited availability of datasets for extensive testing and evaluation of LLMs in generating diverse visualizations.

Proposed Solutions: Future expansions of datasets and collaborative research to create more comprehensive evaluation frameworks.

Project Team

Luca Podo

Researcher

Muhammad Ishmal

Researcher

Marco Angelini

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Luca Podo, Muhammad Ishmal, Marco Angelini

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
