
LLaVA-Docent: Instruction Tuning with Multimodal Large Language Model to Support Art Appreciation Education

Project Overview

The document explores the use of generative AI in education through LLaVA-Docent, a multimodal large language model (MLLM) developed to enrich art appreciation education. It addresses persistent challenges in the field, including outdated teaching methods and the absence of personalized feedback for students. LLaVA-Docent offers tailored interactions, scaffolding, and real-time feedback to deepen student engagement with and understanding of art. The findings underscore the effectiveness of pairing constructivist pedagogy with advanced AI, showing that such approaches can significantly improve accessibility and educational outcomes in art education, particularly for K-12 learners. The research suggests that generative AI can help educators create more dynamic, responsive learning environments that cater to individual student needs, ultimately fostering a deeper appreciation of art.

Key Applications

LLaVA-Docent, a multimodal large language model for art appreciation

Context: K-12 art education, particularly for novice or inexperienced art viewers in classrooms and museums.

Implementation: The model was developed using design and development research methodology, incorporating iterative feedback and expert consultation to create a data design framework for art appreciation education.

Outcomes: Enhanced engagement and personalized learning experiences in art appreciation, enabling students to connect artworks to personal experiences and fostering critical thinking.

Challenges: Limited availability of effective AI tools for art appreciation, the need for age-appropriate content, and potential cognitive overload for students.

Implementation Barriers

Technological Limitations

Conventional large language models lack a visual modality and education-specific adaptations, limiting their effectiveness for tasks such as art appreciation. Challenges also remain in integrating AI tools into existing curricula and ensuring effective interaction in real-world educational settings.

Proposed Solutions: Utilizing multimodal large language models (MLLMs) such as LLaVA that integrate visual processing alongside text, and conducting further studies and pilot tests in K-12 environments to validate the models' effectiveness in practice.

Privacy Concerns

Using closed models often requires sharing learner data with tech companies, raising potential privacy issues.

Proposed Solutions: Adopting open-source models that allow customization and on-device deployment to maintain privacy.

Project Team

Unggi Lee

Researcher

Minji Jeon

Researcher

Yunseo Lee

Researcher

Gyuri Byun

Researcher

Yoorim Son

Researcher

Jaeyoon Shin

Researcher

Hongkyu Ko

Researcher

Hyeoncheol Kim

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Unggi Lee, Minji Jeon, Yunseo Lee, Gyuri Byun, Yoorim Son, Jaeyoon Shin, Hongkyu Ko, Hyeoncheol Kim

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
