Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling
Project Overview
This document provides an overview of the current state of generative AI in education, focusing on the prevalence of text-based models such as ChatGPT. Drawing on an analysis of existing research, the study identifies key applications and thematic areas where generative AI is used, such as personalized learning and automated assessment. Although text-to-text models dominate, the study also points to the untapped potential of multimodal AI, including text-to-speech and text-to-image applications. It underscores the need to address challenges including academic integrity and ethical considerations, and calls for a more balanced research approach across educational levels and AI modalities. By highlighting areas for further exploration, the study aims to guide future research toward a more effective and ethically sound integration of generative AI in education.
Key Applications
Text-to-Speech
Context: Adolescents with learning difficulties
Implementation: Used to improve writing performance, spelling, and reading comprehension.
Outcomes: Potential to improve writing performance, spelling, and reading comprehension.
Challenges: Limited high-quality research.
Text-to-Image
Context: Supporting creative ideation, eliciting student understanding, and creating facial images for medical education.
Implementation: Used to support creative ideation, elicit student understanding, and create visual aids.
Outcomes: Facilitating understanding of concepts, supporting creative ideation, and creating visual aids for education.
Challenges: Not explicitly mentioned.
AI Copilot (multimodal)
Context: Pathology
Implementation: Multimodal copilot for human pathology.
Outcomes: Multimodal copilot for human pathology.
Challenges: Not explicitly mentioned.
LLMs (e.g., ChatGPT)
Context: Assessment, administration, prediction, AI assistants, content delivery, intelligent tutoring systems, managing student learning, and drafting/editing.
Implementation: Used for various tasks, including assessment, content generation, providing AI assistance, and managing student learning.
Outcomes: Assistance in drafting and editing, and potential for improved learning experiences.
Challenges: Raises questions about academic integrity due to the ease of generating text, potentially compromising the originality of student work.
Multimodal AI
Context: Personalized learning, problem-solving, creativity, and building knowledge across various domains of learning and education.
Implementation: Utilizing text-to-speech, speech-to-text, text-to-image, and EdGPTs to enhance personalized learning and support diverse student needs.
Outcomes: Enhance personalized learning, support diverse student needs, and facilitate knowledge acquisition and reproduction.
Challenges: Limited attention compared to LLMs dealing with text-to-text transformations.
Implementation Barriers
Research Gap
Research-based knowledge is unevenly distributed across perspectives and technologies, with market leaders heavily shaping the landscape of AI in education. There are also no precise conventions for naming the technologies used, which can cause ambiguity.
Proposed Solutions: More balanced attention across different AI modalities and educational levels. Adopt a harmonized approach to describing the underlying technologies to facilitate more comprehensive coverage and clarity in future reviews.
Technological Limitation
Most AI tools are simple, single-purpose, and not tailored to interdisciplinary learning needs.
Proposed Solutions: Researchers should examine and design more advanced, versatile, and useful AI tools for education.
Academic Integrity
The ease of generating text with LLMs may compromise the originality of student work.
Proposed Solutions: Build a broad understanding of different AI technologies and focus on student agency.
Pedagogical
Generative AI may reflect a surface approach to learning.
Proposed Solutions: Focus on student agency, AI models deliberately trained to serve educational needs (e.g., EdGPTs), multimodal AI-driven tools for visual and auditory content.
Project Team
Ville Heilala
Researcher
Roberto Araya
Researcher
Raija Hämäläinen
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Ville Heilala, Roberto Araya, Raija Hämäläinen
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gemini-2.0-flash-lite