From Voices to Worlds: Developing an AI-Powered Framework for 3D Object Generation in Augmented Reality

Project Overview

This paper presents Matrix, an AI-powered framework for real-time 3D object generation in Augmented Reality (AR) environments, aimed at educational use. Matrix combines text-to-3D generative AI, multilingual speech-to-text translation, and large language models to support user interaction in both education and design. It addresses three challenges: latency, efficiency, and accessibility. Optimized GPU usage and smaller model output sizes broaden the range of educational contexts in which the framework can be applied. The findings indicate that Matrix improves the delivery of educational content through interactive 3D visualizations and supports a more immersive and accessible learning environment for educators and students.

Key Applications

Matrix framework for 3D object generation

Context: Augmented Reality applications in education, targeting educators and students.

Implementation: Runs on Microsoft HoloLens 2, using speech commands and context-aware suggestions.

Outcomes: Facilitates real-time generation of 3D objects based on verbal commands, improving vocabulary retention and engagement.

Challenges: Latency in model generation, inconsistencies in object rendering, and complexity of interface navigation.

Implementation Barriers

Technical Barrier

High GPU usage and computational latency in real-time 3D model generation.

Proposed Solutions: Optimizing GPU usage through a pre-generated object repository and semantic search for object reuse.
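The reuse idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (ObjectRepository, get_model, generate) are hypothetical, the 0.6 similarity threshold is an arbitrary assumption, and a toy bag-of-words cosine similarity stands in for whatever semantic embedding the real system uses. The point is only the control flow: search the repository first, and call the expensive text-to-3D generator only on a miss.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ObjectRepository:
    """Hypothetical store of previously generated 3D models, keyed by description."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cutoff for "similar enough to reuse"
        self.entries = []           # list of (vector, model_path)

    def add(self, description, model_path):
        self.entries.append((embed(description), model_path))

    def find(self, query):
        """Return the best-matching model path, or None if nothing clears the threshold."""
        query_vec = embed(query)
        best_path, best_score = None, 0.0
        for vec, path in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_path, best_score = path, score
        return best_path if best_score >= self.threshold else None


def get_model(repo, query, generate):
    """Reuse a stored model when possible; otherwise generate one and store it.

    `generate` is a placeholder for the GPU-heavy text-to-3D call.
    Returns (model_path, reused).
    """
    cached = repo.find(query)
    if cached is not None:
        return cached, True
    path = generate(query)
    repo.add(query, path)
    return path, False
```

In this sketch, a request that closely matches an earlier one returns the stored model without invoking the generator, which is where the GPU savings would come from. The threshold trades reuse rate against the risk of returning an object that does not match the request.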

Inclusivity Barrier

Limited multilingual support and the need for a dynamic interaction framework for diverse user groups.

Proposed Solutions: Incorporating multilingual speech-to-text translation and enhancing contextual understanding for broader accessibility.

Project Team

Majid Behravan

Researcher

Denis Gracanin

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Majid Behravan, Denis Gracanin

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
