Understanding training instabilities in multiplicative networks and transformers

Author: Hongxin Zhen, Mathematics

Summary

This project investigates the challenge of gradient explosion in overparameterized multiplicative neural networks, a critical issue that hinders stable training and degrades model performance. By exploring parameter initialization techniques, optimization methods such as Adam, and preconditioning strategies, the research seeks to identify effective ways to improve stability and convergence. The findings will contribute to improving training efficiency, reducing computational costs, and providing practical guidelines for researchers facing similar challenges. Ultimately, this work will support the development of more reliable and efficient AI models for real-world applications.
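To illustrate the gradient-explosion effect the project targets, here is a minimal toy sketch (not the project's actual model): a "multiplicative" network whose output is a product of scalar weights. The gradient with respect to any one weight is itself a product of the remaining weights, so it grows exponentially with depth whenever the weights sit above 1 in magnitude. The function names and the depth/weight values are illustrative assumptions.

```python
import numpy as np

def multiplicative_forward(weights, x):
    # Toy "multiplicative" network: the output is the product of all
    # scalar weights applied to the input, with depth = len(weights).
    out = x
    for w in weights:
        out = out * w
    return out

def grad_wrt_first_weight(weights, x):
    # d(out)/d(w_0) = x * prod(w_1, ..., w_{L-1}): the gradient is a
    # product of the remaining weights, so its magnitude grows
    # exponentially in depth when |w_i| > 1 -- gradient explosion.
    return x * np.prod(weights[1:])

def clip_gradient(g, max_norm=1.0):
    # One common mitigation: clip the gradient to a fixed maximum norm
    # before the optimizer step.
    return g if abs(g) <= max_norm else max_norm * np.sign(g)

for depth in (5, 10, 20):
    weights = np.full(depth, 1.5)   # every weight slightly above 1
    g = grad_wrt_first_weight(weights, x=1.0)
    print(f"depth {depth:2d}: |grad| = {g:.3e}, clipped = {clip_gradient(g):.1f}")
```

Running this shows the raw gradient magnitude jumping by orders of magnitude as depth increases, which is why initialization near 1 and per-parameter scaling (as in Adam) matter so much for these architectures.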
