Understanding training instabilities in multiplicative networks and transformers
Author: Hongxin Zhen, Mathematics
Summary
This project investigates the problem of gradient explosion in overparameterized multiplicative neural networks and transformers, a failure mode that destabilizes training and degrades model performance. Because multiplicative parameterizations couple parameters through products, the gradient with respect to any one factor contains the product of all the others, so gradient magnitudes can grow exponentially with depth. By exploring parameter initialization schemes, adaptive optimization methods such as Adam, and preconditioning strategies, the research seeks to identify effective remedies that improve stability and convergence. The findings will contribute to more efficient training, lower computational cost, and practical guidelines for researchers facing similar instabilities, ultimately supporting the development of more reliable and efficient AI models for real-world applications.
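As a minimal illustration (not the project's actual model), the Python sketch below shows the exponential-gradient effect on a scalar multiplicative chain y = w_1 w_2 ... w_L: each partial derivative contains the product of all the other factors, so the gradient norm at initialization scales roughly like s^(2L-1) when every factor is initialized to s. The depth, initialization scales, learning rate, and toy loss are illustrative assumptions, not values taken from the proposal.

import torch

torch.manual_seed(0)
depth = 30  # number of multiplicative factors L

def grad_norm_at_init(init_scale: float) -> float:
    # Multiplicative chain y = prod_i w_i with the toy loss y**2; the
    # gradient d(loss)/d(w_i) = 2*y * prod_{j != i} w_j has magnitude
    # 2 * init_scale**(2*depth - 1), so it explodes for init_scale > 1
    # and vanishes for init_scale < 1.
    w = torch.full((depth,), init_scale, requires_grad=True)
    loss = torch.prod(w) ** 2
    loss.backward()
    return w.grad.norm().item()

for s in (0.9, 1.05, 1.2):
    print(f"init_scale={s}: grad norm at init = {grad_norm_at_init(s):.3e}")

# One candidate remedy named in the summary: Adam. Its per-parameter
# second-moment normalization caps the first update at roughly the learning
# rate, regardless of how large the raw gradient is.
w = torch.full((depth,), 1.2, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-3)
(torch.prod(w) ** 2).backward()
opt.step()
print(f"largest Adam first-step change: {(w.detach() - 1.2).abs().max().item():.1e}")

Under these assumptions, an initialization scale near 1 and an adaptive or preconditioned optimizer both attack the same exponential factor, one at the source and one at the update step; quantifying that trade-off is the kind of comparison the project sets out to make.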