Understanding training instabilities in multiplicative networks and transformers

Author: Hongxin Zhen, Mathematics

Summary

This project investigates the challenge of gradient explosion in overparameterized multiplicative neural networks, a critical issue that hinders stable training and degrades model performance. By exploring parameter initialization techniques, optimization methods such as Adam, and preconditioning strategies, the research seeks to identify effective ways to improve stability and convergence. The findings will contribute to improving training efficiency, reducing computational costs, and providing practical guidelines for researchers facing similar challenges. Ultimately, this work will support the development of more reliable and efficient AI models for real-world applications.
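To illustrate the gradient-explosion effect the project targets, here is a minimal toy sketch (not the project's actual model): a "multiplicative" network whose output is a product of scalar weights. The gradient with respect to any one weight is itself a product of the remaining weights, so it grows exponentially with depth whenever the weights sit above 1 in magnitude. The function names and the depth/weight values are illustrative assumptions.

```python
import numpy as np

def multiplicative_forward(weights, x):
    # Toy "multiplicative" network: the output is the product of all
    # scalar weights applied to the input, with depth = len(weights).
    out = x
    for w in weights:
        out = out * w
    return out

def grad_wrt_first_weight(weights, x):
    # d(out)/d(w_0) = x * prod(w_1, ..., w_{L-1}): the gradient is a
    # product of the remaining weights, so its magnitude grows
    # exponentially in depth when |w_i| > 1 -- gradient explosion.
    return x * np.prod(weights[1:])

def clip_gradient(g, max_norm=1.0):
    # One common mitigation: clip the gradient to a fixed maximum norm
    # before the optimizer step.
    return g if abs(g) <= max_norm else max_norm * np.sign(g)

for depth in (5, 10, 20):
    weights = np.full(depth, 1.5)   # every weight slightly above 1
    g = grad_wrt_first_weight(weights, x=1.0)
    print(f"depth {depth:2d}: |grad| = {g:.3e}, clipped = {clip_gradient(g):.1f}")
```

Running this shows the raw gradient magnitude jumping by orders of magnitude as depth increases, which is why initialization near 1 and per-parameter scaling (as in Adam) matter so much for these architectures.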
