Programme
All talks are held in person in MS.01 in the Zeeman Building. Registration and coffee breaks are held outside MS.01.
To contact the organisers, you may email probai-scaling-26@googlegroups.com.
Tentative schedule
More details will be made available closer to the workshop.
| Day | Time | Activity |
|---|---|---|
| 22 Jun (Mon) | 13:00 - 13:45 | Registration and coffee |
| | 14:00 - 15:30 | Tutorial: Tensor Programs to Derive Infinite Width Limits (Leena C Vankadara) |
| | 15:30 - 16:00 | Coffee break |
| | 16:00 - 17:30 | Tutorial: The Proportional Depth-Width Scaling Limit of Neural Networks (Mufan Li) Abstract: We study the scaling limit of neural networks without skip connections, where the depth d and width n approach infinity at a constant ratio d/n. In this limiting regime, we can view each layer of the neural network as a time discretization, and derive a limiting SDE for the feature covariance matrix. |
| 23 Jun (Tue) | 08:50 - 09:20 | Coffee |
| | 09:20 - 11:00 | Tutorial: Dynamical Mean Field Theory, Random Matrices and Learning in High Dimensions (Blake Bordelon) |
| | 11:00 - 12:00 | Tutorial: Infinite-size Limit for ResNets, Part I (Louis-Pierre Chaintron) |
| | 12:00 - 13:30 | Lunch provided at venue |
| | 13:30 - 15:00 | Research talk: Infinite-size Limit for ResNets, Part II (Louis-Pierre Chaintron) |
| | 15:00 - 15:30 | Coffee break |
| | 15:30 - 16:40 | Research talk: How to train an LLM (Sam Smith) Abstract: Drawing on the experience of designing and scaling Griffin (https://arxiv.org/abs/2402.19427) and RecurrentGemma, I will introduce some of the key practical concepts behind training large language models. Topics are likely to include: a brief introduction to Transformers, including why MLPs, not attention, usually dominate computation; a simple mental model of the computational bottlenecks on TPUs and GPUs; how to train models too large to fit in memory on a single device; scaling laws and hyper-parameter tuning; and a detailed discussion of LLM inference. If time permits, I will discuss how to design recurrent models competitive with Transformers, along with their advantages and drawbacks. |
| | 16:40 - 17:50 | Research talk |
| | 17:50 - 18:30 | Break |
| | 18:30 - 20:30 | On-campus dinner for attendees (registration required) |
| 24 Jun (Wed) | 08:30 - 09:00 | Coffee |
| | 09:00 - 10:10 | Research talk |
| | 10:10 - 11:20 | Research talk |
| | 11:20 - 11:40 | Coffee break |
| | 11:40 - 12:50 | Research talk |
| | 12:50 - 14:20 | Workshop closure & lunch provided at venue |