Course - Theories of Deep Learning MT25
- Course webpage
- Lecture notes
- 1, Three ingredients of deep learning
- 2, Why deep learning
- 3, Exponential expressivity with depth
- 4, Data classes for which DNNs can overcome the curse of dimensionality
- 5, Controlling the exponential growth of variance and correlation
- 6, Controlling the variance of the Jacobian’s spectrum
- 7, Stochastic gradient descent and its extensions
- 8, Optimization algorithms for training DNNs
- 9, Topology of the loss landscape
- 10, Observations of the loss landscape
- 11, Visualising the filters and response in a CNN
- 12, The scattering transform and into auto-encoders
- 13, Autoencoders
- 14, Generative adversarial networks
- 15, A few things we missed and a summary
- 16, Ingredients for a successful mini-project report
- Guest talk on PINNs
- Lecture recordings
- Other courses this term: [[Courses MT25]]U
My notes for this course are a little different from my other [[University Notes]]U, since (at least for now) it is assessed by a mini-project at the end of term; this means I’m trying to optimise for understanding¹ rather than exam grades. For this reason, some of the things I take notes on here might not actually be covered explicitly in the course (e.g. [[Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension]]U).
Notes
Lectures
- [[Lecture - Theories of Deep Learning MT25, I, Three ingredients of deep learning]]U
- [[Lecture - Theories of Deep Learning MT25, II, Why deep learning]]U
- [[Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth]]U
- [[Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality and Attention modules]]U
- [[Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation]]?
- [[Lecture - Theories of Deep Learning MT25, VI, Controlling the variance of the Jacobian’s spectrum]]?
- [[Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions]]?
- [[Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs]]?
- [[Lecture - Theories of Deep Learning MT25, IX, Topology of the loss landscape]]?
- [[Lecture - Theories of Deep Learning MT25, X, Observations of the loss landscape]]?
- [[Lecture - Theories of Deep Learning MT25, XI, Visualising the filters and response in a CNN]]?
- [[Lecture - Theories of Deep Learning MT25, XII, The scattering transform and into auto-encoders]]?
- [[Lecture - Theories of Deep Learning MT25, XIII, Autoencoders]]?
- [[Lecture - Theories of Deep Learning MT25, XIV, Generative adversarial networks]]?
- [[Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary]]?
- [[Lecture - Theories of Deep Learning MT25, XVI, Ingredients for a successful mini-project report]]?
Reading List
Each lecture above is annotated with the articles and papers that were mentioned. Once a week, we also receive a set of suggested readings, collected below by week.
- Week 1
- [[Paper - Gradient-based learning applied to document recognition, LeCun]]U
- [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]]U
- Any of the papers describing an application of deep learning in [[Lecture - Theories of Deep Learning MT25, II, Why deep learning]]U
- Week 2
- Week 3
- Activation function design for deep networks: linearity and effective initialisation, Murray
- Exponential expressivity in deep neural networks through transient chaos, Poole
- The emergence of spectral universality in deep networks, Pennington
- Rapid training of deep neural networks without skip connections or normalisation layers using Deep Kernel Shaping, Martens
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
Related Notes
See:
- [[Course - Machine Learning MT23]]U
- [[Course - Uncertainty in Deep Learning MT25]]U
- [[Course - Geometric Deep Learning HT26]]U
- [[Course - Continuous Mathematics HT23]]U
- [[Course - Optimisation for Data Science HT25]]U
Problem Sheets
- Sheet 1, solutions to A&C, [[Problem Sheet - Theories of Deep Learning, I]]?
- Sheet 2, solutions to A,B,C, [[Problem Sheet - Theories of Deep Learning, II]]?
- Sheet 3, solutions to A&C, [[Problem Sheet - Theories of Deep Learning, III]]?
- Sheet 4, solutions to A,B,C, [[Problem Sheet - Theories of Deep Learning IV]]?
Questions / To-Do List
- Implement the proof that “each MNIST digit class is contained in a locally less-than-15-dimensional space” (a sketch of one possible numerical check is below this list)
- It is not known whether the optimal $\epsilon^{-d/n}$ width can be achieved using just one activation function, although it is possible with two
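Not from the course: a minimal sketch of one way I might check the MNIST claim numerically, by estimating local intrinsic dimension with PCA on the nearest neighbours of a few samples from one digit class. The data loader (sklearn’s fetch_openml), the choice of digit, the neighbourhood size k = 200, and the 95% explained-variance threshold are all my own assumptions, not anything specified in the lectures.

```python
# Rough local-PCA estimate of the intrinsic dimension of one MNIST digit class.
# All parameter choices here (digit "3", k = 200, 95% variance) are assumptions.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X.astype(np.float64) / 255.0
digits = X[y == "3"]                      # one digit class, e.g. the 3s

k = 200                                   # neighbourhood size (assumed)
nn = NearestNeighbors(n_neighbors=k).fit(digits)
rng = np.random.default_rng(0)

dims = []
for idx in rng.choice(len(digits), size=20, replace=False):
    _, neigh = nn.kneighbors(digits[idx:idx + 1])
    local = digits[neigh[0]]              # the k nearest points to this sample
    pca = PCA().fit(local)
    explained = np.cumsum(pca.explained_variance_ratio_)
    dims.append(int(np.searchsorted(explained, 0.95)) + 1)  # components for 95% variance

print("median local dimension estimate:", int(np.median(dims)))
```

If the claim holds, the median estimate should come out well below the ambient dimension of 784; the exact number will depend on the neighbourhood size and variance threshold chosen above.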
¹ Although, inevitably, I will probably find myself slipping into reward misspecification and optimising for mini-project results instead. ↩