Course - Theories of Deep Learning MT25
- Course webpage
- Lecture notes
- 1, Three ingredients of deep learning
- 2, Why deep learning
- 3, Exponential expressivity with depth
- 4, Data classes for which DNNs can overcome the curse of dimensionality
- 5, Controlling the exponential growth of variance and correlation
- 6, Controlling the variance of the Jacobian’s spectrum
- 7, Stochastic gradient descent and its extensions
- 8, Optimization algorithms for training DNNs
- 9, Topology of the loss landscape
- 10, Observations of the loss landscape
- 11, Visualising the filters and response in a CNN
- 12, The scattering transform and into auto-encoders
- 13, Autoencoders
- 14, Generative adversarial networks
- 15, A few things we missed and a summary
- 16, Ingredients for a successful mini-project report
- Guest talk on PINNs
- Lecture recordings
- Other courses this term: [[Courses MT25]]U
My notes for this course are a little different from my other [[University Notes]]U, since (at least for now) it is assessed by a mini-project at the end of term; this means I’m trying to optimise for understanding¹ rather than exam grades. For this reason, some of the things I take notes on here might not actually be covered explicitly in the course (e.g. [[Notes - Theories of Deep Learning MT25, Vapnik-Chervonenkis dimension]]U).
Notes
Lectures
- [[Lecture - Theories of Deep Learning MT25, I, Three ingredients of deep learning]]U
- [[Lecture - Theories of Deep Learning MT25, II, Why deep learning]]U
- [[Lecture - Theories of Deep Learning MT25, III, Exponential expressivity with depth]]U
- [[Lecture - Theories of Deep Learning MT25, IV, Data classes for which DNNs can overcome the curse of dimensionality and Attention modules]]U
- [[Lecture - Theories of Deep Learning MT25, V, Controlling the exponential growth of variance and correlation]]?
- [[Lecture - Theories of Deep Learning MT25, VI, Controlling the variance of the Jacobian’s spectrum]]?
- [[Lecture - Theories of Deep Learning MT25, VII, Stochastic gradient descent and its extensions]]?
- [[Lecture - Theories of Deep Learning MT25, VIII, Optimisation algorithms for training DNNs]]?
- [[Lecture - Theories of Deep Learning MT25, IX, Topology of the loss landscape]]?
- [[Lecture - Theories of Deep Learning MT25, X, Observations of the loss landscape]]?
- [[Lecture - Theories of Deep Learning MT25, XI, Visualising the filters and response in a CNN]]?
- [[Lecture - Theories of Deep Learning MT25, XII, The scattering transform and into auto-encoders]]?
- [[Lecture - Theories of Deep Learning MT25, XIII, Autoencoders]]?
- [[Lecture - Theories of Deep Learning MT25, XIV, Generative adversarial networks]]?
- [[Lecture - Theories of Deep Learning MT25, XV, A few things we missed and a summary]]?
- [[Lecture - Theories of Deep Learning MT25, XVI, Ingredients for a successful mini-project report]]?
Reading List
Each lecture above is annotated with the articles and papers that were mentioned. Once a week, we also receive a set of suggested readings, collected below by week.
- Week 1
- [[Paper - Gradient-based learning applied to document recognition, LeCun]]U
- [[Paper - Representation Benefits of Deep Feedforward Networks, Telgarsky (2015)]]U
- Any of the papers describing an application of deep learning in [[Lecture - Theories of Deep Learning MT25, II, Why deep learning]]U
- Week 2
- Week 3
- Activation function design for deep networks: linearity and effective initialisation, Murray
- Exponential expressivity in deep neural networks through transient chaos, Poole
- The emergence of spectral universality in deep networks, Pennington
- Rapid training of deep neural networks without skip connections or normalisation layers using Deep Kernel Shaping, Martens
- Week 4
- Week 5
- Week 6
- Week 7
- Week 8
Related Notes
See:
- [[Course - Machine Learning MT23]]U
- [[Course - Uncertainty in Deep Learning MT25]]U
- [[Course - Geometric Deep Learning HT26]]U
- [[Course - Continuous Mathematics HT23]]U
- [[Course - Optimisation for Data Science HT25]]U
Problem Sheets
- Sheet 1, solutions to A&C, [[Problem Sheet - Theories of Deep Learning, I]]?
- Sheet 2, solutions to A,B,C, [[Problem Sheet - Theories of Deep Learning, II]]?
- Sheet 3, solutions to A&C, [[Problem Sheet - Theories of Deep Learning, III]]?
- Sheet 4, solutions to A,B,C, [[Problem Sheet - Theories of Deep Learning IV]]?
Questions / To-Do List
- Implement the proof that “each MNIST digit class is contained in a locally less-than-15-dimensional space” (a sketch of one possible numerical check is below this list)
- It is not known whether the optimal $\epsilon^{-d/n}$ width can be achieved using just one activation function, although it is possible with two
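Not from the course: a minimal sketch of one way I might check the MNIST claim numerically, by estimating local intrinsic dimension with PCA on the nearest neighbours of a few samples from one digit class. The data loader (sklearn’s fetch_openml), the choice of digit, the neighbourhood size k = 200, and the 95% explained-variance threshold are all my own assumptions, not anything specified in the lectures.

```python
# Rough local-PCA estimate of the intrinsic dimension of one MNIST digit class.
# All parameter choices here (digit "3", k = 200, 95% variance) are assumptions.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X.astype(np.float64) / 255.0
digits = X[y == "3"]                      # one digit class, e.g. the 3s

k = 200                                   # neighbourhood size (assumed)
nn = NearestNeighbors(n_neighbors=k).fit(digits)
rng = np.random.default_rng(0)

dims = []
for idx in rng.choice(len(digits), size=20, replace=False):
    _, neigh = nn.kneighbors(digits[idx:idx + 1])
    local = digits[neigh[0]]              # the k nearest points to this sample
    pca = PCA().fit(local)
    explained = np.cumsum(pca.explained_variance_ratio_)
    dims.append(int(np.searchsorted(explained, 0.95)) + 1)  # components for 95% variance

print("median local dimension estimate:", int(np.median(dims)))
```

If the claim holds, the median estimate should come out well below the ambient dimension of 784; the exact number will depend on the neighbourhood size and variance threshold chosen above.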
¹ Although, inevitably, I will probably find myself slipping into reward misspecification and optimising for mini-project results instead. ↩