The Physics of Deep Learning
A lazy repository of what we understand about neural networks
I don’t want to write a long-winded introduction on how the study of the steam engine, which started as an engineering discipline, flourished into a theory of thermodynamics. Or how physics specializes in breaking down complex systems into their simplest components to derive theories that explain the emergent behavior of the system. I assume you are already convinced that physics is a good place to seek inspiration.
This will be a series of short blog posts summarizing a sample of ideas and concepts in our understanding of deep learning. I will not claim that the material presented here is THE physics approach to deep learning, as that could mean many things to different people. My definition of a physics approach to deep learning is one that adopts physicist-style thinking. This is not to say that the approach will be rigorous or even entirely correct. All we aim for here is an intuitive understanding of some key components of deep learning: intuition that generalizes across different settings to a reasonable extent and makes predictions we can test empirically. The approach is pragmatic by nature, and I plan on putting in extra effort so that anyone can follow along. For those in the trenches, this translates to a new tool for guiding intuition when designing, training, and deploying models. For everyone else, I hope you come away with a fresh perspective and a new appreciation for familiar (or foreign) concepts in deep learning.
Some topics I plan on talking about:
- Grokking, or generalization beyond overfitting.
- The Lottery Ticket Hypothesis.
- Scaling Laws and infinite-width limits.
- Topics in Mechanistic Interpretability.
- Miscellaneous topics in optimization via gradient descent in the deep learning setting, including the role of normalization, adaptive optimizers, and implicit/explicit regularization.