# The Physics of Deep Learning

A lazy repository of what we understand about neural networks

I don’t want to write a long-winded introduction on how the study of the steam engine, which started as an engineering discipline, flourished into a theory of thermodynamics. Or how physics specializes in breaking down complex systems into their simplest components to derive theories that explain the emergent behavior of the system. I assume you are already convinced that physics is a good place to seek inspiration.

This will be a series of short blog posts summarizing a sample of ideas and concepts in our understanding of deep learning. I will not claim that the material presented here is *THE* physics approach to deep learning, as that could mean many things to different people. My definition of a physics approach to deep learning is one that adopts physicist-style thinking. This is not to say that the approach will be rigorous or even *entirely* correct. All we aim for here is an intuitive understanding of some key components of deep learning. This intuition should generalize across different settings to a certain extent and make predictions we can test empirically. The approach is pragmatic by nature, and I plan on expending additional effort to ensure *anyone* can follow along. For those in the trenches, this should translate into a new tool to guide intuition when designing, training, and deploying models; for everyone else, I hope you gain a fresh perspective and a new appreciation for familiar (or foreign) concepts in deep learning.

Some topics I plan on talking about:

- Grokking, or generalization beyond overfitting.
- Lottery Ticket Hypothesis.
- Scaling Laws and infinite width limits.
- Topics in Mechanistic Interpretability.
- Miscellaneous topics in optimization via gradient descent in the deep learning setting, including the role of normalization, adaptive optimization, implicit/explicit regularization, etc.