1

Towards Understanding Grokking: An Effective Theory of Representation Learning

This study investigates *grokking*, a generalization phenomenon first observed in transformer models trained on arithmetic data, using microscopic and macroscopic analyses, revealing four learning phases and a “Goldilocks zone” for optimal representation learning, while emphasizing the value of physics-inspired tools in understanding deep learning.