Mech Interp

From Neurons to Neutrons: A Case Study in Interpretability

Transformers trained on nuclear physics data learn internal representations that closely resemble human-derived nuclear theory, suggesting that mechanistic interpretability of neural networks can be a path toward new scientific understanding.