Scaling Neural Networks: Laws and Limits
Scaling up neural network models has enabled unprecedented learned capabilities, but we still lack the first-principles understanding needed to ensure their safety, reliability, and efficiency. I will show how tools from statistical mechanics and random matrix theory allow us to analyze neural networks in appropriate infinite-size scaling limits, thereby mapping the learning regimes that govern observed scaling behavior. These results account for the main features of empirical neural scaling laws; enable the transfer of near-optimal hyperparameters across model sizes, yielding significant computational savings; and provide a framework for understanding emergent behaviors such as in-context learning.
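
To make the notion of a neural scaling law concrete, the sketch below fits the saturating power-law form commonly used to summarize loss-versus-model-size curves, L(N) = L_inf + a * N^(-alpha), to synthetic data. This is an illustrative assumption for exposition, not the analysis presented in the talk, and all numbers (model sizes, losses, starting values) are made up.

# Minimal, illustrative sketch: fitting a saturating power law
#   L(N) = L_inf + a * N**(-alpha)
# where N is model size, alpha is the scaling exponent, and L_inf is the
# irreducible loss. All data below are synthetic, not empirical results.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(N, L_inf, a, alpha):
    """Loss as a function of model size N: irreducible loss plus a decaying power law."""
    return L_inf + a * N ** (-alpha)

# Synthetic loss measurements at geometrically spaced model sizes (assumed values).
rng = np.random.default_rng(0)
N = np.logspace(6, 9, num=8)                      # 1e6 to 1e9 parameters
loss = scaling_law(N, 1.7, 40.0, 0.3) + rng.normal(scale=0.01, size=N.size)

# Recover the exponent and irreducible loss from the noisy measurements.
params, _ = curve_fit(scaling_law, N, loss, p0=[1.0, 10.0, 0.5], maxfev=10000)
L_inf_hat, a_hat, alpha_hat = params
print(f"fitted exponent alpha = {alpha_hat:.3f}, irreducible loss = {L_inf_hat:.3f}")

In practice such fits are performed jointly over model size and dataset size; the single-variable form above is the simplest case and serves only to fix notation.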

