Recent work in deep learning has underscored the importance of measuring and understanding trends in model performance as a function of basic variables, such as the size of the training dataset, the number of model parameters, and the amount of compute. These trends are often, though not always, governed by power-law scaling. I will survey some of the existing empirical evidence for these so-called “scaling laws” and then discuss regimes in which we have a theoretical understanding of these trends, based on joint work with collaborators. I will close by discussing connections to optical implementations of deep neural networks.