Connecting Parameters and Hessian Eigenspaces in Deep Learning
PUBLISHED ON MAR 21, 2025
When training a neural network, two interesting phenomena emerge:
Most parameters can be removed early in training by applying a pruning mask, e.g. one that keeps only the largest-magnitude parameters.
Furthermore, this mask doesn't change much during training!
Training gradients live in a tiny subspace spanned by the top eigenvectors of the loss Hessian (i.e. given by the loss curvature).
Furthermore, this subspace doesn't change much during training!
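A magnitude-based pruning mask, and how much it changes between two checkpoints, can be illustrated in a few lines. This is a minimal NumPy sketch, assuming a single global top-k magnitude criterion over a flattened parameter vector; `magnitude_mask`, `mask_agreement` and the toy checkpoint values are hypothetical and not taken from the paper.

```python
import numpy as np

def magnitude_mask(params: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Boolean mask keeping the largest-magnitude entries of a flat parameter vector.

    Assumption: one global top-k criterion over all parameters.
    """
    k = max(1, int(round(keep_fraction * params.size)))
    # argpartition places the indices of the k largest |params| in the last k slots
    top = np.argpartition(np.abs(params), -k)[-k:]
    mask = np.zeros(params.size, dtype=bool)
    mask[top] = True
    return mask

def mask_agreement(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of kept entries shared by two masks that keep the same number of entries."""
    return float(np.logical_and(mask_a, mask_b).sum() / mask_a.sum())

# Toy "checkpoints" (hypothetical values, not from the paper)
early = np.array([0.1, -2.0, 0.3, 1.5, -0.05, 0.8])
late = np.array([0.2, -1.8, 0.9, 1.4, -0.1, 0.7])
m_early = magnitude_mask(early, keep_fraction=0.5)  # keeps indices 1, 3, 5
m_late = magnitude_mask(late, keep_fraction=0.5)    # keeps indices 1, 2, 3
print(mask_agreement(m_early, m_late))              # 2 of the 3 kept entries are shared
```

In a real network, "the mask doesn't change much" would show up as an agreement close to 1 between masks computed at early and late checkpoints.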
Looking at this, a curious mind may wonder: are these two seemingly unrelated observations actually connected?
Moreover, inspecting parameters is cheap, but accessing the loss curvature is not. Could such a connection be leveraged to save computational resources?
Contributions
In this work, we propose to use Grassmannian metrics to measure this connection, i.e. the overlap between the subspace selected by a parameter-magnitude mask and the top Hessian eigenspace.
For this, we need to perform large-scale Hessian eigendecompositions. This is generally intractable, but we were able to reach unprecedented scales via sketched methods.
We developed the skerch PyTorch library to perform these sketched decompositions.
We measured this overlap in a variety of deep learning scenarios and found it to be consistently orders of magnitude larger than for random subspaces.
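To make "overlap" concrete: a mask that keeps k of n parameters selects a k-dimensional coordinate subspace, and the top-k Hessian eigenvectors (columns of U) span another k-dimensional subspace. One standard Grassmannian approach compares them through principal angles, whose cosines are the singular values of P^T U, where P holds the selected identity columns. The minimal NumPy sketch below uses the normalized overlap ||P^T U||_F^2 / k and the geodesic distance; these specific choices and the function names are illustrative assumptions and may differ from the exact metrics in the paper. For a uniformly random k-dimensional subspace, E[U U^T] = (k/n) I, so the expected overlap is k/n, which gives a natural random baseline.

```python
import numpy as np

def subspace_overlap(mask: np.ndarray, U: np.ndarray) -> float:
    """Normalized overlap ||P^T U||_F^2 / k between span(U) and the masked coordinates.

    Assumes U is n x k with orthonormal columns and `mask` keeps exactly k entries,
    so P^T U is simply U[mask]. The result lies in [0, 1]; 1 means identical subspaces.
    """
    k = U.shape[1]
    return float(np.sum(U[mask, :] ** 2) / k)

def grassmann_distance(mask: np.ndarray, U: np.ndarray) -> float:
    """Geodesic Grassmannian distance: 2-norm of the vector of principal angles."""
    # Singular values of P^T U are the cosines of the principal angles
    cosines = np.linalg.svd(U[mask, :], compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))

rng = np.random.default_rng(0)
n, k = 1000, 10

# Random k-dim subspace vs. random k-entry mask: overlap concentrates around k / n
U_rand, _ = np.linalg.qr(rng.standard_normal((n, k)))
mask_rand = np.zeros(n, dtype=bool)
mask_rand[rng.choice(n, size=k, replace=False)] = True
print(subspace_overlap(mask_rand, U_rand), "vs. baseline", k / n)

# Toy diagonal "Hessian" whose top-k eigenvectors are exactly the masked coordinates
d = np.full(n, 0.01)
d[mask_rand] = 1.0 + np.arange(k)
_, eigvecs = np.linalg.eigh(np.diag(d))
U_top = eigvecs[:, -k:]  # eigh sorts eigenvalues in ascending order
print(subspace_overlap(mask_rand, U_top), grassmann_distance(mask_rand, U_top))
```

The two printed cases bracket the range: roughly k/n for unrelated subspaces, and an overlap of 1 (distance 0) when the mask picks out exactly the top eigenspace.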
You can find more details in the paper and in the 5-minute video presentation linked above.
How can this be useful to you?
If you have a network with millions of parameters and want to compute thousands of its Hessian/GGN eigenvectors and eigenvalues, take a look at skerch!
If you are interested in comparing subspaces of seemingly unrelated quantities, our work may provide some insights.
If you are interested in DL theory and loss landscape analysis: we still don't know why this overlap occurs, so it remains an open question.
If you are interested in DL optimization, pruning or uncertainty quantification, it may be possible to leverage this connection to approximate expensive Hessian quantities via cheap parameter inspection.
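Large-scale Hessian eigendecompositions are feasible because they only need Hessian-vector products, never the full Hessian. As a rough illustration, here is a minimal NumPy sketch of a classic two-pass randomized eigendecomposition (range finder plus Rayleigh-Ritz, in the style of Halko, Martinsson and Tropp, 2011) driven by a generic `matvec` callback. It is not skerch's API and not necessarily the sketched algorithm used in the paper; `randomized_eigh`, its parameters and the toy matrix are hypothetical. In PyTorch, `matvec` could be a Hessian-vector product obtained by double backpropagation (`torch.autograd.grad` with `create_graph=True`).

```python
from typing import Callable, Tuple

import numpy as np

def randomized_eigh(matvec: Callable[[np.ndarray], np.ndarray], n: int, k: int,
                    oversample: int = 10, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Approximate the k largest-magnitude eigenpairs of a symmetric n x n operator.

    Only products with the operator are used (e.g. Hessian-vector products).
    """
    rng = np.random.default_rng(seed)
    # Gaussian test matrix with a few extra columns for accuracy
    omega = rng.standard_normal((n, k + oversample))
    # First pass: sketch the range of the operator
    Y = np.column_stack([matvec(omega[:, j]) for j in range(omega.shape[1])])
    Q, _ = np.linalg.qr(Y)  # orthonormal basis for the approximate range
    # Second pass: project the operator onto that basis (Rayleigh-Ritz)
    AQ = np.column_stack([matvec(Q[:, j]) for j in range(Q.shape[1])])
    T = Q.T @ AQ
    T = 0.5 * (T + T.T)  # symmetrize against round-off
    evals, evecs = np.linalg.eigh(T)
    # Hessians can be indefinite, so rank eigenvalues by magnitude
    order = np.argsort(-np.abs(evals))[:k]
    return evals[order], Q @ evecs[:, order]

# Toy symmetric, indefinite "Hessian" with 5 dominant eigenvalues (hypothetical)
rng = np.random.default_rng(1)
n = 200
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = np.full(n, 1e-4)
spectrum[:5] = [10.0, 8.0, -6.0, 5.0, 4.0]
A = (V * spectrum) @ V.T  # V diag(spectrum) V^T
evals, evecs = randomized_eigh(lambda v: A @ v, n, k=5)
print(np.round(evals, 3))
```

This variant applies the operator twice per test vector. Single-pass sketching methods are designed to avoid the second pass, which matters when every Hessian-vector product costs a full backward pass through a large network.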
Also, if this work is useful for your publication, consider citing us:
@article{fernandez2025connecting,
  title={Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods},
  author={Andres Fernandez and Frank Schneider and Maren Mahsereci and Philipp Hennig},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=yGGoOVpBVP},
}