aferro

When training a neural network, something really interesting happens:

- Most parameters can be removed early in training by applying a pruning mask (e.g. one that keeps only the largest-magnitude parameters)
  - Furthermore, this mask doesn’t change much during training!
- Training gradients live in a tiny subspace spanned by the top eigenvectors of the loss Hessian, i.e. by the directions of highest curvature
  - Furthermore, this subspace doesn’t change much during training!
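
To make the first observation concrete, here is a minimal sketch. It assumes magnitude pruning, which keeps the `keep` largest-magnitude parameters, and a simple agreement score between two masks. The helper names `magnitude_mask` and `mask_agreement` and the synthetic "checkpoints" are hypothetical illustrations and are not taken from the paper's code.

```python
import numpy as np


def magnitude_mask(params: np.ndarray, keep: int) -> np.ndarray:
    """Boolean mask over the flattened parameters that keeps the `keep` largest magnitudes."""
    flat = np.abs(params.ravel())
    if not 0 < keep <= flat.size:
        raise ValueError("keep must be in (0, number of parameters]")
    # argpartition moves the indices of the `keep` largest magnitudes to the last `keep` slots
    top_idx = np.argpartition(flat, flat.size - keep)[flat.size - keep:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[top_idx] = True
    return mask


def mask_agreement(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of the entries kept by mask_a that mask_b also keeps (1.0 = identical)."""
    return float(np.logical_and(mask_a, mask_b).sum() / mask_a.sum())


# Usage with synthetic weights (hypothetical stand-ins for an early and a later checkpoint)
rng = np.random.default_rng(0)
w_early = rng.standard_normal(10_000)
w_late = w_early + 0.05 * rng.standard_normal(10_000)  # small drift during training
m_early = magnitude_mask(w_early, keep=1_000)
m_late = magnitude_mask(w_late, keep=1_000)
print(mask_agreement(m_early, m_late))  # high, far above the chance level keep / n = 0.1
```

For two unrelated random masks of the same size, the expected agreement is only `keep / n`. This gives a natural chance-level reference point for deciding whether a mask is "stable".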

Looking at this, a curious mind may wonder: are these two seemingly unrelated observations actually connected? And since inspecting parameters is cheap while accessing the loss curvature is expensive, could such a connection be leveraged to save resources?

Contributions

- In this work, we propose to use Grassmannian metrics to measure the connection between parameter masks and Hessian eigenspaces.
- For this, we need to perform large-scale Hessian eigendecompositions. This is generally intractable, but we were able to reach unprecedented scales via sketched methods.
- We developed the skerch PyTorch library to perform sketched decompositions.
- We measured the overlap between pruning masks and top Hessian eigenspaces in a variety of deep learning scenarios, and observed that it is consistently orders of magnitude larger than chance level (i.e. than expected for a random mask).
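
One standard way to compare two k-dimensional subspaces with orthonormal bases U and V (both n x k) is the overlap ||U^T V||_F^2 / k. It equals 1 when the subspaces coincide and 0 when they are orthogonal. The sketch below assumes this definition, and the paper's exact normalization may differ. A pruning mask spans the coordinate axes it keeps, so U^T V simply selects the masked rows of V. A uniformly random mask of size k therefore has an expected overlap of k/n with any fixed k-dimensional subspace, which is the "random" baseline. All function names are hypothetical, and the random matrix `v` stands in for actual top Hessian eigenvectors.

```python
import numpy as np


def orthonormal_basis(a: np.ndarray) -> np.ndarray:
    """Orthonormal basis (n x k) for the column span of a full-column-rank matrix `a`."""
    q, _ = np.linalg.qr(a)
    return q


def overlap(u: np.ndarray, v: np.ndarray) -> float:
    """Overlap ||u^T v||_F^2 / k between the spans of two orthonormal n x k bases.

    1.0 if the subspaces coincide, 0.0 if they are orthogonal.
    """
    k = u.shape[1]
    return float(np.linalg.norm(u.T @ v, "fro") ** 2 / k)


def mask_overlap(mask: np.ndarray, v: np.ndarray) -> float:
    """Overlap between the coordinate subspace selected by a boolean mask and span(v).

    The mask basis consists of standard basis vectors, so u^T v is just the masked
    rows of v: no n x k matrix for the mask is ever built.
    """
    k = int(mask.sum())
    if k != v.shape[1]:
        raise ValueError("mask size must match the subspace dimension")
    return float((v[mask] ** 2).sum() / k)


# Usage: a random orthonormal basis stands in for the top-k Hessian eigenvectors
rng = np.random.default_rng(0)
n, k = 2_000, 20
v = orthonormal_basis(rng.standard_normal((n, k)))
random_mask = np.zeros(n, dtype=bool)
random_mask[rng.choice(n, size=k, replace=False)] = True
print(mask_overlap(random_mask, v), k / n)  # both close to the chance level 0.01
```

Working on the masked rows directly keeps the cost at O(nk). This matters when n is in the millions.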

You can find more details in the paper, and in the 5-minute video presentation linked above.

How can this be useful to you?

- If you have a network with millions of parameters and want to look at thousands of its Hessian/GGN eigenvectors and eigenvalues, take a look at skerch!
- If you are interested in comparing spaces of seemingly unrelated quantities, our work may provide some insights.
- If you are interested in DL theory and loss landscape analysis: we still don’t know why this connection exists!
- If you are interested in DL optimization, pruning or uncertainty quantification, it may be possible to leverage this connection to approximate expensive Hessian quantities via cheap parameter inspection.
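
Why is this hard? A dense Hessian for n = 10^7 parameters has n^2 = 10^14 entries, or about 400 TB in float32. It can therefore only be accessed through Hessian-vector products (in PyTorch, e.g. via `torch.autograd.functional.hvp`). The following is a minimal sketch of a textbook randomized (sketched) eigendecomposition in the style of Halko, Martinsson and Tropp, which only needs block products with the operator. It is not the skerch API, and skerch may use a different sketching scheme. `sketched_eigh` and its parameters are hypothetical, and a small synthetic symmetric matrix stands in for the Hessian.

```python
from typing import Callable, Tuple

import numpy as np


def sketched_eigh(matmat: Callable[[np.ndarray], np.ndarray], n: int, k: int,
                  oversample: int = 10, power_iters: int = 2,
                  seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Approximate the k largest-magnitude eigenpairs of a symmetric n x n operator
    that is only accessible via products with blocks of vectors."""
    rng = np.random.default_rng(seed)
    # 1. Sketch: apply the operator to a random Gaussian test matrix
    omega = rng.standard_normal((n, k + oversample))
    q, _ = np.linalg.qr(matmat(omega))
    # 2. Power (subspace) iterations sharpen the captured range if the spectrum decays slowly
    for _ in range(power_iters):
        q, _ = np.linalg.qr(matmat(q))
    # 3. Rayleigh-Ritz: project onto the small subspace and solve a tiny eigenproblem
    t = q.T @ matmat(q)
    t = (t + t.T) / 2  # symmetrize against round-off
    evals, evecs = np.linalg.eigh(t)
    # Hessians can be indefinite, so rank by magnitude and keep the top k
    order = np.argsort(np.abs(evals))[::-1][:k]
    return evals[order], q @ evecs[:, order]


# Usage: synthetic symmetric "Hessian" with 5 dominant eigenvalues and a small bulk
rng = np.random.default_rng(1)
n = 500
basis, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = np.concatenate([[100.0, 80.0, 60.0, 40.0, 20.0], rng.uniform(-0.1, 0.1, n - 5)])
h = (basis * spectrum) @ basis.T  # same as basis @ diag(spectrum) @ basis.T
evals, evecs = sketched_eigh(lambda x: h @ x, n=n, k=5)
print(np.round(evals, 3))  # approximately 100, 80, 60, 40, 20
```

Each call to `matmat` costs k + oversample Hessian-vector products. Here, the whole decomposition needs 4 × 15 = 60 of them, while building the matrix explicitly would need n = 500.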

More resources

- Paper: [arXiv], [OpenReview], [pdf]
- Code: https://github.com/andres-fr/hessian_overlap
- Skerch library: [GitHub], [pip]
- Presentation: [pdf]

Also, if this work is useful for your publication, consider citing us:

@article{fernandez2025connecting,
  title={Connecting Parameter Magnitudes and Hessian Eigenspaces at Scale using Sketched Methods},
  author={Andres Fernandez and Frank Schneider and Maren Mahsereci and Philipp Hennig},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=yGGoOVpBVP},
}

Time to touch some Grassmannians! 🌱

Connecting Parameters and Hessian Eigenspaces in Deep Learning

PUBLISHED ON MAR 21, 2025 — CATEGORIES: publications


TAGS: algebra, computer vision, machine learning, pruning, skerch, sketching