Skip to content

Fix the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo #111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

tsunghsienlee
Copy link
Contributor

Summary:
This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend distributed_state_dict()-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Differential Revision: D72093421

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 29, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72093421

@tsunghsienlee tsunghsienlee requested a review from runame March 29, 2025 00:20
tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Mar 29, 2025
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Differential Revision: D72093421
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72093421

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Mar 29, 2025
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Differential Revision: D72093421
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72093421

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Mar 29, 2025
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:
Pull Request resolved: facebookresearch#111

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Differential Revision: D72093421
Comment on lines +1154 to +1122
# Initialize factor_matrices_eigenvalues all ones.
for t in factor_matrices_eigenvalues:
block_info.get_tensor(t).fill_(1.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be implemented with a new block_info.allocate_ones_tensormethod.

Comment on lines +1142 to +1111
# Initialize factor_matrices_eigenvectors as identity matrices.
for t in factor_matrices_eigenvectors:
block_info.get_tensor(t).fill_diagonal_(1.0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be refactored as a new block_info.allocate_eye_tensor method that calls block_info.allocate_zeros_tensor followed by this code.

The advantage is a unified interface for all initialization options (zeros, ones, eye) and it will be easier to change the implementation of each initialization method later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I really prefer not enhancing BlockInfo due to the desire of removing it, what you said here make sense because it is not increasing the overall complexity of BlockInfo.

@tsunghsienlee tsunghsienlee marked this pull request as draft April 1, 2025 04:59
tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Apr 3, 2025
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Differential Revision: D72093421
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72093421

tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Apr 3, 2025
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:
Pull Request resolved: facebookresearch#111

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Differential Revision: D72093421
Tsung-Hsien Lee added 2 commits April 3, 2025 15:51
Summary:
1. Adding a new method `_create_preconditioned_dims_selector()` to create a list of boolean values indicating whether each dimension should be preconditioned or not.
2. Modifying the `_create_preconditioned_dims_selector_list()` method to use the new `_create_preconditioned_dims_selector()` method to create the list of boolean values.

Differential Revision: D72407362
Summary:
This diff addresses a potential type mismatch issue between `factor_matrices` and `factor_matrix_eigenvectors` in the QR algorithm. It also ensures consistent output type conversions for eigendecomposed Shampoo and eigenvalue-corrected Shampoo.
- Convert `factor_matrix_eigenvectors` to match the type of `factor_matrices` before applying the QR algorithm.
- Update output type conversions for eigendecomposed Shampoo and eigenvalue-corrected Shampoo to align with Shampoo.

Prevents errors caused by type mismatches in the QR algorithm and ensures consistency in output type conversions across different Shampoo variants.

Differential Revision: D72334952

Reviewed By: anana10c
tsunghsienlee added a commit to tsunghsienlee/optimizers that referenced this pull request Apr 4, 2025
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Reviewed By: anana10c

Differential Revision: D72093421
…nvalue-corrected Shampoo (facebookresearch#111)

Summary:
Pull Request resolved: facebookresearch#111

This diff fixes the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, this changes include initializing the eigenvectors matrices as identity matrices and initializing the eigenvalues as all ones. For eigenvalue-corrected Shampoo, this changes include initializing the eigenvectors matcies as identity matrices, and keeping corrected eigenvalues as all 0s.

Extend `distributed_state_dict()`-related tests to include testing eigendecompose Shampoo and eigenvalue-corrected Shampoo to verify the initializations.

Reviewed By: anana10c

Differential Revision: D72093421
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D72093421

facebook-github-bot pushed a commit that referenced this pull request Apr 7, 2025
Summary: Followed by the discussion in #111 (comment), add this function to unify the support of allocating an identiy matrix by leverging the existing `BlockInfo.allocate_zero_tensor()`.

Reviewed By: anana10c

Differential Revision: D72478672

fbshipit-source-id: 8c5d52d63e5bdaf74c77140f0fc5c97aff39f7fd
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in b683e18.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants