Fix the statistics initializations of eigendecompose Shampoo and eigenvalue-corrected Shampoo #111
This pull request was exported from Phabricator. Differential Revision: D72093421
…nvalue-corrected Shampoo (facebookresearch#111)

Summary: This diff fixes the statistics initialization of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, the changes initialize the eigenvector matrices as identity matrices and the eigenvalues as all ones. For eigenvalue-corrected Shampoo, the changes initialize the eigenvector matrices as identity matrices and keep the corrected eigenvalues as all zeros.

Extend the `distributed_state_dict()`-related tests to cover eigendecompose Shampoo and eigenvalue-corrected Shampoo and verify the initializations.

Differential Revision: D72093421
e8cd97f to 292672b
292672b to cad128b
cad128b to 390e7a7
    # Initialize factor_matrices_eigenvalues all ones.
    for t in factor_matrices_eigenvalues:
        block_info.get_tensor(t).fill_(1.0)
This could also be implemented with a new `block_info.allocate_ones_tensor` method.
    # Initialize factor_matrices_eigenvectors as identity matrices.
    for t in factor_matrices_eigenvectors:
        block_info.get_tensor(t).fill_diagonal_(1.0)
This could be refactored as a new `block_info.allocate_eye_tensor` method that calls `block_info.allocate_zeros_tensor` followed by this code.
The advantage is a unified interface for all initialization options (`zeros`, `ones`, `eye`), and it will be easier to change the implementation of each initialization method later on.
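The unified interface suggested here might look like the following minimal sketch. `BlockInfoSketch` and the method signatures are hypothetical stand-ins for illustration, not the library's actual `BlockInfo`; the only assumption is that `allocate_zeros_tensor` returns a zero-filled `torch.Tensor`.

```python
import torch


class BlockInfoSketch:
    """Hypothetical stand-in for BlockInfo; only the allocation helpers are modeled."""

    def allocate_zeros_tensor(self, shape, dtype=torch.float32, device=None):
        # Assumed base allocator: returns a zero-filled tensor.
        return torch.zeros(shape, dtype=dtype, device=device)

    def allocate_ones_tensor(self, shape, dtype=torch.float32, device=None):
        # Reuse the zeros allocator, then fill in place with ones.
        return self.allocate_zeros_tensor(shape, dtype, device).fill_(1.0)

    def allocate_eye_tensor(self, n, dtype=torch.float32, device=None):
        # Reuse the zeros allocator, then set the diagonal to one in place.
        return self.allocate_zeros_tensor((n, n), dtype, device).fill_diagonal_(1.0)


block_info = BlockInfoSketch()
eigenvalues = block_info.allocate_ones_tensor((3,))  # all ones
eigenvectors = block_info.allocate_eye_tensor(3)     # 3 x 3 identity
```

Because `ones` and `eye` both route through the zeros allocator, any change to how tensors are allocated would only need to be made in one place.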
Although I would really prefer not to extend `BlockInfo`, since we want to remove it eventually, what you said here makes sense because it does not increase the overall complexity of `BlockInfo`.
390e7a7 to fd0e041
fd0e041 to daa1eaa
Summary:
1. Add a new method `_create_preconditioned_dims_selector()` that creates a list of boolean values indicating whether each dimension should be preconditioned.
2. Modify the `_create_preconditioned_dims_selector_list()` method to use the new `_create_preconditioned_dims_selector()` method to create the list of boolean values.

Differential Revision: D72407362
Summary: This diff addresses a potential type mismatch between `factor_matrices` and `factor_matrix_eigenvectors` in the QR algorithm. It also ensures consistent output type conversions for eigendecomposed Shampoo and eigenvalue-corrected Shampoo.
- Convert `factor_matrix_eigenvectors` to match the type of `factor_matrices` before applying the QR algorithm.
- Update output type conversions for eigendecomposed Shampoo and eigenvalue-corrected Shampoo to align with Shampoo.

This prevents errors caused by type mismatches in the QR algorithm and keeps output type conversions consistent across the different Shampoo variants.

Differential Revision: D72334952
Reviewed By: anana10c
daa1eaa to cc27512
…nvalue-corrected Shampoo (facebookresearch#111)

Summary: This diff fixes the statistics initialization of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, the changes initialize the eigenvector matrices as identity matrices and the eigenvalues as all ones. For eigenvalue-corrected Shampoo, the changes initialize the eigenvector matrices as identity matrices and keep the corrected eigenvalues as all zeros.

Extend the `distributed_state_dict()`-related tests to cover eigendecompose Shampoo and eigenvalue-corrected Shampoo and verify the initializations.

Reviewed By: anana10c
Differential Revision: D72093421
cc27512 to e736eb4
Summary: Following the discussion in #111 (comment), add this function to unify support for allocating an identity matrix by leveraging the existing `BlockInfo.allocate_zeros_tensor()`.

Reviewed By: anana10c
Differential Revision: D72478672
fbshipit-source-id: 8c5d52d63e5bdaf74c77140f0fc5c97aff39f7fd
This pull request has been merged in b683e18.
Summary:
This diff fixes the statistics initialization of eigendecompose Shampoo and eigenvalue-corrected Shampoo. For eigendecompose Shampoo, the changes initialize the eigenvector matrices as identity matrices and the eigenvalues as all ones. For eigenvalue-corrected Shampoo, the changes initialize the eigenvector matrices as identity matrices and keep the corrected eigenvalues as all zeros.
Extend the `distributed_state_dict()`-related tests to cover eigendecompose Shampoo and eigenvalue-corrected Shampoo and verify the initializations.
Differential Revision: D72093421
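One way to read the choice of identity eigenvectors and all-ones eigenvalues (an interpretation, not something the diff states): if the inverse-root preconditioner is rebuilt from a stored eigendecomposition, this initial state makes it the identity, whereas zero-filled eigenvectors would make it the zero matrix. The symbols $Q$ (eigenvector matrix), $\lambda$ (eigenvalue vector), and $p$ (root order) are notation assumed here for illustration, and any epsilon regularization is left out.

```latex
\[
  L^{-1/p} \;=\; Q \,\operatorname{diag}(\lambda)^{-1/p}\, Q^{\top},
  \qquad
  Q = I,\ \lambda = \mathbf{1}
  \;\Longrightarrow\;
  L^{-1/p} = I \cdot I \cdot I^{\top} = I,
  \qquad
  Q = 0
  \;\Longrightarrow\;
  L^{-1/p} = 0 .
\]
```

Under the same reading, identity eigenvectors in eigenvalue-corrected Shampoo leave the gradient unrotated before any eigenbasis has been computed.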