-
Notifications
You must be signed in to change notification settings - Fork 318
Add load and run tests for checkpoints that we want to have BC #2792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2792
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure, 1 Cancelled Job, 9 PendingAs of commit 483898d with merge base fee314b ( NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
80009d9
to
074593b
Compare
5581e0e
to
3d66000
Compare
3d66000
to
db92794
Compare
"Float8DynamicActivationFloat8WeightConfig" in model_name | ||
and not is_sm_at_least_90() | ||
): | ||
return unittest.skip("FP8 checkpoint is produced in SM90+") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we ensure we have CI with sm90+ running this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it's tested here:
ao/.github/workflows/1xH100_tests.yml
Line 50 in 083361b
pytest test/integration --verbose -s |
added this skip because it fails in this test:
ao/.github/workflows/1xL4_tests.yml
Line 50 in 083361b
pytest test/integration --verbose -s |
|
||
_DEPRECATED_SINGLE_LINEAR_MODEL_NAMES = [ | ||
# model card: https://huggingface.co/torchao-testing/single-linear-Float8DynamicActivationFloat8WeightConfig-v1-0.13-dev | ||
"torchao-testing/single-linear-Float8DynamicActivationFloat8WeightConfig-v1-0.13-dev", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: specify what here is torchao version and what is config version? torchao-testing/model:single_linear-config:Float8DynamicActivationFloat8WeightConfig-config_version:1-torchao:0.13-dev
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit long, would changing 0.13-dev
to 0.13.dev
be clearer that this is a torchao version?
also just found that :
is not supported in hf model naming, only -
, .
and _
are supported.
can also move around the 0.13.dev to make it clearer as well, please let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a bit long, would changing 0.13-dev to 0.13.dev be clearer that this is a torchao version?
I think we should go for clarity here, if it's long that's ok. Can we pick one of -
, .
, _
as a delimiter, and make the name be formatted with key1.val1,key2.val2,...
, with whatever delimiters between pairs and key:val that we can use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this have to be in the name, how about adding these in model card descriptions? there are some restrictions to the naming, including max length of 96:
Only regular characters and '-', '_', '.' are accepted. '--' and '..' are forbidden. '-' and '.' cannot start or end the name. The name cannot end with ".git" or ".ipynb". Max length is 96.
tried Float8 and it's 111: model_single-linear-config_Float8DynamicActivationFloat8WeightConfig-config-version_v2-torchao-version_0.13.dev
db92794
to
40bf04c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to make the naming exact
updated model card to include exact descriptions due to naming length restrictions |
40bf04c
to
fb85e52
Compare
Summary: Added load and run tests to make sure previously saved checkpoints can continue to load and run. includes FP8, INT4 and INT4 + preshuffled checkpoints since these might reach larger audience Test Plan: python test/integration/test_load_and_run_checkpoint.py Reviewers: Subscribers: Tasks: Tags: stack-info: PR: #2792, branch: jerryzh168/stack/28
fb85e52
to
483898d
Compare
Stacked PRs:
Add load and run tests for checkpoints that we want to have BC
Summary:
Added load and run tests to make sure previously saved checkpoints can continue to load and run.
includes FP8, INT4 and INT4 + preshuffled checkpoints since these might reach larger audience
Test Plan:
python test/integration/test_load_and_run_checkpoint.py
Reviewers:
Subscribers:
Tasks:
Tags: