[Performance] Batched calibration #2054
Signed-off-by: Kyle Sayers <[email protected]>
HDCharles left a comment:
looks good aside from the missing docstring
dsikka left a comment:
Just a few questions related to offloading activations
Purpose
- `batch_size` controls the batch size of calibration data
- `offload_sequential_activations` controls whether calibration data is offloaded to the CPU between layers

Prerequisites
Changes
Batched Calibration
- Added a `batch_size` argument
- Changed the `data_collator` default from the default data collator to a "truncation" collator
- The `data_collator_with_truncation` function truncates all samples to the shortest-length sample in the batch
- Added `LengthAwareSampler`, which samples from the dataset such that samples with similar lengths are batched together

Disable Offloading
- Added an `offload_sequential_activations` argument; defaults to True (no behavior change)

Misc
- `TextGenerationDataset`
- Removed `_mask_padding` from `IntermediatesCache`, as I do not believe that this method is effective in masking padding tokens from hessian calculations

Testing
Evaluation Regression
Deleting significant portions of the dataset (deleting longer sequences first) has a detrimental effect on recovery.
Modifiers
Calibration Regression Testing
I ran calibration for the following models (but did not evaluate recovery).
The following model examples can calibrate without issue:
The following models had a bug where processor and model dtypes were mismatched, but are now fixed by this PR:
The following models have an accelerate device offloading bug:
The following model examples have an MoE replacement bug:
Future Work
While these options are a great place to start, the next step to improve runtime is to allow multi-GPU compression, likely via `torch.distributed` tensor parallelism.
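To make the batching changes above concrete, here is a minimal sketch of the two ideas this PR describes: a length-aware sampler that groups similar-length samples into batches, and a collator that truncates each batch to its shortest sample. The function names and logic here are illustrative assumptions based on the PR description, not the actual llm-compressor implementation.

```python
# Illustrative sketch only: names and logic are assumptions based on the
# PR description, not llm-compressor's actual code.

def length_aware_batches(lengths, batch_size):
    """Group dataset indices so that similar-length samples share a batch."""
    # Sort indices by sample length, then slice into consecutive batches
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[start:start + batch_size]
            for start in range(0, len(order), batch_size)]

def collate_with_truncation(batch):
    """Truncate every sample in a batch to the shortest sample's length."""
    min_len = min(len(ids) for ids in batch)
    return [ids[:min_len] for ids in batch]

if __name__ == "__main__":
    # Toy token-id sequences of varying length
    samples = [[1] * 5, [2] * 9, [3] * 4, [4] * 8]
    lengths = [len(s) for s in samples]
    for idx in length_aware_batches(lengths, batch_size=2):
        batch = collate_with_truncation([samples[i] for i in idx])
        # After collation, every sequence in the batch has the same length
        assert len({len(s) for s in batch}) == 1
```

Pairing the two matters: truncating a batch that mixes very short and very long samples would discard most of the long samples' context, whereas grouping similar lengths first keeps the amount of truncated data small.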