
Conversation

@kylesayrs (Collaborator) commented Nov 20, 2025

Purpose

  • Reduce calibration runtime by giving users options to trade memory for speed (see the usage sketch below)
    • batch_size controls the batch size of the calibration data
    • offload_sequential_activations controls whether intermediate activations are offloaded to the CPU between layers
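
A hedged sketch of how these options might be passed through llm-compressor's oneshot entrypoint (the model, dataset, and recipe here are placeholders, not part of this PR; only batch_size and offload_sequential_activations are new):

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    dataset="open_platypus",                      # placeholder dataset
    recipe=GPTQModifier(scheme="W4A16", targets="Linear", ignore=["lm_head"]),
    num_calibration_samples=512,
    batch_size=32,                         # new: batch the calibration data
    offload_sequential_activations=False,  # new: trade memory for throughput
)
```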

Changes

Batched Calibration

  • Add batch_size argument
  • Change the data_collator default from the default data collator to a "truncation" collator
  • The data_collator_with_truncation function truncates all samples in a batch to the length of the shortest sample (see the sketch after the table below)
    • Statistics on how many tokens are dropped by this method are in the tables below
    • The data collator can instead be set to "padding" to pad to the longest sample in the batch
  • To reduce excess truncation/padding, default to LengthAwareSampler, which samples from the dataset so that samples of similar length are batched together
| Batch Size   | Time   | % Speedup | % Deleted |
|--------------|--------|-----------|-----------|
| Original (1) | 11m17s | N/A       | 0.0       |
| 1            | 11m17s | 0.0       | 0.0       |
| 2            | 10m48s | 4.2       | 0.2       |
| 4            | 10m39s | 5.6       | 0.5       |
| 8            | 10m39s | 5.6       | 1.1       |
| 16           | 10m58s | 2.8       | 2.6       |
| 64           | 11m4s  | 11.2      | 12.0      |
| 128          | 9m29s  | 16.0      | 23.9      |
| 512          | 7m39s  | 37.3      | 75.3      |
  • The speedup is relatively meager up until you start deleting significant portions of the dataset via truncation
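
A minimal sketch of what the truncation collator and length-aware sampler described above could look like (illustrative only; the actual implementations in this PR may differ):

```python
import torch
from torch.utils.data import Sampler

def data_collator_with_truncation(batch):
    # Truncate every sample to the shortest sequence in the batch so the
    # batch stacks into dense tensors with no padding. A "padding" collator
    # would instead pad every sample to the longest sequence.
    min_len = min(len(sample["input_ids"]) for sample in batch)
    return {
        key: torch.tensor([sample[key][:min_len] for sample in batch])
        for key in batch[0]
    }

class LengthAwareSampler(Sampler):
    # Visit the dataset in length-sorted order so that consecutive samples
    # (and therefore batches) have similar lengths, minimizing tokens lost
    # to truncation or wasted on padding.
    def __init__(self, dataset):
        self.indices = sorted(
            range(len(dataset)), key=lambda i: len(dataset[i]["input_ids"])
        )

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)
```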

Disable Offloading

  • Add offload_sequential_activations argument, defaulting to True (no behavior change)
    • Disabling this option increases throughput but also increases memory usage (see the simplified sketch below the table)
| Batch Size   | Time   | % Speedup | % Deleted |
|--------------|--------|-----------|-----------|
| Original (1) | 11m17s | N/A       | 0.0       |
| 1            | 10m14s | 9.3       | 0.0       |
| 2            | 9m46s  | 13.4      | 0.2       |
| 4            | 9m36s  | 14.9      | 0.5       |
| 8            | 9m48s  | 13.1      | 1.1       |
| 16           | 9m26s  | 16.3      | 2.6       |
| 32           | 9m27s  | 16.2      | 5.8       |
| 128          | 8m34s  | 24.0      | 23.9      |
| 512          | 6m40s  | 40.9      | 75.3      |
  • The memory requirement for 512 samples on Llama 8B is ~70 GB, equivalent to batch size 128
  • With offloading disabled and batch size 32, calibration runtime drops below 1s per layer (down from ~11s)
    • This implies that the theoretical maximum speedup from reducing calibration time alone is ~15% for this model + dataset; the rest of the runtime is spent outside calibration
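
For intuition, a simplified sketch of sequential calibration with optional activation offloading (not the literal IntermediatesCache implementation; layer and batch handling are reduced to the essentials):

```python
import torch

def calibrate_sequentially(layers, batches, offload_sequential_activations=True):
    # Calibration runs layer by layer: each layer consumes the activations
    # cached from the previous layer. Offloading the cache to CPU bounds GPU
    # memory at the cost of a host<->device copy per batch per layer.
    cache = batches
    for layer in layers:
        outputs = []
        for acts in cache:
            acts = acts.to("cuda", non_blocking=True)
            with torch.no_grad():
                out = layer(acts)
            if offload_sequential_activations:
                out = out.to("cpu")  # free GPU memory between layers
            outputs.append(out)
        cache = outputs  # becomes the input cache for the next layer
    return cache
```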

Misc

  • Fix examples
    • Fixed examples with dtype mismatches between the model and the processor (Mixtral, Pixtral, Whisper)
    • For multimodal models which use multimodal datasets, remove their data collators, as batch unwrapping is now handled by the TextGenerationDataset
  • Remove _mask_padding from IntermediatesCache, as I do not believe this method effectively masks padding tokens from Hessian calculations
  • Fix AWQ
    • AWQ was hard-coded to handle only batches of size 1

Testing

Evaluation Regression

| Batch Size   | Eval Score | Difference (pts) | % Deleted |
|--------------|------------|------------------|-----------|
| Original (1) | 0.6573     | 0.0              | 0.0       |
| 1            | 0.6513     | -0.6             | 0.0       |
| 2            | 0.6513     | -0.6             | 0.2       |
| 4            | 0.6657     | +0.8             | 0.5       |
| 8            | 0.6513     | -0.6             | 1.1       |
| 16           | 0.6672     | +1.0             | 2.6       |
| 64           | 0.6338     | -2.4             | 12.0      |
| 128          | 0.6603     | +0.3             | 23.9      |
| 512          | 0.6391     | -1.8             | 75.3      |

Deleting significant portions of the dataset (longer sequences are deleted first) has a detrimental effect on recovery

Modifiers

  • GPTQ
    • Ran full regression tests, as shown above
  • AWQ
    • Ran AWQ with batch size 32 and checked output sanity
  • Quantization Modifier
    • Ran NVFP4 with batch size 10 and checked output sanity

Calibration Regression Testing

I ran calibration for the following models (but did not evaluate recovery)

The following model examples can calibrate without issue:

  • Llama3
  • Gemma3
  • Internvl3
  • Mllama
  • Llama4

The following models had a bug where processor and model dtypes were mismatched; this PR fixes it:

  • Mistral3
  • Pixtral
  • Whisper

The following models have an accelerate device offloading bug:

  • Idefics3
  • Phi3 Vision

The following model examples have an MoE replacement bug:

  • qwen3-vl-30b-a3b-Instruct

Future Work

While these options are a great place to start, the next step to improve runtime is to allow multi-GPU compression, likely via torch.distributed tensor parallelism.

@github-actions commented:

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs force-pushed the kylesayrs/batched-calibration branch from 32de48f to 35a0507 on December 2, 2025 01:09
@kylesayrs changed the base branch from main to kylesayrs/modifiers-expose-targets on December 2, 2025 01:10
@kylesayrs force-pushed the kylesayrs/modifiers-expose-targets branch from 34814c7 to 6559de0 on December 2, 2025 19:11
@kylesayrs force-pushed the kylesayrs/batched-calibration branch from dc957a8 to 74f8882 on December 4, 2025 22:25
@kylesayrs force-pushed the kylesayrs/batched-calibration branch from 74f8882 to 33c6fe9 on December 4, 2025 23:29
@kylesayrs force-pushed the kylesayrs/batched-calibration branch from 33c6fe9 to 4cd1f89 on December 4, 2025 23:30
@kylesayrs force-pushed the kylesayrs/batched-calibration branch from c0e37ab to 413d7b6 on December 5, 2025 02:48
@kylesayrs changed the base branch from kylesayrs/modifiers-expose-targets to main on December 5, 2025 02:54
@kylesayrs added the `ready` label on December 5, 2025
@kylesayrs marked this pull request as ready for review on December 5, 2025 05:55
@HDCharles (Collaborator) previously approved these changes on Dec 5, 2025 and left a comment:

looks good aside from the missing docstring

HDCharles previously approved these changes on Dec 8, 2025
@dsikka requested a review from HDCharles on December 9, 2025 01:24
HDCharles previously approved these changes on Dec 9, 2025

@dsikka (Collaborator) left a comment:

Just a few questions related to offloading activations

dsikka previously approved these changes on Dec 10, 2025
@kylesayrs dismissed stale reviews from dsikka and HDCharles via 8cac5aa on December 10, 2025 21:00
dsikka previously approved these changes on Dec 10, 2025
@dsikka enabled auto-merge (squash) on December 10, 2025 21:01
@dsikka requested a review from HDCharles on December 10, 2025 21:10
@dsikka merged commit 056ed3d into main on Dec 11, 2025 (11 checks passed)
@dsikka deleted the kylesayrs/batched-calibration branch on December 11, 2025 14:26