Replies: 1 comment
Are you comparing against the original TorchScript module before compilation, or against the Torch-TensorRT compiled TorchScript at full precision?
I've used TRT to quantize HF transformer models, and the quantized models end up much larger than the originals. For instance, a GPT-Neo model with 125M parameters ends up at 1.1 GB after quantization, whereas the original full-precision TorchScript module takes only ~650 MB on disk. Can anyone explain why this is the case?
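For reference, a back-of-envelope estimate of the weight storage alone (a rough sketch; the helper name and the assumption of 4 bytes per FP32 and 1 byte per INT8 parameter are mine, not from the discussion) shows why the 1.1 GB figure is surprising:

```python
# Rough estimate of raw weight storage for a 125M-parameter model.
# Serialized modules/engines also carry graph metadata, calibration
# tables, etc., so real files are larger than these lower bounds.

def expected_weight_size_mb(num_params: float, bytes_per_param: int) -> float:
    """Lower bound on serialized weight size, in megabytes."""
    return num_params * bytes_per_param / 1e6

fp32_mb = expected_weight_size_mb(125e6, 4)  # FP32: 4 bytes/param
int8_mb = expected_weight_size_mb(125e6, 1)  # INT8: 1 byte/param

print(f"FP32 weights: ~{fp32_mb:.0f} MB")  # ~500 MB, consistent with the ~650 MB file
print(f"INT8 weights: ~{int8_mb:.0f} MB")  # ~125 MB, far below the observed 1.1 GB
```

So a pure INT8 serialization should be well under the FP32 size; the observed growth suggests the exported artifact contains more than just the quantized weights.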