Replies: 1 comment
Are you comparing against the original TorchScript module before compilation, or against the Torch-TensorRT compiled TorchScript at full precision?
I've used TRT to quantize HF transformer models, and the quantized models end up much larger than the originals. For instance, a GPT-Neo model with 125M parameters ends up at 1.1 GB after quantization, whereas the original full-precision TorchScript module takes only ~650 MB on disk. Can anyone explain why this is the case?
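For reference, a back-of-envelope estimate of the weight storage alone (a rough sketch; the helper name and the assumption of 4 bytes per FP32 and 1 byte per INT8 parameter are mine, not from the discussion) shows why the 1.1 GB figure is surprising:

```python
# Rough estimate of raw weight storage for a 125M-parameter model.
# Serialized modules/engines also carry graph metadata, calibration
# tables, etc., so real files are larger than these lower bounds.

def expected_weight_size_mb(num_params: float, bytes_per_param: int) -> float:
    """Lower bound on serialized weight size, in megabytes."""
    return num_params * bytes_per_param / 1e6

fp32_mb = expected_weight_size_mb(125e6, 4)  # FP32: 4 bytes/param
int8_mb = expected_weight_size_mb(125e6, 1)  # INT8: 1 byte/param

print(f"FP32 weights: ~{fp32_mb:.0f} MB")  # ~500 MB, consistent with the ~650 MB file
print(f"INT8 weights: ~{int8_mb:.0f} MB")  # ~125 MB, far below the observed 1.1 GB
```

So a pure INT8 serialization should be well under the FP32 size; the observed growth suggests the exported artifact contains more than just the quantized weights.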