Merge branch 'r2.0.0' into pagaray/for_test_purpose_only_r2.0.0_mcore…
…Update
pablo-garay authored Sep 19, 2024
2 parents 0d82dca + 6f615d3 commit 8ee0734
Showing 1 changed file with 27 additions and 0 deletions.
27 changes: 27 additions & 0 deletions docs/source/nlp/quantization.rst
This script will produce a quantized ``.nemo`` checkpoint at the experiment manager log directory.
It can also optionally produce an exported TensorRT-LLM engine directory or a ``.qnemo`` file that can be used for inference by setting the ``export`` parameters similar to the PTQ example.
Note that you may tweak the QAT trainer steps and learning rate if needed to achieve better model quality.

NeMo checkpoints trained in FP8 with `NVIDIA Transformer Engine <https://github.com/NVIDIA/TransformerEngine>`_
-------------------------------------------------------------------------------------------------------------------

If you have an FP8-quantized checkpoint produced during pre-training or fine-tuning with Transformer Engine, you can convert it to an FP8 TensorRT-LLM engine directly using ``nemo.export``.
The API is the same as with regular ``.nemo`` and ``.qnemo`` checkpoints:

.. code-block:: python

    from nemo.export.tensorrt_llm import TensorRTLLM

    trt_llm_exporter = TensorRTLLM(model_dir="/path/to/trt_llm_engine_folder")
    trt_llm_exporter.export(
        nemo_checkpoint_path="/path/to/llama2-7b-base-fp8.nemo",
        model_type="llama",
    )
    trt_llm_exporter.forward(["Hi, how are you?", "I am good, thanks, how about you?"])

The export settings for quantization can be adjusted via ``trt_llm_exporter.export`` arguments:

* ``fp8_quantized: Optional[bool] = None``: manually enables/disables FP8 quantization
* ``fp8_kvcache: Optional[bool] = None``: manually enables/disables FP8 quantization for KV-cache

By default, the quantization settings are auto-detected from the NeMo checkpoint.
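To illustrate the override semantics of these two arguments, the sketch below models them as tri-state flags: ``None`` defers to whatever is auto-detected from the checkpoint metadata, while an explicit boolean wins. This is a minimal, hypothetical helper for illustration only, not NeMo's actual implementation; the ``checkpoint_cfg`` dictionary and its keys are assumptions.

.. code-block:: python

    from typing import Optional


    def resolve_quant_settings(
        checkpoint_cfg: dict,
        fp8_quantized: Optional[bool] = None,
        fp8_kvcache: Optional[bool] = None,
    ) -> tuple:
        """Illustrative only: an explicit True/False overrides auto-detection;
        None falls back to what the (hypothetical) checkpoint metadata says."""
        detected_fp8 = checkpoint_cfg.get("fp8", False)
        detected_kv = checkpoint_cfg.get("fp8_kvcache", False)
        quantized = detected_fp8 if fp8_quantized is None else fp8_quantized
        kvcache = detected_kv if fp8_kvcache is None else fp8_kvcache
        return quantized, kvcache


    # Auto-detect both settings from an FP8-trained checkpoint:
    print(resolve_quant_settings({"fp8": True, "fp8_kvcache": True}))  # (True, True)
    # Force-disable the FP8 KV-cache while keeping FP8 weights:
    print(resolve_quant_settings({"fp8": True, "fp8_kvcache": True}, fp8_kvcache=False))  # (True, False)

In practice you would simply pass ``fp8_quantized=...`` and ``fp8_kvcache=...`` to ``trt_llm_exporter.export`` and rely on auto-detection otherwise.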


References
----------
