diff --git a/docs/shortfin/llm/user/llama_serving.md b/docs/shortfin/llm/user/llama_serving.md
index 5c4d0a7ff..1aeffc652 100644
--- a/docs/shortfin/llm/user/llama_serving.md
+++ b/docs/shortfin/llm/user/llama_serving.md
@@ -112,6 +112,14 @@ python -m sharktank.utils.hf_datasets llama3_8B_fp16 --local-dir $EXPORT_DIR
 > python3 convert_hf_to_gguf.py $WEIGHTS_DIR --outtype f16 --outfile $EXPORT_DIR/.gguf
 > ```
 > Now this GGUF file can be used in the instructions ahead.
+>
+> If you would like to convert the model from a [`.gguf`](https://iree.dev/guides/parameters/#gguf)
+> file to a [`.irpa`](https://iree.dev/guides/parameters/#irpa) file, you can
+> use our [`sharktank.tools.dump_gguf`](https://github.com/nod-ai/shark-ai/blob/main/sharktank/sharktank/tools/dump_gguf.py)
+> script:
+> ```bash
+> python -m sharktank.tools.dump_gguf --gguf-file $EXPORT_DIR/.gguf --save $EXPORT_DIR/.irpa
+> ```

 ### Define environment variables