diff --git a/pages/managed-inference/how-to/create-deployment.mdx b/pages/managed-inference/how-to/create-deployment.mdx
index ad5ed35260..ecd53493ad 100644
--- a/pages/managed-inference/how-to/create-deployment.mdx
+++ b/pages/managed-inference/how-to/create-deployment.mdx
@@ -27,6 +27,10 @@ dates:
     Some models may require acceptance of an end-user license agreement. If prompted, review the terms and conditions and accept the license accordingly.
     - Choose the geographical **region** for the deployment.
+    - For custom models: choose the model quantization.
+
+      Each model comes with a default quantization. Selecting a lower-bit quantization improves performance and allows the model to run on smaller GPU nodes, at the potential cost of reduced precision.
+
     - Specify the GPU Instance type to be used with your deployment.
 4. Enter a **name** for the deployment, and optional tags.
 5. Configure the **network connectivity** settings for the deployment:
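
Note on the quantization paragraph added above: the "smaller GPU nodes" claim follows from weight memory scaling with parameter count times bits per weight. A minimal back-of-the-envelope sketch in Python, not part of the diff itself; the 7B model size is an illustrative assumption, and the figures cover weights only, ignoring activations, KV cache, and runtime overhead:

    def weight_memory_gib(params_billion: float, bits: int) -> float:
        """Approximate memory for model weights alone, in GiB."""
        bytes_per_param = bits / 8
        return params_billion * 1e9 * bytes_per_param / 1024**3

    # Hypothetical 7B-parameter model at common quantization widths.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{weight_memory_gib(7, bits):.1f} GiB of weights")
    # 16-bit: ~13.0 GiB, 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB

Halving the bit width roughly halves the weight footprint, which is why a lower-bit quantization can let the same model fit on a smaller GPU Instance type.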