You can change inference or embedding models by using the following procedures.
To change the inference model to a model from the API catalog,
specify the model in the APP_LLM_MODELNAME environment variable when you start the RAG Server.
The following example uses the Mistral AI Mixtral 8x7B Instruct model.

    APP_LLM_MODELNAME='mistralai/mixtral-8x7b-instruct-v0.1' docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d --build

To get a list of valid model names, use one of the following methods:
- Browse the models at https://build.ngc.nvidia.com/explore/discover. View the sample Python code and get the model name from the `model` argument to the `client.chat.completions.create` method.
- Install the `langchain-nvidia-ai-endpoints` Python package from PyPI. Use the `get_available_models()` method on an instance of a `ChatNVIDIA` object to list the models. Refer to the package web page for sample code to list the models.
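
As a rough sketch of the second method, the helper below collects model IDs from any client object that exposes `get_available_models()`. It assumes the `langchain-nvidia-ai-endpoints` package is installed, that each returned model has an `id` attribute, and that an API key is available in the `NVIDIA_API_KEY` environment variable. `list_model_ids` is a hypothetical helper name, not part of the package.

```python
import os


def list_model_ids(client):
    """Return the sorted, de-duplicated model IDs that `client` reports.

    `client` is any object with a get_available_models() method, such as a
    ChatNVIDIA instance. Assumption: each returned model has an `id` attribute.
    """
    return sorted({model.id for model in client.get_available_models()})


# Only query the API catalog when run as a script with an API key configured.
if __name__ == "__main__" and os.environ.get("NVIDIA_API_KEY"):
    # Imported lazily so the helper above can be used without the package.
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    for model_id in list_model_ids(ChatNVIDIA()):
        print(model_id)
```

Any of the printed IDs can then be used as the value of `APP_LLM_MODELNAME`.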
To change the embedding model to a model from the API catalog,
specify the model in the APP_EMBEDDINGS_MODELNAME environment variable when you start the ingestor server and the RAG Server.
The following example uses the NVIDIA Embed QA 4 model.

    APP_EMBEDDINGS_MODELNAME='NV-Embed-QA' docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
    APP_EMBEDDINGS_MODELNAME='NV-Embed-QA' docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d --build

As an alternative, you can specify the model names at runtime in the `/generate` API call. Refer to the Generate Answer Endpoint and Document Search Endpoint payload schema in this notebook.

To get a list of valid model names, use one of the following methods:
- Browse the models at https://build.ngc.nvidia.com/explore/retrieval. View the sample Python code and get the model name from the `model` argument to the `client.embeddings.create` method.
- Install the `langchain-nvidia-ai-endpoints` Python package from PyPI. Use the `get_available_models()` method on an instance of an `NVIDIAEmbeddings` object to list the models. Refer to the package web page for sample code to list the models.
[!TIP] Always use the same embedding model, or models that share the same tokenizer, for both ingestion and retrieval to maintain good accuracy.
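
One way to follow this tip is to pass a single model name to both servers from one place. The sketch below builds the two `docker compose` invocations shown earlier with an identical `APP_EMBEDDINGS_MODELNAME` value. `compose_invocations` and `launch` are hypothetical helper names, and `launch` defaults to a dry run that only prints the commands.

```python
import os
import shlex
import subprocess

# Compose files and flags taken from the commands shown earlier in this section.
SERVERS = [
    ("deploy/compose/docker-compose-ingestor-server.yaml", ["up", "-d"]),
    ("deploy/compose/docker-compose-rag-server.yaml", ["up", "-d", "--build"]),
]


def compose_invocations(embedding_model, base_env=None):
    """Return (argv, env) pairs that start both servers with one embedding model."""
    if not embedding_model:
        raise ValueError("embedding_model must be a non-empty model name")
    env = dict(os.environ if base_env is None else base_env)
    env["APP_EMBEDDINGS_MODELNAME"] = embedding_model
    return [(["docker", "compose", "-f", path, *args], env) for path, args in SERVERS]


def launch(embedding_model, dry_run=True):
    """Print the commands (dry run) or run them, stopping at the first failure."""
    for argv, env in compose_invocations(embedding_model):
        if dry_run:
            prefix = f"APP_EMBEDDINGS_MODELNAME={shlex.quote(embedding_model)}"
            print(f"{prefix} {shlex.join(argv)}")
        else:
            subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    # Dry run: prints both commands without starting any containers.
    launch("NV-Embed-QA")
```

Because both invocations read the model name from the same argument, the ingestor and the RAG Server cannot drift apart through a typo in one of the two commands.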
You can specify the models that the NVIDIA NIM containers use in the `nims.yaml` file.
- Edit the `deploy/nims.yaml` file and specify an image that includes the model to deploy.

      services:
        nim-llm:
          container_name: nim-llm-ms
          image: nvcr.io/nim/meta/<image>:<tag>
          ...

        nemoretriever-embedding-ms:
          container_name: nemoretriever-embedding-ms
          image: nvcr.io/nim/<image>:<tag>

        nemoretriever-ranking-ms:
          container_name: nemoretriever-ranking-ms
          image: nvcr.io/nim/<image>:<tag>

  To get a list of valid model names, use one of the following methods:

  - Run `ngc registry image list "nim/*"`.
  - Browse the NGC catalog at https://catalog.ngc.nvidia.com/containers.
- Update the corresponding model names using environment variables as required.

      export APP_EMBEDDINGS_MODELNAME=<>
      export APP_RANKING_MODELNAME=<>
      export APP_LLM_MODELNAME=<>
- Follow the steps specified here to relaunch the containers with the updated models. Make sure to specify the correct model names using the appropriate environment variables, as shown in the earlier step.
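
Before relaunching, a quick check that the model-name variables are set can catch a missed export. The following is a minimal sketch. The set of variables checked is an assumption based on the names used in this section, and `missing_model_vars` is a hypothetical helper.

```python
import os

# Assumption: these are the model-name variables relevant to the deployment.
MODEL_VARS = (
    "APP_LLM_MODELNAME",
    "APP_EMBEDDINGS_MODELNAME",
    "APP_RANKING_MODELNAME",
)


def missing_model_vars(env=None):
    """Return the names in MODEL_VARS that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in MODEL_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_model_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All model-name variables are set.")
```

Run it in the same shell session as the `export` commands so that it sees the values the containers will receive.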