[vertexai] set_backend_config() incorrectly ignores embedding model configuration when vector_db is None #4946

kecbigmt · 2025-02-04T17:33:11Z

I found a potential issue with how the embedding model configuration is applied during RAG corpus creation.
I would appreciate it if you could review and address it.

Environment details

OS type and version: macOS Sequoia 15.0.1
Python version: 3.13.1
pip version: 24.3.1
google-cloud-aiplatform version: 1.79.0

Steps to reproduce

Create a RAG corpus with a text-multilingual-embedding-002 model configuration but without specifying vector_db
Observe that the embedding model defaults to text-embedding-005 instead of using the specified model

Code example

import vertexai
from vertexai import rag

# Initialize Vertex AI
vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")

# Configure embedding model to use text-multilingual-embedding-002
rag_embedding_model_config = rag.RagEmbeddingModelConfig(
    vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
        publisher_model="projects/YOUR_PROJECT_ID/locations/us-central1/publishers/google/models/text-multilingual-embedding-002"
    )
)

# Create RagCorpus
rag_corpus = rag.create_corpus(
    display_name="my_corpus",
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=rag_embedding_model_config,
    )
)

# The created corpus will use text-embedding-005 instead of text-multilingual-embedding-002

Root Cause

The issue is in _gapic_utils.py. The set_embedding_model_config() call is incorrectly indented within the if backend_config.vector_db is not None: block:

def set_backend_config(
    backend_config: Optional[Union[RagVectorDbConfig, None,]],
    rag_corpus: GapicRagCorpus,
) -> None:
    if backend_config is None:
        return

    if backend_config.vector_db is not None:
        vector_config = backend_config.vector_db
        # ... vector db configuration ...
        if backend_config.rag_embedding_model_config:  # <- Incorrect indentation
            set_embedding_model_config(
                backend_config.rag_embedding_model_config, rag_corpus
            )

Due to this indentation, the embedding model configuration is only applied when a vector database is specified. The set_embedding_model_config() call should be at the same level as the vector database check to ensure the embedding model is set regardless of vector database configuration.

Proposed fix:

def set_backend_config(
    backend_config: Optional[Union[RagVectorDbConfig, None,]],
    rag_corpus: GapicRagCorpus,
) -> None:
    if backend_config is None:
        return

    if backend_config.vector_db is not None:
        vector_config = backend_config.vector_db
        # ... vector db configuration ...
    
    if backend_config.rag_embedding_model_config:  # <- Correct indentation
        set_embedding_model_config(
            backend_config.rag_embedding_model_config, rag_corpus
        )
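
To make the difference between the two versions easy to check without a Google Cloud project, here is a minimal sketch that reproduces only the control flow. FakeBackendConfig and FakeCorpus are hypothetical stand-ins I made up for illustration; they are not the SDK's RagVectorDbConfig or GapicRagCorpus, and the plain strings replace the real config objects. The two functions mirror the nested and dedented branch structure shown above, nothing more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBackendConfig:
    """Hypothetical stand-in for rag.RagVectorDbConfig (assumption, not the SDK class)."""
    vector_db: Optional[str] = None
    rag_embedding_model_config: Optional[str] = None


@dataclass
class FakeCorpus:
    """Hypothetical stand-in for GapicRagCorpus (assumption, not the SDK class)."""
    vector_db: Optional[str] = None
    # None here models "nothing was set, so the service default would apply".
    embedding_model: Optional[str] = None


def set_backend_config_nested(backend_config, corpus):
    """Reported behavior: the embedding check sits inside the vector_db branch."""
    if backend_config is None:
        return
    if backend_config.vector_db is not None:
        corpus.vector_db = backend_config.vector_db
        if backend_config.rag_embedding_model_config:
            corpus.embedding_model = backend_config.rag_embedding_model_config


def set_backend_config_dedented(backend_config, corpus):
    """Proposed behavior: the embedding check runs regardless of vector_db."""
    if backend_config is None:
        return
    if backend_config.vector_db is not None:
        corpus.vector_db = backend_config.vector_db
    if backend_config.rag_embedding_model_config:
        corpus.embedding_model = backend_config.rag_embedding_model_config


# Same input as the report: an embedding model is given, vector_db is not.
config = FakeBackendConfig(
    rag_embedding_model_config="text-multilingual-embedding-002"
)

nested_result = FakeCorpus()
set_backend_config_nested(config, nested_result)

dedented_result = FakeCorpus()
set_backend_config_dedented(config, dedented_result)

print("nested:", nested_result.embedding_model)
print("dedented:", dedented_result.embedding_model)
```

With the nested version the embedding model is never copied onto the corpus, which matches the observed fallback to the default model; with the dedented version it is set even though vector_db is None.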


product-auto-label bot added the api: vertex-ai Issues related to the googleapis/python-aiplatform API. label Feb 4, 2025

matthew29tang assigned yinghsienwu Feb 10, 2025
