[vertexai] set_backend_config() incorrectly ignores embedding model configuration when vector_db is None #4946

Open
kecbigmt opened this issue Feb 4, 2025 · 0 comments
Assignees
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API.

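I found a potential issue with how the embedding model configuration is applied during RAG corpus creation: when no vector database is specified, the configured embedding model appears to be ignored.
I would appreciate it if you could review and address this.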

Environment details

  • OS type and version: macOS Sequoia 15.0.1
  • Python version: 3.13.1
  • pip version: 24.3.1
  • google-cloud-aiplatform version: 1.79.0

Steps to reproduce

  1. Create a RAG corpus with a text-multilingual-embedding-002 embedding model configuration but without a vector_db specification
  2. Observe that the embedding model defaults to text-embedding-005 instead of the specified model

Code example

from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

# Initialize Vertex AI
vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")

# Configure embedding model to use text-multilingual-embedding-002
rag_embedding_model_config = rag.RagEmbeddingModelConfig(
    vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
        publisher_model="projects/YOUR_PROJECT_ID/locations/us-central1/publishers/google/models/text-multilingual-embedding-002"
    )
)

# Create RagCorpus
rag_corpus = rag.create_corpus(
    display_name="my_corpus",
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=rag_embedding_model_config,
    )
)

# The created corpus will use text-embedding-005 instead of text-multilingual-embedding-002

Root Cause

The issue is in _gapic_utils.py. The set_embedding_model_config() call is incorrectly indented within the if backend_config.vector_db is not None: block:

def set_backend_config(
    backend_config: Optional[Union[RagVectorDbConfig, None,]],
    rag_corpus: GapicRagCorpus,
) -> None:
    if backend_config is None:
        return

    if backend_config.vector_db is not None:
        vector_config = backend_config.vector_db
        # ... vector db configuration ...
        if backend_config.rag_embedding_model_config:  # <- Incorrect indentation
            set_embedding_model_config(
                backend_config.rag_embedding_model_config, rag_corpus
            )

Due to this indentation, the embedding model configuration is only applied when a vector database is specified. The set_embedding_model_config() call should be at the same level as the vector database check to ensure the embedding model is set regardless of vector database configuration.

Proposed fix:

def set_backend_config(
    backend_config: Optional[Union[RagVectorDbConfig, None,]],
    rag_corpus: GapicRagCorpus,
) -> None:
    if backend_config is None:
        return

    if backend_config.vector_db is not None:
        vector_config = backend_config.vector_db
        # ... vector db configuration ...
    
    if backend_config.rag_embedding_model_config:  # <- Correct indentation
        set_embedding_model_config(
            backend_config.rag_embedding_model_config, rag_corpus
        )
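
The difference between the two versions can be reproduced without calling Vertex AI at all. The following is only a minimal sketch under stated assumptions: FakeBackendConfig and FakeCorpus are hypothetical stand-ins for RagVectorDbConfig and GapicRagCorpus, and the two functions mirror just the control flow shown above, not the real contents of _gapic_utils.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBackendConfig:
    """Hypothetical stand-in for RagVectorDbConfig (not the real class)."""
    vector_db: Optional[str] = None
    rag_embedding_model_config: Optional[str] = None


@dataclass
class FakeCorpus:
    """Hypothetical stand-in for GapicRagCorpus; records what was applied."""
    vector_db: Optional[str] = None
    embedding_model: Optional[str] = None


def set_backend_config_buggy(backend_config, corpus):
    """Mirrors the current control flow: embedding config nested under vector_db."""
    if backend_config is None:
        return
    if backend_config.vector_db is not None:
        corpus.vector_db = backend_config.vector_db
        # Only reached when a vector DB is specified.
        if backend_config.rag_embedding_model_config:
            corpus.embedding_model = backend_config.rag_embedding_model_config


def set_backend_config_fixed(backend_config, corpus):
    """Mirrors the proposed control flow: embedding config checked independently."""
    if backend_config is None:
        return
    if backend_config.vector_db is not None:
        corpus.vector_db = backend_config.vector_db
    # Reached whether or not a vector DB is specified.
    if backend_config.rag_embedding_model_config:
        corpus.embedding_model = backend_config.rag_embedding_model_config


# Same shape as the reproduction: embedding model set, vector_db left as None.
config = FakeBackendConfig(
    rag_embedding_model_config="text-multilingual-embedding-002"
)

buggy_corpus = FakeCorpus()
set_backend_config_buggy(config, buggy_corpus)

fixed_corpus = FakeCorpus()
set_backend_config_fixed(config, fixed_corpus)

print(buggy_corpus.embedding_model)  # None
print(fixed_corpus.embedding_model)  # text-multilingual-embedding-002
```

In this sketch the buggy variant leaves the embedding model unset, which would be consistent with the service falling back to its default (text-embedding-005 in the reproduction above), while the fixed variant carries the requested model through.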
@product-auto-label product-auto-label bot added the api: vertex-ai Issues related to the googleapis/python-aiplatform API. label Feb 4, 2025