feat(llm): use Gemini native generateContent/streamGenerateContent endpoints on Vertex#2023
Open
htimur wants to merge 8 commits into
Open
feat(llm): use Gemini native generateContent/streamGenerateContent endpoints on Vertex#2023htimur wants to merge 8 commits into
htimur wants to merge 8 commits into
Conversation
Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
…ructs Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
…loop roles Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
cbf76a7 to
55e4337
Compare
Signed-off-by: Timur Khamrakulov <timur.khamrakulov@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1929.
Summary
This change adds a native Gemini api usage to the Vertex provider. Gemini models are now routed directly to
:generateContentand:streamGenerateContent, with request and response translation handled between the OpenAI chat-completions format and Gemini's native API.No changes for Anthropic models on Vertex were implemented, and usage of OpenAI compatible endpoint remains as it was before.
Worth mentioning / notes
llm.upstreamFinishReason, which exposes the original Gemini finish reason before it is mapped to an OpenAI finish reason. Its implemented only for the Vertex Gemini models, and I believe this is important for observability/monitoring.data:urls are converted to inline data andgs://urls to fileData with a resolved MIME type.http(s)urls are rejected because Vertex cannot fetch remote images directly.What changed
gemini-*models on the completions route now use:generateContentand:streamGenerateContentfor streaming. Gemini embedding models (for example,gemini-embedding-001) continue to use the embeddings endpoint, and Anthropic-on-Vertex behavior is unchanged.contents(including system instructions, tool calls, and tool responses), tools tofunctionDeclarations,tool_choicetofunctionCallingConfig, and sampling, structured output, and reasoning settings togenerationConfig(includingthinkingConfig).cachedContent,labels, andsafetySettingsare passed through unchanged.What was tested
Manual tests and evals were executed against Vertex Gemini models on a locally built instance of the gateway alongside the test suite and google ADK based tests, and no behaviour changes were detected.
Use of AI assistance
The native Gemini wire types and parts of the test suite were developed with the help of an LLM. I have reviewed the generated code, verified my understanding of its behaviour and taken ownership of the implementation. I reviewed the tests for correctness and coverage, and validated the change to confirm that it behaves as expected.