
feat: add Google Nano Banana image generation with Vertex AI auth #25

Merged
arcaputo3 merged 9 commits into main from claude/add-vertex-ai-auth-5hrqd
Mar 9, 2026

Conversation

@arcaputo3
Contributor

Summary

  • Add Google Nano Banana (Gemini image models) as a dedicated create_image_google tool with reference image support (up to 14 images via multimodal generate_content() input)
  • Support Vertex AI Express authentication (API key), Vertex AI ADC (project + service account), and Gemini Developer API
  • Restore create_image to a clean OpenAI-only signature — no more "OpenAI only" / "Google only" param clutter
  • Conditionally register create_image_google only when google-genai is installed and credentials are configured
  • Fix output_mime_type compatibility: only set on Vertex AI (not supported on Gemini Developer API)
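
The conditional registration described in the summary can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: `google_tool_enabled`, `register_tools`, and the `registry` dict are hypothetical names, and the credential check is reduced to two of the environment variables named in this PR.

```python
import importlib.util
import os
from collections.abc import Callable, Mapping
from typing import Optional


def google_tool_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the SDK is importable and a credential is set."""
    env = os.environ if env is None else env
    try:
        sdk_installed = importlib.util.find_spec("google.genai") is not None
    except ModuleNotFoundError:
        # Raised when the parent package "google" is itself missing.
        sdk_installed = False
    # Simplified assumption: any one of these counts as "configured".
    has_credentials = bool(env.get("GOOGLE_API_KEY") or env.get("GOOGLE_CLOUD_PROJECT"))
    return sdk_installed and has_credentials


def register_tools(
    registry: dict[str, Callable[..., object]],
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Always register the OpenAI tool; add the Google tool only when enabled."""
    registry["create_image"] = lambda prompt: f"openai:{prompt}"
    if google_tool_enabled(env):
        registry["create_image_google"] = lambda prompt: f"google:{prompt}"
```

With an empty environment the Google tool is never registered, whether or not the SDK is installed, so clients without Google credentials see only `create_image`.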

Changes

  • src/sanzaru/tools/image.py — public create_image_google() with input_images, reverted create_image() to OpenAI-only
  • src/sanzaru/descriptions.py — dedicated CREATE_IMAGE_GOOGLE description, reverted CREATE_IMAGE to OpenAI-only
  • src/sanzaru/server.py — conditional create_image_google registration, reverted create_image registration
  • src/sanzaru/config.py — get_google_client() with Vertex AI Express, ADC, and Gemini Developer API support
  • src/sanzaru/features.py — check_google_available() for credential detection
  • pyproject.toml — google optional dependency group (google-genai)
  • .mcp.json — Google env var passthrough

Test plan

  • ruff check + ruff format — all clean
  • mypy — passes
  • pytest — 264 passed, 1 skipped
  • E2E: create_image (OpenAI) — queued and completed
  • E2E: create_image_google (Vertex AI Express) — generated image successfully
  • E2E: Both tools called in parallel — both succeed
  • Pre-commit hooks pass on all commits

🤖 Generated with Claude Code

claude and others added 6 commits February 28, 2026 14:06
Integrates Google Nano Banana (Gemini image generation models) into the
existing create_image tool via a new `provider` parameter, using ADC-based
auth auto-detection that covers Vertex AI, service account keys, and gcloud.

Changes:
- pyproject.toml: add `google` optional extra (google-genai>=0.8.0)
- config.py: add get_google_client() with ADC auto-detect — Vertex AI
  (GOOGLE_GENAI_USE_VERTEXAI + GOOGLE_CLOUD_PROJECT) or Gemini Developer
  API (GOOGLE_API_KEY), no explicit credential loading required
- features.py: add check_google_available() + include in get_available_features()
- tools/image.py: add _create_image_google() helper (sync client wrapped in
  anyio thread pool) and dispatch from create_image() on provider="google";
  Google path returns ImageDownloadResult immediately (no polling required)
- descriptions.py: rewrite CREATE_IMAGE to document both providers, model IDs
  (Nano Banana 2/Pro/v1), and the synchronous vs async return shapes
- server.py: add provider, aspect_ratio, filename params to create_image tool

Google Nano Banana models supported:
- gemini-3.1-flash-image-preview (Nano Banana 2, default)
- gemini-3-pro-image-preview (Nano Banana Pro)
- gemini-2.5-flash-image (Nano Banana)

https://claude.ai/code/session_01CVTKuW7q1AVoN4PQfhgF9K

Relaxes the GOOGLE_CLOUD_PROJECT requirement when GOOGLE_GENAI_USE_VERTEXAI=True
to also allow Vertex AI Express mode (paid tier), where a Google Cloud API key
can be used instead of or alongside ADC credentials.

Auth resolution for GOOGLE_GENAI_USE_VERTEXAI=True:
- GOOGLE_CLOUD_PROJECT only → ADC path (service account, gcloud, attached SA)
- GOOGLE_API_KEY only → Vertex Express mode
- Both set → Express mode with explicit project context
- Neither set → RuntimeError with clear guidance

check_google_available() updated to mirror the same three-way logic.
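
The resolution rules above can be expressed as a small pure function. This is a sketch only: `resolve_google_auth` and the mode names are hypothetical, the accepted truthy spellings ("true", "1") are an assumption, and the final branch covers the Gemini Developer API path described in the earlier commit.

```python
from typing import Literal, Mapping

AuthMode = Literal["vertex_adc", "vertex_express", "gemini_developer"]


def resolve_google_auth(env: Mapping[str, str]) -> AuthMode:
    """Pick an auth mode from environment variables (illustrative only)."""
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1")
    api_key = env.get("GOOGLE_API_KEY")
    project = env.get("GOOGLE_CLOUD_PROJECT")

    if use_vertex:
        if api_key:
            # API key present (with or without a project): Express mode.
            return "vertex_express"
        if project:
            # Project only: rely on Application Default Credentials.
            return "vertex_adc"
        raise RuntimeError(
            "GOOGLE_GENAI_USE_VERTEXAI is set but neither GOOGLE_API_KEY "
            "nor GOOGLE_CLOUD_PROJECT is configured"
        )
    if api_key:
        return "gemini_developer"
    raise RuntimeError("No Google credentials configured")
```

Keeping the decision in one function that takes a mapping means the client factory and the availability check can share it, and tests can pass plain dicts instead of patching the process environment.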

https://claude.ai/code/session_01CVTKuW7q1AVoN4PQfhgF9K

Nano Banana models are Gemini models, not Imagen — they use
generate_content() with IMAGE response modality, not generate_images().

Changes:
- tools/image.py: rewrite _create_image_google to use generate_content()
  with GenerateContentConfig, ImageConfig, ThinkingConfig, and SafetySetting;
  extract image from response.candidates[].content.parts[].inline_data;
  add thinking_config (HIGH) for Nano Banana 2 (Flash-based);
  default all safety settings to OFF
- Add Literal types: GoogleImageModel, GoogleImageSize, GoogleAspectRatio
  for type-safe parameter validation at the MCP tool boundary
- Use proper google.genai enum types (HarmCategory, HarmBlockThreshold,
  ThinkingLevel) for mypy compliance
- config.py: fix Vertex AI Express — api_key and project/location are
  mutually exclusive in the SDK Client initializer
- descriptions.py: document image_size, safety_settings params
- .mcp.json: add Google env vars (GOOGLE_API_KEY, GOOGLE_GENAI_USE_VERTEXAI,
  GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION)
- uv.lock: lock google-genai dependency
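
The extraction path named above (response.candidates[].content.parts[].inline_data) might look roughly like this. It is a sketch: `first_inline_image` is a hypothetical helper, and the example uses `SimpleNamespace` stand-ins rather than real SDK response objects.

```python
from types import SimpleNamespace
from typing import Optional


def first_inline_image(response: object) -> Optional[bytes]:
    """Return the bytes of the first inline image part, or None if absent."""
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            data = getattr(inline, "data", None)
            if data:
                return data
    return None


# Stand-in response: one text part followed by one image part.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(text="Here is your image", inline_data=None),
                    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG")),
                ]
            )
        )
    ]
)
image_bytes = first_inline_image(fake_response)
```

Using `getattr` with defaults keeps the helper tolerant of blocked responses, where candidates or parts may be missing entirely.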

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…age support

Separate Google Nano Banana into its own `create_image_google` tool, restoring
`create_image` to a clean OpenAI-only signature. The new tool accepts up to 14
reference images via `input_images` parameter for editing, style transfer, and
multi-image composition. Conditionally registered only when Google is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The ImageConfig.output_mime_type parameter is only supported on Vertex AI,
not the Gemini Developer API. Conditionally set it based on the
GOOGLE_GENAI_USE_VERTEXAI environment variable.
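
One way to express this conditional is to build the keyword arguments first and add the field only on Vertex AI. The sketch below assumes a hypothetical helper `image_config_kwargs` and assumes "true"/"1" as the accepted truthy values; the resulting dict would then be unpacked into the SDK's ImageConfig.

```python
import os
from typing import Mapping, Optional


def image_config_kwargs(
    aspect_ratio: str,
    image_size: str,
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Build ImageConfig keyword arguments, adding output_mime_type on Vertex AI only."""
    env = os.environ if env is None else env
    kwargs = {"aspect_ratio": aspect_ratio, "image_size": image_size}
    if env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1"):
        # Per this commit, the field is rejected by the Gemini Developer API.
        kwargs["output_mime_type"] = "image/png"
    return kwargs
```

Building the dict up front also avoids assigning to the config object after it has been constructed.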

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude

claude bot commented Mar 8, 2026

test

@claude

claude bot commented Mar 8, 2026

Posting full review in follow-up comment due to formatting constraints.

@claude

claude bot commented Mar 8, 2026

Part 1/5 - Code Review: PR 25

Overall this is a clean, well-structured addition. The conditional registration, thread-pool wrapping of the sync API, and storage-backend integration all follow existing patterns well. A few things worth addressing before merge.

@claude

claude bot commented Mar 8, 2026

Part 2/5 - Bugs and Correctness

  1. _get_dimensions closure captures un-narrowed type (tools/image.py approx line 605): The closure captures image_bytes by reference with its declared type of bytes-or-None. mypy may not narrow this inside a nested function even though the is-None guard already fired. Safe fix: rebind after the guard with safe_bytes: bytes = image_bytes and use that inside the closure.

  2. Mutating ImageConfig after construction (tools/image.py approx line 527): image_cfg.output_mime_type = "image/png" is assigned after construction. If ImageConfig is a frozen Pydantic model this silently fails or raises at runtime. Prefer constructing with the field conditionally included using a dict spread pattern.

@claude

claude bot commented Mar 8, 2026

Code Review: PR 25 - Google Nano Banana image generation

Overall this is a clean, well-structured addition. The conditional registration, thread-pool wrapping of the sync API, and storage-backend integration all follow existing patterns well. A few things worth addressing before merge:


Bugs and Correctness

1. _get_dimensions closure captures un-narrowed type (tools/image.py ~line 605)

The closure captures image_bytes by reference with its declared type of bytes | None. mypy may not narrow this inside a nested function even though the is None guard already fired. Safe fix: rebind after the guard with safe_bytes: bytes = image_bytes and use that inside the closure.
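
A minimal sketch of the rebind pattern, using a hypothetical `make_size_getter` with `len()` standing in for the real dimension lookup:

```python
from typing import Callable, Optional


def make_size_getter(image_bytes: Optional[bytes]) -> Callable[[], int]:
    """Return a closure over image data that is known to be non-None."""
    if image_bytes is None:
        raise ValueError("no image data")

    # Rebind to a new name with the narrowed type; the closure below
    # captures this name instead of the Optional-typed parameter.
    safe_bytes: bytes = image_bytes

    def _get_size() -> int:
        return len(safe_bytes)

    return _get_size
```

The new name is assigned exactly once, so a type checker can treat it as `bytes` everywhere, including inside the nested function.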

2. Mutating ImageConfig after construction (tools/image.py ~line 527)

image_cfg.output_mime_type = "image/png" is assigned after construction. If ImageConfig is a frozen Pydantic model this silently fails or raises at runtime. Prefer constructing with the field conditionally included using a dict spread pattern.


Type Safety

3. Missing return type on get_google_client() (config.py line 45)

The rest of the codebase is fully type-annotated. The conditional import makes this tricky but TYPE_CHECKING handles it cleanly — declare GoogleClient under TYPE_CHECKING and annotate the return as a string forward reference.

4. safety_settings: list[dict[str, str]] should use a TypedDict

CLAUDE.md says "NEVER use typing.Any" and the codebase consistently uses TypedDict for structured dicts. A SafetySettingDict(TypedDict) with category: str and threshold: str fields is the right pattern here.


Minor Issues

5. _GOOGLE_DEFAULT_MODEL constant is defined but never referenced

Declared but the default value is inlined directly in the function signature and GoogleImageModel Literal. Either use model: GoogleImageModel = _GOOGLE_DEFAULT_MODEL or remove the constant.

6. Misleading log in check_google_available() (features.py line 318)

When both project and api_key are set it logs "Vertex AI Express (project=..., api_key)" but in get_google_client() the api_key path wins and project is ignored. Logging (api_key only) in that branch would match actual runtime behavior.

7. Lower bound google-genai>=0.8.0 is very permissive

The lock file resolves to 1.66.0. APIs used here (ThinkingConfig, ThinkingLevel, ImageConfig.image_size) almost certainly did not exist in 0.8.x. Tighten to >=1.0.0 or higher to prevent confusing import errors on older installs.

8. Async client available

The SDK exposes client.aio.models.generate_content() for native async. The current thread-pool approach is correct, but the async variant avoids a thread hop entirely. Not a blocker — worth knowing for a future cleanup.
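
The difference between the two approaches can be illustrated with stand-in classes. Everything here is hypothetical scaffolding: `FakeSyncModels` and `FakeAsyncModels` only mimic the shape of a sync and an async `generate_content`, and `asyncio.to_thread` stands in for `anyio.to_thread.run_sync`.

```python
import asyncio


class FakeSyncModels:
    """Stand-in for a blocking SDK surface."""

    def generate_content(self, prompt: str) -> str:
        return f"image for {prompt}"


class FakeAsyncModels:
    """Stand-in for a native-async SDK surface."""

    async def generate_content(self, prompt: str) -> str:
        return f"image for {prompt}"


async def via_thread(models: FakeSyncModels, prompt: str) -> str:
    # The blocking call runs in a worker thread so the event loop stays free.
    return await asyncio.to_thread(models.generate_content, prompt)


async def via_native(models: FakeAsyncModels, prompt: str) -> str:
    # The coroutine is awaited directly; no thread hop is involved.
    return await models.generate_content(prompt)
```

Both paths return the same result; the native variant simply removes the worker thread from the picture.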


Test Coverage

No new unit or integration tests were added for create_image_google, get_google_client, or check_google_available. The existing suite (264 passed) could not have exercised these code paths. At minimum, tests for:

  • check_google_available() with mocked env vars (Vertex ADC, Express, Gemini Dev, missing creds)
  • create_image_google() validation paths (too many images, unsupported extension, no image in response)
  • get_google_client() credential selection branches

would match the coverage level already in place for the rest of the image tools.
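
A sketch of what the first of these tests could look like, using only the standard library. `check_google_available_stub` is a hypothetical stand-in for the real function in features.py, and its logic is an assumption based on the auth rules described in this PR; the project's own suite would more likely use pytest fixtures such as `monkeypatch`.

```python
import os
import unittest
from unittest.mock import patch


def check_google_available_stub() -> bool:
    """Hypothetical stand-in; the real function lives in features.py."""
    use_vertex = os.environ.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1")
    api_key = os.environ.get("GOOGLE_API_KEY")
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    if use_vertex:
        return bool(api_key or project)
    return bool(api_key)


class CheckGoogleAvailableTests(unittest.TestCase):
    def _check(self, env: dict[str, str]) -> bool:
        # clear=True hides any real credentials present on the machine.
        with patch.dict(os.environ, env, clear=True):
            return check_google_available_stub()

    def test_vertex_adc(self) -> None:
        self.assertTrue(self._check({"GOOGLE_GENAI_USE_VERTEXAI": "True", "GOOGLE_CLOUD_PROJECT": "demo"}))

    def test_vertex_express(self) -> None:
        self.assertTrue(self._check({"GOOGLE_GENAI_USE_VERTEXAI": "True", "GOOGLE_API_KEY": "key"}))

    def test_gemini_developer(self) -> None:
        self.assertTrue(self._check({"GOOGLE_API_KEY": "key"}))

    def test_missing_credentials(self) -> None:
        self.assertFalse(self._check({}))
        self.assertFalse(self._check({"GOOGLE_GENAI_USE_VERTEXAI": "True"}))
```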


Docs

Consider a short section in CLAUDE.md under Environment Configuration for the Google env vars (similar to the existing DATABRICKS_* block), so contributors know how to activate the feature locally.


Summary: Items 1-4 are worth fixing before merge (correctness and type safety). Items 5-8 are quality improvements. The test coverage gap is the most notable structural concern for long-term maintainability.

Generated with Claude Code

- Add pillow>=12.0.0 to the google optional dependency group since
  create_image_google uses PIL for reference images and dimensions
- Remove "auto" from GoogleAspectRatio — not documented in the SDK
  (keep 4:5/5:4 which appear in some Google docs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude

claude bot commented Mar 9, 2026

Code Review — PR #25: Google Nano Banana image generation

Overall this is a well-structured addition. The conditional registration pattern, pluggable auth detection, and async wrapping with anyio.to_thread.run_sync are all consistent with the existing codebase conventions. A few things worth addressing:


Issues

1. Missing return type annotation on get_google_client() (config.py)

def get_google_client():   # ← no return type

CLAUDE.md explicitly prohibits typing.Any and the codebase is fully type-annotated. The return type should be annotated. Since google-genai is an optional dep, this can be done with a TYPE_CHECKING guard:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from google.genai import Client as GoogleClient

def get_google_client() -> "GoogleClient":

Or alternatively, use -> object as a pragmatic fallback if the conditional import is unwieldy — but the missing annotation should be resolved one way or another.


2. Minimum version pinned too low (pyproject.toml)

google-genai>=0.8.0

The locked version is 1.66.0. The code uses ThinkingConfig, ThinkingLevel, ImageConfig, and image_size — none of which existed in 0.8.0. This would allow uv add 'sanzaru[google]' to resolve a much older version that would fail at runtime. Consider tightening the bound to something like >=1.0.0 or even >=1.50.0 based on when ThinkingConfig landed.


3. finish_reason not checked before extracting image (tools/image.py)

When generation is blocked (e.g. SAFETY), response.candidates[0].content.parts may be empty or missing, producing the generic error:

"Google Nano Banana returned no image — prompt may have been blocked by safety filters"

That message is a good guess, but the actual finish_reason is available on the candidate and would make debugging far easier. Suggest adding:

if response.candidates:
    finish_reason = response.candidates[0].finish_reason
    if finish_reason and finish_reason.name != "STOP":
        raise ValueError(f"Generation stopped: {finish_reason.name}. Prompt may have triggered safety filters.")

4. No tests for create_image_google

The existing test suite covers video, OpenAI image tools, and reference tools. This PR adds a new tool with non-trivial logic (multimodal input assembly, safety settings construction, Vertex vs. Gemini API branching) but no corresponding tests. Given the project's 80%+ tools coverage goal, at minimum an integration test with a mocked google_client.models.generate_content would be valuable — especially for the multimodal path and the finish_reason/empty-image error paths.


Minor Observations

5. Post-construction mutation of GenerateContentConfig

config = genai_types.GenerateContentConfig(...)
if model in _THINKING_MODELS:
    config.thinking_config = genai_types.ThinkingConfig(...)  # mutation after construction

Same pattern for image_cfg.output_mime_type. These work fine if the SDK uses mutable dataclasses/Pydantic models, but constructing inline with a conditional is more idiomatic and avoids confusion:

config = genai_types.GenerateContentConfig(
    ...,
    thinking_config=genai_types.ThinkingConfig(...) if model in _THINKING_MODELS else None,
)

6. Redundant import alias in server.py

from .tools import image as _image_google

Since the condition is check_google_available() and check_image_available(), the earlier if check_image_available(): block will have already imported image into module scope. The alias just adds noise — image.create_image_google(...) would work and is consistent with how other tools in server.py call into image.*.


7. PIL images not explicitly closed in the reference image loop

pil_img = await anyio.to_thread.run_sync(_open_image)
pil_images.append(pil_img)

PIL Image objects hold file handles until GC collects them. With up to 14 large images, it's worth explicitly calling pil_img.close() or using a context manager after the images are passed to the SDK, to avoid holding those handles open longer than needed.
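
One possible shape for this, sketched with `contextlib.ExitStack`. `FakeImage` is a hypothetical stand-in for a PIL image (which also exposes `close()`), and `with_open_images` is an illustrative helper, not code from the PR.

```python
from contextlib import ExitStack
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeImage:
    """Stand-in for an object with a close() method, such as a PIL image."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def with_open_images(
    openers: list[Callable[[], FakeImage]],
    consume: Callable[[list[FakeImage]], T],
) -> T:
    """Open every image, hand the list to consume, then close them all."""
    with ExitStack() as stack:
        images: list[FakeImage] = []
        for open_image in openers:
            img = open_image()
            # Registered callbacks run when the with-block exits,
            # including when a later opener or consume raises.
            stack.callback(img.close)
            images.append(img)
        return consume(images)
```

Here `consume` would be the step that passes the images to the SDK; the handles are released as soon as it returns.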


8. Design note: safety settings default to all OFF

All four safety categories are disabled by default for "maximum creative freedom." This is called out in the tool description so users are informed, but it's worth confirming this is the intended production posture — especially if this server is exposed in multi-tenant or enterprise contexts.


What's working well

  • The conditional registration pattern (check_google_available() and check_image_available()) is clean and consistent with how audio/databricks tools are gated.
  • Wrapping the synchronous Google SDK call in anyio.to_thread.run_sync is correct and avoids blocking the event loop.
  • The _open_image(data: bytes = img_bytes) default-argument capture pattern correctly avoids the classic loop-closure variable capture bug.
  • Auth detection in check_google_available() and get_google_client() is thorough and handles all three auth modes (ADC, Express, Gemini Developer API) consistently.
  • Refactoring _ALLOWED_IMAGE_EXTENSIONS into a shared constant and reusing it in both create_image and create_image_google is a nice cleanup.
  • The CLAUDE.md update to cover Google env vars in .mcp.json is appreciated.
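
For readers unfamiliar with the loop-closure issue mentioned above, a small self-contained illustration follows. The function names are hypothetical; only the `data: bytes = img_bytes` default-argument idiom comes from the PR.

```python
from typing import Callable


def make_openers_late_binding(blobs: list[bytes]) -> list[Callable[[], bytes]]:
    """Each closure looks up img_bytes when called, after the loop has finished."""
    openers = []
    for img_bytes in blobs:
        def _open() -> bytes:
            return img_bytes
        openers.append(_open)
    return openers


def make_openers_default_arg(blobs: list[bytes]) -> list[Callable[[], bytes]]:
    """The default argument is evaluated at definition time, once per iteration."""
    openers = []
    for img_bytes in blobs:
        def _open(data: bytes = img_bytes) -> bytes:
            return data
        openers.append(_open)
    return openers


blobs = [b"a", b"b", b"c"]
late = [f() for f in make_openers_late_binding(blobs)]
bound = [f() for f in make_openers_default_arg(blobs)]
```

When the closures are called after the loop ends, `late` holds the last element three times while `bound` preserves each iteration's value.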

Summary: Items 1 (missing type annotation), 2 (version floor too low), and 4 (missing tests) are the most important to address before merging. Items 3–7 are improvements but less critical.

- Fix _get_dimensions closure type narrowing: rebind image_bytes to
  safe_bytes after None guard so closures see `bytes` not `bytes | None`
- Add SafetySettingDict TypedDict replacing raw dict[str, str]
- Add return type annotation to get_google_client() with TYPE_CHECKING
- Remove unused _GOOGLE_DEFAULT_MODEL constant
- Fix misleading log when both project+api_key set (api_key wins)
- Tighten google-genai lower bound to >=1.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude

claude bot commented Mar 9, 2026

Code Review

Overall this is a clean, well-structured addition. The conditional registration pattern, TYPE_CHECKING guard, and anyio.to_thread.run_sync wrapping for the synchronous Google SDK are all solid. A few items worth addressing before merge:


Bugs / Potential Runtime Errors

1. Mutating GenerateContentConfig after construction (tools/image.py ~line 555–563)

image_cfg = genai_types.ImageConfig(aspect_ratio=aspect_ratio, image_size=image_size)
if use_vertex:
    image_cfg.output_mime_type = "image/png"   # mutation

config = genai_types.GenerateContentConfig(...)
if model in _THINKING_MODELS:
    config.thinking_config = ...               # mutation

If google-genai uses frozen Pydantic models (which many SDK types do), these attribute assignments will raise ValidationError at runtime. Better to build the full config in one shot:

image_cfg = genai_types.ImageConfig(
    aspect_ratio=aspect_ratio,
    image_size=image_size,
    **({"output_mime_type": "image/png"} if use_vertex else {}),
)
thinking = genai_types.ThinkingConfig(thinking_level=genai_types.ThinkingLevel.HIGH) if model in _THINKING_MODELS else None
config = genai_types.GenerateContentConfig(
    response_modalities=["IMAGE", "TEXT"],
    safety_settings=typed_safety,
    image_config=image_cfg,
    **({"thinking_config": thinking} if thinking else {}),
)

This is worth verifying against the SDK — if the models are mutable the current code is fine, but it's fragile.


Design / Architecture

2. GOOGLE_GENAI_USE_VERTEXAI checked in both config.py and tools/image.py

The use_vertex env-var check appears in both get_google_client() (auth routing) and create_image_google() (to decide output_mime_type). This leaks transport/auth concerns into the tool layer. Consider exposing a helper from the config layer:

# config.py
import os

def is_vertex_ai() -> bool:
    return os.getenv("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1")

3. Generated image is written to the "reference" path, not an "image" output path

await storage.write("reference", filename, safe_bytes)

This is intentional for the Sora workflow (generate → use as reference), but it's not obvious. A comment explaining why the output goes to the reference path (rather than images/) would help future maintainers, and the tool description could mention it so users know where to find their output.


Security / Policy

4. Safety settings default to all OFF

_DEFAULT_SAFETY_OFF: list[SafetySettingDict] = [
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"},
]

Defaulting all harm categories to OFF is a significant policy choice that should be explicitly called out in the tool description (it currently isn't). Users deploying this in a shared/multi-tenant context may not realize safety filtering is bypassed by default. At minimum, the CREATE_IMAGE_GOOGLE description should note this, and consider whether the default should be more conservative (e.g., BLOCK_MEDIUM_AND_ABOVE) with an explicit opt-out.
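
A sketch of the more conservative alternative, where filtering stays on unless a caller explicitly opts out per category. `build_safety_settings` is a hypothetical helper; the category and threshold strings are the ones quoted in this review.

```python
from typing import Optional

HARM_CATEGORIES = (
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
)


def build_safety_settings(
    overrides: Optional[list[dict[str, str]]] = None,
    default_threshold: str = "BLOCK_MEDIUM_AND_ABOVE",
) -> list[dict[str, str]]:
    """Start from a conservative default and apply explicit per-category overrides."""
    merged = {category: default_threshold for category in HARM_CATEGORIES}
    for item in overrides or []:
        merged[item["category"]] = item["threshold"]
    return [{"category": c, "threshold": t} for c, t in merged.items()]
```

A caller who wants the current behavior would pass an override with threshold "OFF" for each category, which makes the opt-out visible at the call site.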


Type Safety

5. SafetySettingDict uses plain str instead of Literal types (types.py)

class SafetySettingDict(TypedDict):
    category: str
    threshold: str

Given the project's emphasis on type safety, these could be narrowed:

HarmCategory = Literal[
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
]
HarmThreshold = Literal["OFF", "BLOCK_LOW_AND_ABOVE", "BLOCK_MEDIUM_AND_ABOVE", "BLOCK_HIGH_AND_ABOVE"]

class SafetySettingDict(TypedDict):
    category: HarmCategory
    threshold: HarmThreshold

This also makes the description's valid values self-documenting at the type level.


Test Coverage

6. No new tests for create_image_google

The PR passes existing tests (264 passed) but adds no new tests for the Google tool. Per the project's coverage goals (80%+ for tool business logic), an integration test with a mocked google_client would be appropriate — at minimum covering: text-only generation, reference image input, safety settings passthrough, and the "no image returned" error path. The check_google_available() logic in features.py is also untested.


Minor

7. img.format mismatch with auto-generated filename

filename = generate_filename("nb", "png", use_timestamp=True)   # always .png
# ...
fmt = img.format.lower() if img.format else "png"               # could be JPEG on Gemini Dev API

When output_mime_type is not set (Gemini Developer API path), the model may return a JPEG. The file would be named nb_....png but contain JPEG data. Setting output_mime_type for both paths (if the SDK supports it on Gemini Dev API) would eliminate the ambiguity, or at least detect and use the correct extension in the filename.
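
Detecting the format from the returned bytes is one way to pick the right extension. The sketch below uses well-known file signatures; `detect_extension` is a hypothetical helper and is not tied to the project's `generate_filename`.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def detect_extension(data: bytes, default: str = "png") -> str:
    """Guess a file extension from the leading bytes of image data."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpg"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return default
```

The detected extension could then be passed to the filename generator in place of the hard-coded "png".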

8. check_image_available() guard in server.py is redundant

if check_google_available() and check_image_available():

Since sanzaru[google] already depends on pillow, check_image_available() will always be True when the google extra is installed. This is harmless but adds noise.


Summary

| Severity | Item |
| --- | --- |
| 🔴 Potential crash | Frozen Pydantic model mutation (#1) |
| 🟠 Policy concern | Safety OFF by default undocumented (#4) |
| 🟡 Design | use_vertex check in tool layer (#2) |
| 🟡 Coverage | No tests for new tool (#6) |
| 🟡 Type safety | SafetySettingDict uses str (#5) |
| 🟢 Minor | "reference" path needs comment (#3), filename/format mismatch (#7), redundant guard (#8) |

The core approach is sound. Resolving #1 and #4 before merge would be my top recommendations.

The SDK's generate_content parameter type includes Dict variants that
don't align with its own ContentListUnion type. Add targeted type:
ignore since PIL.Image is accepted at runtime via PartUnion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude

claude bot commented Mar 9, 2026

Code Review: PR 25 - Google Nano Banana image generation

Overall this is a well-structured addition.

@arcaputo3 arcaputo3 merged commit dce9db5 into main Mar 9, 2026
8 checks passed
@arcaputo3 arcaputo3 deleted the claude/add-vertex-ai-auth-5hrqd branch March 9, 2026 17:48