feat: add Google Nano Banana image generation with Vertex AI auth (#25)
Conversation
Integrates Google Nano Banana (Gemini image generation models) into the existing create_image tool via a new `provider` parameter, using ADC-based auth auto-detection that covers Vertex AI, service account keys, and gcloud.

Changes:
- pyproject.toml: add `google` optional extra (google-genai>=0.8.0)
- config.py: add get_google_client() with ADC auto-detect — Vertex AI (GOOGLE_GENAI_USE_VERTEXAI + GOOGLE_CLOUD_PROJECT) or Gemini Developer API (GOOGLE_API_KEY), no explicit credential loading required
- features.py: add check_google_available() and include it in get_available_features()
- tools/image.py: add _create_image_google() helper (sync client wrapped in anyio thread pool) and dispatch from create_image() on provider="google"; the Google path returns ImageDownloadResult immediately (no polling required)
- descriptions.py: rewrite CREATE_IMAGE to document both providers, model IDs (Nano Banana 2/Pro/v1), and the synchronous vs. async return shapes
- server.py: add provider, aspect_ratio, filename params to the create_image tool

Google Nano Banana models supported:
- gemini-3.1-flash-image-preview (Nano Banana 2, default)
- gemini-3-pro-image-preview (Nano Banana Pro)
- gemini-2.5-flash-image (Nano Banana)

https://claude.ai/code/session_01CVTKuW7q1AVoN4PQfhgF9K
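The "sync client wrapped in anyio thread pool" pattern the commit describes can be sketched as follows. This is a minimal stand-in using the stdlib's `asyncio.to_thread` (the PR itself uses `anyio.to_thread.run_sync`), and `_generate_sync` is a placeholder for the blocking SDK call, not real project code:

```python
import asyncio


def _generate_sync(prompt: str) -> bytes:
    # Placeholder for the blocking google-genai call, roughly:
    #   client.models.generate_content(model=..., contents=prompt, config=...)
    return b"\x89PNG fake bytes for " + prompt.encode()


async def create_image_google(prompt: str) -> bytes:
    # Offload the synchronous SDK call so the event loop stays responsive.
    # (The PR uses anyio.to_thread.run_sync; asyncio.to_thread is the stdlib analog.)
    return await asyncio.to_thread(_generate_sync, prompt)


result = asyncio.run(create_image_google("a red fox"))
print(result[:4])
```

Because the Google path returns bytes synchronously, no polling loop is needed, unlike the OpenAI video path.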
Relaxes the GOOGLE_CLOUD_PROJECT requirement when GOOGLE_GENAI_USE_VERTEXAI=True to also allow Vertex AI Express mode (paid tier), where a Google Cloud API key can be used instead of or alongside ADC credentials.

Auth resolution for GOOGLE_GENAI_USE_VERTEXAI=True:
- GOOGLE_CLOUD_PROJECT only → ADC path (service account, gcloud, attached SA)
- GOOGLE_API_KEY only → Vertex Express mode
- Both set → Express mode with explicit project context
- Neither set → RuntimeError with clear guidance

check_google_available() updated to mirror the same three-way logic.

https://claude.ai/code/session_01CVTKuW7q1AVoN4PQfhgF9K
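The three-way resolution above can be sketched as a small pure function. The helper name `resolve_vertex_auth` is illustrative, not the PR's actual code, and it returns a label rather than constructing a client:

```python
def resolve_vertex_auth(env: dict[str, str]) -> str:
    """Mirror the documented resolution order when GOOGLE_GENAI_USE_VERTEXAI=True."""
    project = env.get("GOOGLE_CLOUD_PROJECT")
    api_key = env.get("GOOGLE_API_KEY")
    if api_key:
        # Express mode; when both are set, the API key takes precedence
        # (a later commit notes they are mutually exclusive in the SDK Client).
        return "express"
    if project:
        # ADC path: service account key, gcloud login, or attached SA.
        return "adc"
    raise RuntimeError(
        "Set GOOGLE_CLOUD_PROJECT (ADC) or GOOGLE_API_KEY (Vertex Express mode)."
    )


print(resolve_vertex_auth({"GOOGLE_CLOUD_PROJECT": "my-project"}))
print(resolve_vertex_auth({"GOOGLE_API_KEY": "abc123"}))
```

Keeping the decision in one function makes the RuntimeError guidance easy to keep in sync with the actual branches.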
Nano Banana models are Gemini models, not Imagen — they use generate_content() with the IMAGE response modality, not generate_images().

Changes:
- tools/image.py: rewrite _create_image_google to use generate_content() with GenerateContentConfig, ImageConfig, ThinkingConfig, and SafetySetting; extract the image from response.candidates[].content.parts[].inline_data; add thinking_config (HIGH) for Nano Banana 2 (Flash-based); default all safety settings to OFF
- Add Literal types: GoogleImageModel, GoogleImageSize, GoogleAspectRatio for type-safe parameter validation at the MCP tool boundary
- Use proper google.genai enum types (HarmCategory, HarmBlockThreshold, ThinkingLevel) for mypy compliance
- config.py: fix Vertex AI Express — api_key and project/location are mutually exclusive in the SDK Client initializer
- descriptions.py: document image_size and safety_settings params
- .mcp.json: add Google env vars (GOOGLE_API_KEY, GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION)
- uv.lock: lock the google-genai dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
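The extraction path the commit describes (`response.candidates[].content.parts[].inline_data`) can be sketched against a stand-in object. The `fake` response below only mimics the SDK's response shape so the walk is runnable without google-genai installed:

```python
from types import SimpleNamespace


def extract_image_bytes(response) -> bytes:
    # Walk candidates[].content.parts[].inline_data; text/thinking parts
    # have no inline_data and are skipped.
    for candidate in response.candidates or []:
        for part in candidate.content.parts or []:
            blob = getattr(part, "inline_data", None)
            if blob is not None and blob.data:
                return blob.data
    raise ValueError("No image in response")


# Stand-in mimicking the google-genai response shape, not a real SDK object.
fake = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(inline_data=None, text="model thinking text"),
                    SimpleNamespace(inline_data=SimpleNamespace(data=b"img")),
                ]
            )
        )
    ]
)
print(extract_image_bytes(fake))
```

Skipping parts without `inline_data` matters because image models can interleave text parts (including thinking output) with the image part.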
…age support

Separate Google Nano Banana into its own `create_image_google` tool, restoring `create_image` to a clean OpenAI-only signature. The new tool accepts up to 14 reference images via the `input_images` parameter for editing, style transfer, and multi-image composition. Conditionally registered only when Google is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
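The multimodal input assembly implied here can be sketched as follows, assuming the 14-image cap from the commit message. `build_contents` is illustrative, and plain strings stand in for the PIL image objects the real tool would pass:

```python
MAX_REFERENCE_IMAGES = 14  # limit stated in the PR description


def build_contents(prompt: str, images: list[object]) -> list[object]:
    # generate_content accepts a mixed list (text plus image objects) as
    # contents; the prompt goes first, reference images follow.
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"At most {MAX_REFERENCE_IMAGES} reference images supported")
    return [prompt, *images]


contents = build_contents("match this style", ["<PIL.Image 1>", "<PIL.Image 2>"])
print(contents)
```

Validating the cap before the API call gives the MCP caller a clear error instead of a provider-side rejection.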
The ImageConfig.output_mime_type parameter is only supported on Vertex AI, not the Gemini Developer API. Conditionally set it based on the GOOGLE_GENAI_USE_VERTEXAI environment variable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
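The env-driven branching might look like this sketch. The helper is hypothetical; the real code would feed these kwargs into the SDK's `ImageConfig` rather than return a plain dict:

```python
import os


def image_config_kwargs(aspect_ratio: str) -> dict[str, object]:
    # output_mime_type is Vertex-only, so include it conditionally,
    # keyed off the same env var the auth path uses.
    kwargs: dict[str, object] = {"aspect_ratio": aspect_ratio}
    if os.getenv("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1"):
        kwargs["output_mime_type"] = "image/png"
    return kwargs


os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
print(image_config_kwargs("16:9"))
```

On the Gemini Developer API path the key is simply absent, so the SDK's default output format applies.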
test
Posting full review in follow-up comment due to formatting constraints.
Part 1/5 - Code Review: PR 25

Overall this is a clean, well-structured addition. The conditional registration, thread-pool wrapping of the sync API, and storage-backend integration all follow existing patterns well. A few things worth addressing before merge.
Part 2/5 - Bugs and Correctness
1. The closure captures `image_bytes` while its declared type is still `bytes | None`; rebinding to a narrowed local after the `None` guard would let the closure see plain `bytes`.

2. Mutating `GenerateContentConfig` (and `ImageConfig`) after construction relies on the SDK models staying mutable; passing all fields to the constructor is safer.

Type Safety

3. Missing return type on `get_google_client()`. The rest of the codebase is fully type-annotated. The conditional import makes this tricky, but a `TYPE_CHECKING` import of the client type solves it.

4. CLAUDE.md's "NEVER use" rule applies here: safety settings are passed as a raw `dict[str, str]` where a `TypedDict` would fit the codebase's conventions.

Minor Issues

5. `_GOOGLE_DEFAULT_MODEL` is declared, but the default value is inlined directly in the function signature and the constant is never referenced.

6. Misleading log in `get_google_client()`: when both `GOOGLE_CLOUD_PROJECT` and `GOOGLE_API_KEY` are set, the log implies the project/ADC path is used, but the API key takes precedence.

7. Lower bound `google-genai>=0.8.0` is too loose. The lock file resolves to a much newer release, so the declared floor doesn't reflect the API surface the code actually uses.

8. Async client available. The SDK exposes an async surface (`client.aio`), which could replace the manual thread-pool wrapping in a follow-up.

Test Coverage

No new unit or integration tests were added for the Google path; a mocked-client test for `_create_image_google()` would match the coverage level already in place for the rest of the image tools.

Docs

Consider a short section in CLAUDE.md under Environment Configuration for the Google env vars (similar to the existing provider sections).

Summary: Items 1-4 are worth fixing before merge (correctness and type safety). Items 5-8 are quality improvements. The test coverage gap is the most notable structural concern for long-term maintainability.

Generated with Claude Code
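The mocked-client test the review asks for might look roughly like this sketch. It uses stdlib `unittest.mock` against a simplified stand-in for the tool's core call, not the PR's actual function signature:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def create_image_google(client, prompt: str) -> bytes:
    # Simplified stand-in for the tool's core logic, for demonstration only.
    response = client.models.generate_content(
        model="gemini-2.5-flash-image", contents=prompt
    )
    part = response.candidates[0].content.parts[0]
    return part.inline_data.data


def test_create_image_google_returns_bytes() -> None:
    client = MagicMock()
    # Fake the SDK response shape: candidates[].content.parts[].inline_data
    client.models.generate_content.return_value = SimpleNamespace(
        candidates=[
            SimpleNamespace(
                content=SimpleNamespace(
                    parts=[SimpleNamespace(inline_data=SimpleNamespace(data=b"png-bytes"))]
                )
            )
        ]
    )
    assert create_image_google(client, "a fox") == b"png-bytes"
    client.models.generate_content.assert_called_once()


test_create_image_google_returns_bytes()
print("ok")
```

The same fixture shape could back tests for the safety-settings construction and the Vertex vs. Gemini API branch.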
- Add pillow>=12.0.0 to the google optional dependency group since create_image_google uses PIL for reference images and dimensions
- Remove "auto" from GoogleAspectRatio — not documented in the SDK (keep 4:5/5:4, which appear in some Google docs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review — PR #25: Google Nano Banana image generation

Overall this is a well-structured addition. The conditional registration pattern, pluggable auth detection, and async wrapping with `anyio` all fit the existing codebase.

Issues

1. Missing return type annotation on `get_google_client()`

```python
def get_google_client():  # ← no return type
```

CLAUDE.md explicitly prohibits untyped function signatures. The conditional import makes a direct annotation awkward, but a `TYPE_CHECKING` guard solves it:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from google.genai import Client as GoogleClient


def get_google_client() -> "GoogleClient":
    ...
```

Or alternatively, use a deferred string annotation directly.

2. Minimum version pinned too low (`google-genai>=0.8.0`)

The locked version is far ahead of the declared floor; raising the lower bound to match the enum types and config classes the code actually imports would prevent broken installs.

3. When generation is blocked (e.g. SAFETY), the raised error blames safety filters without checking

That message is a good guess, but the actual `finish_reason` is available on the response and would make the error precise:

```python
if response.candidates:
    finish_reason = response.candidates[0].finish_reason
    if finish_reason and finish_reason.name != "STOP":
        raise ValueError(
            f"Generation stopped: {finish_reason.name}. "
            "Prompt may have triggered safety filters."
        )
```

4. No tests for the new tool

The existing test suite covers video, OpenAI image tools, and reference tools. This PR adds a new tool with non-trivial logic (multimodal input assembly, safety settings construction, Vertex vs. Gemini API branching) but no corresponding tests. Given the project's 80%+ tools coverage goal, at minimum an integration test with a mocked client would close the gap.

Minor Observations

5. Post-construction mutation of `GenerateContentConfig`

```python
config = genai_types.GenerateContentConfig(...)
if model in _THINKING_MODELS:
    config.thinking_config = genai_types.ThinkingConfig(...)  # mutation after construction
```

Same pattern for `ImageConfig`. Constructor-only initialization is more robust:

```python
config = genai_types.GenerateContentConfig(
    ...,
    thinking_config=genai_types.ThinkingConfig(...) if model in _THINKING_MODELS else None,
)
```

6. Redundant import alias in server registration

```python
from .tools import image as _image_google
```

Since the condition is already checked before registration, the alias adds indirection without benefit.

7. PIL images not explicitly closed in the reference image loop

```python
pil_img = await anyio.to_thread.run_sync(_open_image)
pil_images.append(pil_img)
```

PIL `Image` objects hold underlying buffers until closed; a `try`/`finally` or `contextlib.ExitStack` that closes them after the request would be tidier.

8. Design note: safety settings default to all OFF

All four safety categories are disabled by default for "maximum creative freedom." This is called out in the tool description so users are informed, but it's worth confirming this is the intended production posture — especially if this server is exposed in multi-tenant or enterprise contexts.

What's working well

Summary: Items 1 (missing type annotation), 2 (version floor too low), and 4 (missing tests) are the most important to address before merging. Items 3–7 are improvements but less critical.
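The cleanup concern in item 7 can be sketched with `contextlib.ExitStack`. `FakeImage` stands in for `PIL.Image.Image` so the lifecycle is visible without Pillow installed:

```python
from contextlib import ExitStack


class FakeImage:
    """Stand-in for PIL.Image.Image, only to demonstrate the close lifecycle."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def compose_with_references(images: list[FakeImage]) -> str:
    with ExitStack() as stack:
        for img in images:
            # Every image is closed on exit, even if the request below raises.
            stack.callback(img.close)
        return f"request sent with {len(images)} reference images"


imgs = [FakeImage(), FakeImage()]
print(compose_with_references(imgs))
print(all(i.closed for i in imgs))
```

`ExitStack` scales naturally to the up-to-14-image loop, where nesting individual `with` blocks would not.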
- Fix _get_dimensions closure type narrowing: rebind image_bytes to safe_bytes after the None guard so closures see `bytes`, not `bytes | None`
- Add SafetySettingDict TypedDict replacing raw dict[str, str]
- Add return type annotation to get_google_client() with TYPE_CHECKING
- Remove unused _GOOGLE_DEFAULT_MODEL constant
- Fix misleading log when both project and api_key are set (api_key wins)
- Tighten google-genai lower bound to >=1.0.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review

Overall this is a clean, well-structured addition. The conditional registration pattern, auth detection, and async wrapping all fit the existing codebase.

Bugs / Potential Runtime Errors

1. Mutating SDK config models after construction

```python
image_cfg = genai_types.ImageConfig(aspect_ratio=aspect_ratio, image_size=image_size)
if use_vertex:
    image_cfg.output_mime_type = "image/png"  # mutation
config = genai_types.GenerateContentConfig(...)
if model in _THINKING_MODELS:
    config.thinking_config = ...  # mutation
```

If the SDK ever freezes these models, this breaks at runtime. Constructor-only initialization avoids the issue:

```python
image_cfg = genai_types.ImageConfig(
    aspect_ratio=aspect_ratio,
    image_size=image_size,
    **({"output_mime_type": "image/png"} if use_vertex else {}),
)
thinking = (
    genai_types.ThinkingConfig(thinking_level=genai_types.ThinkingLevel.HIGH)
    if model in _THINKING_MODELS
    else None
)
config = genai_types.GenerateContentConfig(
    response_modalities=["IMAGE", "TEXT"],
    safety_settings=typed_safety,
    image_config=image_cfg,
    **({"thinking_config": thinking} if thinking else {}),
)
```

This is worth verifying against the SDK — if the models are mutable the current code is fine, but it's fragile.

Design / Architecture

2. The `GOOGLE_GENAI_USE_VERTEXAI` parsing is repeated; a small shared helper would keep it consistent:

```python
# config.py
def is_vertex_ai() -> bool:
    return os.getenv("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("true", "1")
```

3. Generated image is written to the reference storage path

```python
await storage.write("reference", filename, safe_bytes)
```

This is intentional for the Sora workflow (generate → use as reference), but it's not obvious. A comment explaining why the output goes to the reference path (rather than images/) would help future maintainers, and the tool description could mention it so users know where to find their output.

Security / Policy

4. Safety settings default to all OFF

```python
_DEFAULT_SAFETY_OFF: list[SafetySettingDict] = [
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"},
]
```

Defaulting all harm categories to OFF is a significant policy choice that should be explicitly called out in the tool description (it currently isn't). Users deploying this in a shared/multi-tenant context may not realize safety filtering is bypassed by default. At minimum, the description should state the default and how to override it.

Type Safety

5. `SafetySettingDict` fields are untyped strings

```python
class SafetySettingDict(TypedDict):
    category: str
    threshold: str
```

Given the project's emphasis on type safety, these could be narrowed:

```python
HarmCategory = Literal[
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
]
HarmThreshold = Literal["OFF", "BLOCK_LOW_AND_ABOVE", "BLOCK_MEDIUM_AND_ABOVE", "BLOCK_ONLY_HIGH"]


class SafetySettingDict(TypedDict):
    category: HarmCategory
    threshold: HarmThreshold
```

This also makes the description's valid values self-documenting at the type level.

Test Coverage

6. No new tests for `create_image_google`

The PR passes existing tests (264 passed) but adds no new tests for the Google tool. Per the project's coverage goals (80%+ for tool business logic), an integration test with a mocked client would close the gap.

Minor

7. Output extension is hard-coded to `.png`

```python
filename = generate_filename("nb", "png", use_timestamp=True)  # always .png
# ...
fmt = img.format.lower() if img.format else "png"  # could be JPEG on Gemini Dev API
```

When the Gemini Developer API returns a JPEG (where `output_mime_type` can't be set), the saved `.png` extension won't match the actual format.

8. Registration condition

```python
if check_google_available() and check_image_available():
```

Since `check_google_available()` already gates on the Google extra and credentials, it's worth confirming the second check is actually needed for the Google tool.

Summary
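Since `Literal` annotations constrain only static checks, a runtime guard at the MCP boundary could back them up. This sketch assumes the narrowed types suggested in item 5; the validator itself is illustrative, not the PR's code:

```python
from typing import Literal, TypedDict, get_args

HarmCategory = Literal[
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_HARASSMENT",
]
HarmThreshold = Literal["OFF", "BLOCK_LOW_AND_ABOVE", "BLOCK_MEDIUM_AND_ABOVE", "BLOCK_ONLY_HIGH"]


class SafetySettingDict(TypedDict):
    category: HarmCategory
    threshold: HarmThreshold


def validate_safety(settings: list[SafetySettingDict]) -> list[SafetySettingDict]:
    # Literal types don't validate at runtime; MCP callers send plain JSON,
    # so reject unknown values before they reach the SDK.
    for s in settings:
        if s["category"] not in get_args(HarmCategory):
            raise ValueError(f"Unknown category: {s['category']}")
        if s["threshold"] not in get_args(HarmThreshold):
            raise ValueError(f"Unknown threshold: {s['threshold']}")
    return settings


ok = validate_safety([{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}])
print(len(ok))
```

`get_args` keeps the runtime check and the static types sourced from the same `Literal` definitions, so they cannot drift apart.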
The SDK's generate_content parameter type includes Dict variants that don't align with its own ContentListUnion type. Add targeted type: ignore since PIL.Image is accepted at runtime via PartUnion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- New `create_image_google` tool with reference image support (up to 14 images via multimodal `generate_content()` input)
- Reverted `create_image` to a clean OpenAI-only signature — no more "OpenAI only" / "Google only" param clutter
- Conditionally registers `create_image_google` only when google-genai is installed and credentials are configured
- `output_mime_type` compatibility: only set on Vertex AI (not supported on the Gemini Developer API)

Changes
- `src/sanzaru/tools/image.py` — public `create_image_google()` with `input_images`, reverted `create_image()` to OpenAI-only
- `src/sanzaru/descriptions.py` — dedicated `CREATE_IMAGE_GOOGLE` description, reverted `CREATE_IMAGE` to OpenAI-only
- `src/sanzaru/server.py` — conditional `create_image_google` registration, reverted `create_image` registration
- `src/sanzaru/config.py` — `get_google_client()` with Vertex AI Express, ADC, and Gemini Developer API support
- `src/sanzaru/features.py` — `check_google_available()` for credential detection
- `pyproject.toml` — `google` optional dependency group (google-genai)
- `.mcp.json` — Google env var passthrough

Test plan
- `ruff check` + `ruff format` — all clean
- `mypy` — passes
- `pytest` — 264 passed, 1 skipped
- `create_image` (OpenAI) — queued and completed
- `create_image_google` (Vertex AI Express) — generated image successfully

🤖 Generated with Claude Code