tests: make Gemini callback coverage resilient to model deprecations by shuofengzhang · Pull Request #78 · gadievron/raptor

shuofengzhang · 2026-03-20T13:39:28Z

What changed

Made Gemini callback tests resilient to model-name churn by introducing a configurable test model:
- Added GEMINI_TEST_MODEL env override (default: gemini-2.0-flash).
- Updated Gemini callback test to use GEMINI_TEST_MODEL instead of hardcoded gemini-2.0-flash-exp.
Added explicit handling for Gemini "model not found" runtime errors:
- If the configured Gemini model is unavailable for the current API key/account, the test now skips with a clear message instead of failing the whole suite.
Updated provider compatibility summary test to use the same GEMINI_TEST_MODEL value for consistency.

The previous test hardcoded a deprecated/unavailable Gemini model (gemini-2.0-flash-exp).
On environments where GEMINI_API_KEY is set, this caused a deterministic false-negative test failure unrelated to callback behavior.
Callback tests should validate callback wiring, not fail due external provider model lifecycle changes.

Root cause: provider-facing integration test coupled callback verification to a brittle, hardcoded model identifier.
Why it is easy to miss: the Gemini test is skipped when GEMINI_API_KEY is absent, so many contributors never see the failure locally.
Tradeoff: skipping on explicit "model not found" errors favors signal quality over strict model pinning. This keeps callback tests meaningful while still surfacing real callback regressions.
Impact: contributors with valid Gemini keys no longer get blocked by provider model deprecations; CI/local runs become more reliable and less noisy.

Immediate reduction in flaky/false-negative failures for Gemini-enabled environments.
Faster contributor feedback loops: real callback regressions remain visible, while provider catalog churn no longer breaks unrelated test intent.
Maintainer benefit: lower triage overhead from failures caused by external model availability rather than project logic.

Baseline before change:
- scripts/clone_and_test.sh gadievron/raptor
- Result: 1 failed, 481 passed, 14 skipped (failure in test_gemini_callback due unavailable Gemini model name).
Focused validation after change:
- pytest -q packages/llm_analysis/tests/test_llm_callbacks_providers.py
- Result: 1 passed, 4 skipped.
Full suite after change:
- scripts/clone_and_test.sh gadievron/raptor
- Result: 481 passed, 15 skipped.

Scope is test-only (packages/llm_analysis/tests/test_llm_callbacks_providers.py), with no runtime/application behavior changes.
Behavior change is narrow: only converts known model-availability mismatch into an explicit skip for Gemini callback test paths.
Rollback-safe: reverting this commit restores prior behavior without affecting production code.

tests: make gemini callback test resilient to model deprecations

949914a