Skip to content

tests: make Gemini callback coverage resilient to model deprecations#78

Open
shuofengzhang wants to merge 1 commit intogadievron:mainfrom
shuofengzhang:fix/gemini-callback-model-availability
Open

tests: make Gemini callback coverage resilient to model deprecations#78
shuofengzhang wants to merge 1 commit intogadievron:mainfrom
shuofengzhang:fix/gemini-callback-model-availability

Conversation

@shuofengzhang
Copy link

What changed

  • Made Gemini callback tests resilient to model-name churn by introducing a configurable test model:
    • Added GEMINI_TEST_MODEL env override (default: gemini-2.0-flash).
    • Updated Gemini callback test to use GEMINI_TEST_MODEL instead of hardcoded gemini-2.0-flash-exp.
  • Added explicit handling for Gemini "model not found" runtime errors:
    • If the configured Gemini model is unavailable for the current API key/account, the test now skips with a clear message instead of failing the whole suite.
  • Updated provider compatibility summary test to use the same GEMINI_TEST_MODEL value for consistency.

Why

  • The previous test hardcoded a deprecated/unavailable Gemini model (gemini-2.0-flash-exp).
  • On environments where GEMINI_API_KEY is set, this caused a deterministic false-negative test failure unrelated to callback behavior.
  • Callback tests should validate callback wiring, not fail due external provider model lifecycle changes.

Insight / Why this matters

  • Root cause: provider-facing integration test coupled callback verification to a brittle, hardcoded model identifier.
  • Why it is easy to miss: the Gemini test is skipped when GEMINI_API_KEY is absent, so many contributors never see the failure locally.
  • Tradeoff: skipping on explicit "model not found" errors favors signal quality over strict model pinning. This keeps callback tests meaningful while still surfacing real callback regressions.
  • Impact: contributors with valid Gemini keys no longer get blocked by provider model deprecations; CI/local runs become more reliable and less noisy.

Practical gain / Why this matters

  • Immediate reduction in flaky/false-negative failures for Gemini-enabled environments.
  • Faster contributor feedback loops: real callback regressions remain visible, while provider catalog churn no longer breaks unrelated test intent.
  • Maintainer benefit: lower triage overhead from failures caused by external model availability rather than project logic.

Testing

  • Baseline before change:
    • scripts/clone_and_test.sh gadievron/raptor
    • Result: 1 failed, 481 passed, 14 skipped (failure in test_gemini_callback due unavailable Gemini model name).
  • Focused validation after change:
    • pytest -q packages/llm_analysis/tests/test_llm_callbacks_providers.py
    • Result: 1 passed, 4 skipped.
  • Full suite after change:
    • scripts/clone_and_test.sh gadievron/raptor
    • Result: 481 passed, 15 skipped.

Risk analysis

  • Scope is test-only (packages/llm_analysis/tests/test_llm_callbacks_providers.py), with no runtime/application behavior changes.
  • Behavior change is narrow: only converts known model-availability mismatch into an explicit skip for Gemini callback test paths.
  • Rollback-safe: reverting this commit restores prior behavior without affecting production code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant