-
Notifications
You must be signed in to change notification settings - Fork 128
Circuit breaker changes using pybreaker #705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
|
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
c83cba7 to
e7e8b4b
Compare
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
Signed-off-by: Nikhil Suri <[email protected]>
samikshya-db
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. I have some concerns with the unit/e2e tests added : added comments around it. Please push the PR once all changes are made. Thanks.
|
|
||
|
|
||
| @pytest.fixture(autouse=True) | ||
| def aggressive_circuit_breaker_config(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spent some time checking the unit tests and this e2e test. This is not adding something that we are not already testing tbh - or am I missing something? In which case we can remove this altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, this E2E test verifies the full integration path from SQL query execution through telemetry generation to circuit breaker enforcement with real Databricks connections. It ensures that when the circuit breaker opens after rate-limit errors, subsequent telemetry requests are actually blocked from hitting Databricks, and that circuit recovery works correctly with real timing.
Can you elaborate why you think the tests are not adding anything to the existing test coverage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only telemetry client is being mocked :
[c1] we're intercepting calls before they ever reach the circuit breaker layer. This means none of the responses are occurring naturally. This means we are testing mock behavior.
Circuit breaker transitions are already tested in test_circuit_breaker_manager.py, rate limit and error propogation are also tested in unit tests. I do not see a single use-case not being already tested via unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the unit tests - they test circuit breaker logic in isolation with mock functions (breaker.call(failing_func)), and telemetry client with direct API calls (_send_with_unified_client()). None of them execute real SQL queries (cursor.execute("SELECT 1")) or verify that telemetry is actually generated during query execution. The E2E test is the only place we validate the full journey: Real Query → Telemetry Generation → Circuit Breaker Enforcement. It also does not test that Circuit breaker state persists across multiple real SQL queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we already have an e2e for telemetry with real queries
test_concurrent_telemetry.py - What state transitions are you talking about other than ones in
test_circuit_breaker_manager.py?
I mean, i am ok with adding the e2e - I just don't want any redundant code in our repo - as long that it is not the case - I am good tbh. I have already approved the PR anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could generate a summary of use cases that will not be tested if we remove these test cases
# SCENARIO: You remove test_circuit_breaker.py
# What could break without detection:
1. Bug: Circuit breaker not invoked during SQL query execution
- test_concurrent_telemetry.py: Won't catch (no circuit breaker testing)
- test_circuit_breaker_manager.py: Won't catch (mocks, no real queries)
- Result: Circuit breaker silently broken in production ❌
2. Bug: Circuit breaker opens but telemetry still sent
- test_concurrent_telemetry.py: Won't catch (no error scenarios)
- test_circuit_breaker_manager.py: Won't catch (no telemetry integration)
- Result: Telemetry spam during outages ❌
3. Bug: 500 errors trigger circuit breaker (should only be 429/503)
- test_concurrent_telemetry.py: Won't catch (no error scenarios)
- test_circuit_breaker_manager.py: Won't catch (generic error testing)
- Result: Circuit breaker too aggressive ❌
4. Bug: Circuit breaker config not passed through connection → telemetry
- test_concurrent_telemetry.py: Won't catch (circuit breaker not tested)
- test_circuit_breaker_manager.py: Won't catch (no connection flow)
- Result: Circuit breaker never enabled ❌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 and 4 is good. Although (mocks, no real queries) and no telemetry integration is ambiguous as the current e2e has mocks too. But 2, 3 still covered in tests (test_request_circuit_breaker_open , test_telemetry_request_error_handling)
Signed-off-by: Nikhil Suri <[email protected]>
What type of PR is this?
Description
This PR introduces a circuit breaker pattern to the telemetry system to prevent cascading failures and improve system resilience. The implementation uses the pybreaker library to monitor telemetry request failures and automatically open the circuit when failure rates exceed configurable thresholds, blocking further requests to protect downstream services. When the circuit is open, telemetry requests are temporarily blocked until the system recovers, at which point the circuit transitions to half-open for testing and eventually closes when normal operation resumes. The circuit breaker configuration is centralized with immutable settings, includes comprehensive logging for monitoring, and maintains full backward compatibility with existing telemetry functionality.
How is this tested?
📦 Package Versions Verified
databricks-sql-connector: 4.2.1 (your local version)
pybreaker: 1.4.1 ✨ (successfully installed)
dbt-databricks: 1.11.0 ✅
databricks-sqlalchemy: 2.0.8 ✅
databricks-sdk: 0.67.0 ✅
sqlalchemy: 2.0.44 ✅
LLM GENERATED
Related Tickets & Documents
https://docs.google.com/document/d/1ftRvby9bwDZzE3s1tOb4hJ4Pd9USiXskb9cDw-uQNPM/edit?usp=sharing