@maxi297
Contributor

@maxi297 maxi297 commented Nov 5, 2025

What

We thought this would address the CDK part of https://github.com/airbytehq/oncall/issues/9874, but in the end that issue was just a misconfiguration of the PEP stream: its 403 responses needed to be considered RATE_LIMITED.

Today, API budgets are only applied to the child streams. To avoid hitting rate limits, the same budget also needs to be shared with the parent streams.

How

Add api_budget in the constructor of ModelToComponentFactory and pass it when creating the factory for the parent stream.
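
As a rough illustration of the approach described above, here is a minimal sketch that uses simplified stand-in classes, not the real CDK types. `Budget`, `Factory`, and `create_parent_factory` are hypothetical names chosen for this example; only the `api_budget` constructor argument and the `_api_budget` field mirror what the PR describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Budget:
    """Hypothetical stand-in for APIBudget: counts calls against shared policies."""

    policies: List[str] = field(default_factory=list)
    calls: int = 0

    def acquire_call(self) -> None:
        # A real budget would block or raise when a policy is exhausted;
        # this stand-in only counts calls.
        self.calls += 1


class Factory:
    """Hypothetical, heavily simplified stand-in for ModelToComponentFactory."""

    def __init__(self, api_budget: Optional[Budget] = None) -> None:
        self._api_budget = api_budget

    def create_parent_factory(self) -> "Factory":
        # Forward the same budget object so parent and child share one rate limit.
        return Factory(api_budget=self._api_budget)


budget = Budget(policies=["10 calls / minute"])
child_factory = Factory(api_budget=budget)
parent_factory = child_factory.create_parent_factory()

# Both factories hold the same object, so calls from either side are counted together.
child_factory._api_budget.acquire_call()
parent_factory._api_budget.acquire_call()
print(budget.calls)
```

The point of passing the object itself, instead of rebuilding a budget per factory, is that a single counter then reflects requests issued by both the parent and the child streams.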

Summary by CodeRabbit

  • New Features

    • API budget now propagates automatically through substreams, ensuring consistent rate limiting across nested stream hierarchies.
    • API budget can be specified during factory setup so nested components inherit the same budget policy.
  • Tests

    • Test suite expanded to validate API budget propagation and confirm budget policies are applied across parent and substream retrievers.

@maxi297 maxi297 requested a review from tolik0 November 5, 2025 19:13
@github-actions github-actions bot added the enhancement New feature or request label Nov 5, 2025
@github-actions

github-actions bot commented Nov 5, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/have_parent_streams_use_api_budget#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/have_parent_streams_use_api_budget

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@coderabbitai
Contributor

coderabbitai bot commented Nov 5, 2025

📝 Walkthrough

The ModelToComponentFactory constructor now accepts an optional api_budget: Optional[APIBudget] argument, stores it in the internal _api_budget field (whose type is narrowed to Optional[APIBudget]), and propagates it into nested substream factories. The tests configure a budget via set_api_budget(...) and assert that it reaches the HTTP retrievers.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Factory constructor & field: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Adds api_budget: Optional[APIBudget] parameter to ModelToComponentFactory.__init__, changes internal _api_budget typing to Optional[APIBudget], initializes it from the constructor arg. |
| Substream factory propagation: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Passes api_budget=self._api_budget when constructing nested ModelToComponentFactory instances so child factories receive the same API budget. |
| Public API: test helper: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Adds set_api_budget(...) method used by tests to configure an API budget on an existing factory prior to component creation. |
| Unit tests: unit_tests/sources/declarative/parsers/test_model_to_component_factory.py | Tests updated to provision an API budget (via constructor or set_api_budget), create a SubstreamPartitionRouter, and assert the API budget/policies are attached to parent HTTP retrievers and propagated through setup. |

Sequence Diagram

sequenceDiagram
    participant Test as Test / Caller
    participant Factory as ModelToComponentFactory
    participant SubFactory as Nested ModelToComponentFactory
    participant Retriever as HTTP Retriever

    Test->>Factory: __init__(..., api_budget=budget) / set_api_budget(...)
    activate Factory
    Factory->>Factory: store self._api_budget = api_budget
    deactivate Factory

    Test->>Factory: build components / create_substreams()
    activate Factory
    Factory->>SubFactory: __init__(..., api_budget=self._api_budget)
    activate SubFactory
    SubFactory->>SubFactory: store self._api_budget = api_budget
    deactivate SubFactory
    Factory->>Retriever: instantiate retriever with budget policies from self._api_budget
    deactivate Factory

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Check the narrowed _api_budget type change (removed HttpAPIBudget) for compatibility across all usage sites.
  • Verify every path that constructs nested factories now consistently passes api_budget.
  • Review the new set_api_budget(...) method for lifecycle/ordering implications when used after construction.
  • Do we want an additional test for deeper nested substream levels to ensure multi-level propagation, wdyt?
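
The multi-level test suggested in the last bullet could look roughly like the sketch below. It uses a hypothetical stand-in `Factory` class with a `nested()` method instead of the real ModelToComponentFactory, so it only illustrates the shape of such a test, not the CDK's actual construction path.

```python
from typing import Optional


class Factory:
    """Hypothetical stand-in for ModelToComponentFactory."""

    def __init__(self, api_budget: Optional[object] = None) -> None:
        self._api_budget = api_budget

    def nested(self) -> "Factory":
        # Mirrors the PR's idea: each nested factory receives the parent's budget.
        return Factory(api_budget=self._api_budget)


def budget_at_depth(root: Factory, depth: int) -> Optional[object]:
    """Walk `depth` levels of nested factories and return the budget found there."""
    current = root
    for _ in range(depth):
        current = current.nested()
    return current._api_budget


def test_budget_propagates_through_nested_levels() -> None:
    budget = object()  # any sentinel works; identity is what matters
    root = Factory(api_budget=budget)
    # Depth 0 is the root itself; depths 1 to 3 are successive nested factories.
    for depth in range(4):
        assert budget_at_depth(root, depth) is budget


test_budget_propagates_through_nested_levels()
```

Checking identity (`is`) at every depth would catch a regression where some intermediate level rebuilds or drops the budget.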

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the main change: adding API budget sharing with parent streams in the ModelToComponentFactory. |
| Description check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

684-684: Nice addition of the api_budget parameter!

The implementation looks good. Would it be helpful to add a brief docstring explaining when and why this parameter should be set? For instance, documenting that it's used to enforce rate limits across parent and child streams. Wdyt?

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

11-11: Verify whether the Mock import is needed.

I notice Mock is imported from unittest.mock but don't see it used in the changed test code. Was this import added for future test cases, or could it be removed? wdyt?

Run this script to check if Mock is used elsewhere in the test file:

#!/bin/bash
# Check for Mock usage in the test file
rg -n '\bMock\b' unit_tests/sources/declarative/parsers/test_model_to_component_factory.py | grep -v "^11:"

780-786: Consider whether accessing private attributes in assertions is the best approach.

The assertions verify API budget propagation by accessing ._policies, which is a private attribute (indicated by the underscore prefix). While this is common in tests, it couples the test to internal implementation details. Would it be better to verify the behavior through public APIs or check for the existence of _api_budget itself rather than drilling into _policies? For example:

-    # ensure api budget
-    assert get_retriever(
-        parent_stream_configs[0].stream
-    ).requester._http_client._api_budget._policies
-    assert get_retriever(
-        parent_stream_configs[1].stream
-    ).requester._http_client._api_budget._policies
+    # ensure api budget is set on parent streams
+    parent_0_budget = get_retriever(parent_stream_configs[0].stream).requester._http_client._api_budget
+    assert parent_0_budget is not None
+    assert isinstance(parent_0_budget, APIBudget)
+    
+    parent_1_budget = get_retriever(parent_stream_configs[1].stream).requester._http_client._api_budget
+    assert parent_1_budget is not None
+    assert isinstance(parent_1_budget, APIBudget)

This would be slightly more resilient to internal refactoring while still verifying the PR objective. wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6504148 and b6cd208.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧬 Code graph analysis (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3)
unit_tests/connector_builder/test_connector_builder_handler.py (2)
  • streams (834-835)
  • get_retriever (442-443)
airbyte_cdk/sources/streams/call_rate.py (1)
  • APIBudget (513-627)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
  • set_api_budget (4256-4259)
  • create_component (827-860)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
airbyte_cdk/sources/streams/call_rate.py (1)
  • APIBudget (513-627)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

699-699: Clean type simplification!

The type change looks good—removing the single-element Union makes the type hint cleaner. The assignment correctly initializes the budget from the constructor parameter.


3891-3891: Perfect propagation of the API budget!

This correctly passes the api_budget to the nested factory when creating parent streams, which achieves the PR objective of sharing API budgets between parent and child streams to avoid rate limits. The implementation ensures that parent streams inherit the same rate limiting configuration.

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

748-772: Nice job creating a new factory instance for test isolation!

Creating a dedicated ModelToComponentFactory instance and configuring it with set_api_budget before creating the partition router ensures this test doesn't affect others. The budget configuration looks comprehensive with the MovingWindowCallRatePolicy setup.

@github-actions

github-actions bot commented Nov 5, 2025

PyTest Results (Fast)

282 tests (-3 535): 272 ✅ (-3 533), 9 💤 (-3), 1 ❌ (+1)
1 suites (±0), 1 files (±0)
Duration: 1m 34s ⏱️ (-5m 9s)

For more details on these failures, see this check.

Results for commit d0ca8c6. ± Comparison against base commit 6504148.

This pull request removes 3535 tests.
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_previous_cursor_is_highest-2023-01-01-stream_slice0-observed_records0-expected_state0]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_stream_slice_partition_end_is_highest-2020-01-01-stream_slice1-observed_records1-expected_state1]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_and_no_previous_cursor-None-stream_slice9-observed_records9-expected_state9]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_boundaries-2021-01-01-stream_slice8-observed_records8-expected_state8]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed-2021-01-01-stream_slice3-observed_records3-expected_state3]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed_and_no_previous_state-None-stream_slice4-observed_records4-expected_state4]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_out_of_order_records-2021-01-01-stream_slice6-observed_records6-expected_state6]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_some_records_out_of_slice_boundaries-2021-01-01-stream_slice7-observed_records7-expected_state7]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_without_previous_cursor-None-stream_slice5-observed_records5-expected_state5]
…

♻️ This comment has been updated with latest results.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6cd208 and 8b1541a.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-13T23:39:15.457Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
🧬 Code graph analysis (1)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
  • set_api_budget (4256-4259)
  • create_component (827-860)
unit_tests/connector_builder/test_connector_builder_handler.py (1)
  • get_retriever (442-443)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Analyze (python)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (1)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

747-765: API budget setup looks good! Quick question about the empty matchers list.

The API budget configuration is set up correctly. I notice the matchers list is empty (line 760), which means this policy will apply to all HTTP requests made by the parent streams. Is this intentional for this test, or would it be more realistic to include a matcher? Looking at other tests like test_api_budget() (line 4220-4226), they specify matchers to target specific endpoints. Wdyt?
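
To make the matcher question concrete, here is a small sketch of the semantics the comment above assumes: a policy with no matchers applies to every request, while a policy with matchers applies only when at least one of them accepts the request. `Policy`, `Request`, and `Matcher` are hypothetical stand-ins, not the CDK's call-rate classes, and the URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Request = str  # hypothetical stand-in: a request is represented by its URL
Matcher = Callable[[Request], bool]


@dataclass
class Policy:
    """Hypothetical rate-limit policy that is scoped by optional matchers."""

    matchers: List[Matcher] = field(default_factory=list)

    def matches(self, request: Request) -> bool:
        # Assumed semantics: an empty matcher list means "match everything".
        if not self.matchers:
            return True
        return any(matcher(request) for matcher in self.matchers)


catch_all = Policy()
users_only = Policy(matchers=[lambda url: url.endswith("/users")])

for url in ("https://api.example.com/users", "https://api.example.com/orders"):
    print(url, catch_all.matches(url), users_only.matches(url))
```

Under these assumptions, a test with an empty matcher list exercises the broadest case, whereas adding a matcher would also verify that scoping survives propagation to the parent streams.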

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

PyTest Results (Full)

3 820 tests: 3 808 ✅, 12 💤, 0 ❌
1 suites, 1 files
Duration: 11m 22s ⏱️

Results for commit d0ca8c6.

Labels

enhancement New feature or request

3 participants