New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Adds resource field reference syntax to template strings #2210

Open

burnash wants to merge 1 commit into devel from feat/2190-simplify-dependent-resource-declaration

Collaborator

burnash commented Jan 14, 2025 •

edited

Loading

Resolves #2190
Implements #1880 by introducing incremental context object.

netlify bot commented Jan 14, 2025 •

edited

Loading

✅ Deploy Preview for dlt-hub-docs canceled.

Name	Link
🔨 Latest commit	`6ac4fdc`
🔍 Latest deploy log	https://app.netlify.com/sites/dlt-hub-docs/deploys/67a02c69ef43a00008259603

burnash force-pushed the feat/2190-simplify-dependent-resource-declaration branch from 40d5201 to 5c1e6b2 Compare

January 14, 2025 16:45

burnash linked an issue

that may be closed by this pull request

rest_api: Extend generic API source to allow for incremental path parameters #1880

Open

sh-rp assigned burnash


          Adds resource field reference syntax to template strings

6ac4fdc

burnash force-pushed the feat/2190-simplify-dependent-resource-declaration branch from 5c1e6b2 to 6ac4fdc Compare

February 3, 2025 02:39

burnash requested a review from rudolfix

February 3, 2025 09:01

rudolfix requested changes

View reviewed changes

Collaborator

rudolfix left a comment

I have some comments to the code. Tests look good though. What is still missing:

examples (github/demo pipeline - incremental) not converted to a new notation
I do not see tests that use invalid placeholders

{unknown.attr}
{incremental.unknown}

a test where we define several parent resources ie. {resources.issue.id} and {resources.posts.id} in the same resource. I think that there's universal check for those but it makes sense to check it

dlt/sources/rest_api/__init__.py

@@ @@ -316,6 +319,25 @@ def paginate_resource( @@
                                       incremental_cursor_transform,
                                   )
+                                  # Interpolate incremental object value into path and params
+                                  _incremental: Union[Incremental[Any], InterceptingProxy]

Collaborator

rudolfix Feb 3, 2025

why do you need intercepting proxy? we are not doing late binding here. we are formatting values of path and params right away when calling paginate. so you could do same thing you do below:

if incremental_param.start:
        params[incremental_param.start] = transform(incremental_object.last_value)

also please use start_value, not last_value. in this very case there's no difference, but last_value is updated after every page yielded. here we need a value with which Incremental was started.

dlt/sources/rest_api/__init__.py

@@ @@ -490,7 +514,8 @@ def identity_func(x: Any) -> Any: @@
                   if transform is None:
                       transform = identity_func
-                  params[incremental_param.start] = transform(incremental_object.last_value)
+                  if incremental_param.start:
+                      params[incremental_param.start] = transform(incremental_object.last_value)

Collaborator

rudolfix Feb 3, 2025

pls use start_value

dlt/sources/rest_api/config_setup.py

		@@ -403,9 +461,39 @@ def _replace_expression(template: str, params: Dict[str, Any]) -> str:
		return template


		def _encode_template_placeholders(

Collaborator

rudolfix Feb 3, 2025

those functions are not used. could you remove them?

dlt/sources/rest_api/config_setup.py

+                  if isinstance(obj, str):
+                      return obj.format(**placeholders)
+                  if isinstance(obj, dict):

Collaborator

rudolfix Feb 3, 2025

you could replace recursive code with utils.map_nested_in_place and just provide lambda obj: obj.format() as mapping function

dlt/sources/rest_api/config_setup.py

@@ @@ -286,7 +345,6 @@ def build_resource_dependency_graph( @@
                   dependency_graph: graphlib.TopologicalSorter = graphlib.TopologicalSorter()  # type: ignore[type-arg]
                   resolved_param_map: Dict[str, Optional[List[ResolvedParam]]] = {}
                   endpoint_resource_map = expand_and_index_resources(resource_list, resource_defaults)

Collaborator

rudolfix Feb 3, 2025

this applies to line: 356: why

# extract expressions from parameters that are strings
        params_expressions = []
        for param_value in endpoint_resource["endpoint"].get("params", {}).values():
            # If param_value is a plain string (e.g. "{resources.berry.a_property}")
            if isinstance(param_value, str):
                extracted = _extract_expressions(param_value, "resources.")
                params_expressions.extend(extracted)

        resolved_params += _expressions_to_resolved_params(params_expressions)

is not just moved into _find_resolved_params? this is complicated logic broken into two piecec. also we could unit test it easily

dlt/sources/rest_api/config_setup.py

+                      # If resolved param was defined as `resources.<resource>.<field>`
+                      # we update the resources context
+                      if param_name.startswith("resources."):

Collaborator

rudolfix Feb 3, 2025

why do you parse the name again? You have full info in ResolvedParam type. ie resolve_config={'type': 'resolve', 'resource': 'issues', 'field': 'number'}

dlt/sources/rest_api/config_setup.py

-                  params_values = {}
                   for resolved_param in resolved_params:
                       field_values = jsonpath.find_values(resolved_param.field_path, item)

Collaborator

rudolfix Feb 3, 2025

do we support {resources.items.<json_path>}? ie resource.items.post.id? it should be trivial to do so but _expressions_to_resolved_params for some reason actively prevents doing that. so we do not have parity with previous implementation.

parts = expression.strip().split(".")

split let's you pass how many parts you want to have so part 3 may have as many dots as you want. there will be some problems with string formatter interpreting dots ie "post.id" but we can write custom one that will not do that.

dlt/sources/rest_api/config_setup.py

		parent_resource_name = resolved_params[0].resolve_config["resource"]
		resources_context = ResourcesContext()

Collaborator

rudolfix Feb 3, 2025

I do not see a reason for this class and interpolating proxy. You are not doing any late binding. We just generate a key: value mapping from values that we already know. Please take a look at SimpleNamespace in types. it is IMO meant exactly to do that. for example.

params_values["incremental"] = SimpleNamespace(start_value=incremental_value_convert(incremental.start_value))

you can do exactly the same for "resources". there's just one attribute there (resource name) and then set several attributes to field=value

Collaborator

rudolfix Feb 3, 2025

another alternative would be to implement our own Formatter class and return instance. It can hold incremental, current data item and resolved params and provide values to string. that could IMO eliminate a lot of the code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet