Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds resource field reference syntax to template strings #2210

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

burnash
Copy link
Collaborator

@burnash burnash commented Jan 14, 2025

Resolves #2190
Implements #1880 by introducing incremental context object.

Copy link

netlify bot commented Jan 14, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit 6ac4fdc
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/67a02c69ef43a00008259603

@burnash burnash force-pushed the feat/2190-simplify-dependent-resource-declaration branch from 40d5201 to 5c1e6b2 Compare January 14, 2025 16:45
@burnash burnash force-pushed the feat/2190-simplify-dependent-resource-declaration branch from 5c1e6b2 to 6ac4fdc Compare February 3, 2025 02:39
@burnash burnash requested a review from rudolfix February 3, 2025 09:01
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some comments to the code. Tests look good though. What is still missing:

  1. examples (github/demo pipeline - incremental) not converted to a new notation
  2. I do not see tests that use invalid placeholders
  • {unknown.attr}
  • {incremental.unknown}
  1. a test where we define several parent resources ie. {resources.issue.id} and {resources.posts.id} in the same resource. I think that there's universal check for those but it makes sense to check it

@@ -316,6 +319,25 @@ def paginate_resource(
incremental_cursor_transform,
)

# Interpolate incremental object value into path and params
_incremental: Union[Incremental[Any], InterceptingProxy]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need intercepting proxy? we are not doing late binding here. we are formatting values of path and params right away when calling paginate. so you could do same thing you do below:

if incremental_param.start:
        params[incremental_param.start] = transform(incremental_object.last_value)

also please use start_value, not last_value. in this very case there's no difference, but last_value is updated after every page yielded. here we need a value with which Incremental was started.

@@ -490,7 +514,8 @@ def identity_func(x: Any) -> Any:

if transform is None:
transform = identity_func
params[incremental_param.start] = transform(incremental_object.last_value)
if incremental_param.start:
params[incremental_param.start] = transform(incremental_object.last_value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use start_value

@@ -403,9 +461,39 @@ def _replace_expression(template: str, params: Dict[str, Any]) -> str:
return template


def _encode_template_placeholders(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those functions are not used. could you remove them?

if isinstance(obj, str):
return obj.format(**placeholders)

if isinstance(obj, dict):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could replace recursive code with utils.map_nested_in_place and just provide lambda obj: obj.format() as mapping function

@@ -286,7 +345,6 @@ def build_resource_dependency_graph(
dependency_graph: graphlib.TopologicalSorter = graphlib.TopologicalSorter() # type: ignore[type-arg]
resolved_param_map: Dict[str, Optional[List[ResolvedParam]]] = {}
endpoint_resource_map = expand_and_index_resources(resource_list, resource_defaults)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this applies to line: 356: why

# extract expressions from parameters that are strings
        params_expressions = []
        for param_value in endpoint_resource["endpoint"].get("params", {}).values():
            # If param_value is a plain string (e.g. "{resources.berry.a_property}")
            if isinstance(param_value, str):
                extracted = _extract_expressions(param_value, "resources.")
                params_expressions.extend(extracted)

        resolved_params += _expressions_to_resolved_params(params_expressions)

is not just moved into _find_resolved_params? this is complicated logic broken into two piecec. also we could unit test it easily


# If resolved param was defined as `resources.<resource>.<field>`
# we update the resources context
if param_name.startswith("resources."):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you parse the name again? You have full info in ResolvedParam type. ie resolve_config={'type': 'resolve', 'resource': 'issues', 'field': 'number'}


params_values = {}
for resolved_param in resolved_params:
field_values = jsonpath.find_values(resolved_param.field_path, item)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we support {resources.items.<json_path>}? ie resource.items.post.id? it should be trivial to do so but _expressions_to_resolved_params for some reason actively prevents doing that. so we do not have parity with previous implementation.

parts = expression.strip().split(".")

split let's you pass how many parts you want to have so part 3 may have as many dots as you want. there will be some problems with string formatter interpreting dots ie "post.id" but we can write custom one that will not do that.

parent_resource_name = resolved_params[0].resolve_config["resource"]
resources_context = ResourcesContext()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a reason for this class and interpolating proxy. You are not doing any late binding. We just generate a key: value mapping from values that we already know. Please take a look at SimpleNamespace in types. it is IMO meant exactly to do that. for example.

params_values["incremental"] = SimpleNamespace(start_value=incremental_value_convert(incremental.start_value))

you can do exactly the same for "resources". there's just one attribute there (resource name) and then set several attributes to field=value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another alternative would be to implement our own Formatter class and return instance. It can hold incremental, current data item and resolved params and provide values to string. that could IMO eliminate a lot of the code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants