-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds resource field reference syntax to template strings #2210
base: devel
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
40d5201
to
5c1e6b2
Compare
5c1e6b2
to
6ac4fdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some comments to the code. Tests look good though. What is still missing:
- examples (github/demo pipeline - incremental) not converted to a new notation
- I do not see tests that use invalid placeholders
- {unknown.attr}
- {incremental.unknown}
- a test where we define several parent resources ie. {resources.issue.id} and {resources.posts.id} in the same resource. I think that there's universal check for those but it makes sense to check it
@@ -316,6 +319,25 @@ def paginate_resource( | |||
incremental_cursor_transform, | |||
) | |||
|
|||
# Interpolate incremental object value into path and params | |||
_incremental: Union[Incremental[Any], InterceptingProxy] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need intercepting proxy? we are not doing late binding here. we are formatting values of path and params right away when calling paginate
. so you could do same thing you do below:
if incremental_param.start:
params[incremental_param.start] = transform(incremental_object.last_value)
also please use start_value
, not last_value
. in this very case there's no difference, but last_value
is updated after every page yielded. here we need a value with which Incremental was started.
@@ -490,7 +514,8 @@ def identity_func(x: Any) -> Any: | |||
|
|||
if transform is None: | |||
transform = identity_func | |||
params[incremental_param.start] = transform(incremental_object.last_value) | |||
if incremental_param.start: | |||
params[incremental_param.start] = transform(incremental_object.last_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use start_value
@@ -403,9 +461,39 @@ def _replace_expression(template: str, params: Dict[str, Any]) -> str: | |||
return template | |||
|
|||
|
|||
def _encode_template_placeholders( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those functions are not used. could you remove them?
if isinstance(obj, str): | ||
return obj.format(**placeholders) | ||
|
||
if isinstance(obj, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could replace recursive code with utils.map_nested_in_place
and just provide lambda obj: obj.format() as mapping function
@@ -286,7 +345,6 @@ def build_resource_dependency_graph( | |||
dependency_graph: graphlib.TopologicalSorter = graphlib.TopologicalSorter() # type: ignore[type-arg] | |||
resolved_param_map: Dict[str, Optional[List[ResolvedParam]]] = {} | |||
endpoint_resource_map = expand_and_index_resources(resource_list, resource_defaults) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this applies to line: 356: why
# extract expressions from parameters that are strings
params_expressions = []
for param_value in endpoint_resource["endpoint"].get("params", {}).values():
# If param_value is a plain string (e.g. "{resources.berry.a_property}")
if isinstance(param_value, str):
extracted = _extract_expressions(param_value, "resources.")
params_expressions.extend(extracted)
resolved_params += _expressions_to_resolved_params(params_expressions)
is not just moved into _find_resolved_params
? this is complicated logic broken into two piecec. also we could unit test it easily
|
||
# If resolved param was defined as `resources.<resource>.<field>` | ||
# we update the resources context | ||
if param_name.startswith("resources."): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you parse the name again? You have full info in ResolvedParam
type. ie resolve_config={'type': 'resolve', 'resource': 'issues', 'field': 'number'}
|
||
params_values = {} | ||
for resolved_param in resolved_params: | ||
field_values = jsonpath.find_values(resolved_param.field_path, item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we support {resources.items.<json_path>}? ie resource.items.post.id
? it should be trivial to do so but _expressions_to_resolved_params
for some reason actively prevents doing that. so we do not have parity with previous implementation.
parts = expression.strip().split(".")
split let's you pass how many parts you want to have so part 3 may have as many dots as you want. there will be some problems with string formatter interpreting dots ie "post.id" but we can write custom one that will not do that.
parent_resource_name = resolved_params[0].resolve_config["resource"] | ||
resources_context = ResourcesContext() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a reason for this class and interpolating proxy. You are not doing any late binding. We just generate a key: value mapping from values that we already know. Please take a look at SimpleNamespace
in types
. it is IMO meant exactly to do that. for example.
params_values["incremental"] = SimpleNamespace(start_value=incremental_value_convert(incremental.start_value))
you can do exactly the same for "resources". there's just one attribute there (resource name) and then set several attributes to field
=value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another alternative would be to implement our own Formatter
class and return instance. It can hold incremental, current data item and resolved params and provide values to string. that could IMO eliminate a lot of the code
Resolves #2190
Implements #1880 by introducing
incremental
context object.