-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Taskspec fuse implementation #1162
Conversation
dask_expr/_expr.py
Outdated
internal_tasks[t.key] = t | ||
# The above is ambiguous and we have to cull to get an unambiguous graph | ||
outkey = (self.exprs[0]._name, index) | ||
work = [outkey] | ||
internal_tasks_culled = [] | ||
while work: | ||
tkey = work.pop() | ||
if tkey not in internal_tasks: | ||
# External dependency | ||
continue | ||
t = internal_tasks[tkey] | ||
internal_tasks_culled.append(t) | ||
work.extend(t.dependencies) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this shouldn't be necessary if there weren't dead nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example with dead nodes? I am curious to see what's going on there
Note: The same is happening in dask/dask when trying to replace the SubgraphCallables :( |
I'ver removed the culling, things should be good to go now |
thanks |
This is using the
Task.fuse
(see dask/dask#11509) method for the Fused task. In theory this should've reduced complexity quite a bit but I encountered problems since the subgraphs are including dead nodes, i.e. there are expressions in_expr
that are no longer reachable if we walk the tasks. TheTask.fuse
raises in such a case since it requires a subgraph that is uniquely reducing to a single task (I put that condition in to catch stuff like this).Adding an explicit culling step in advance fixes this so everything is OK. However, where is this redundant node coming from and can we get rid of this??
cc @phofl