Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Taskspec fuse implementation #1162

Merged
merged 4 commits into from
Nov 13, 2024
Merged

Use Taskspec fuse implementation #1162

merged 4 commits into from
Nov 13, 2024

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Nov 11, 2024

This is using the Task.fuse (see dask/dask#11509) method for the Fused task. In theory this should've reduced complexity quite a bit but I encountered problems since the subgraphs are including dead nodes, i.e. there are expressions in _expr that are no longer reachable if we walk the tasks. The Task.fuse raises in such a case since it requires a subgraph that is uniquely reducing to a single task (I put that condition in to catch stuff like this).

Adding an explicit culling step in advance fixes this so everything is OK. However, where is this redundant node coming from and can we get rid of this??

cc @phofl

Comment on lines 3778 to 3790
internal_tasks[t.key] = t
# The above is ambiguous and we have to cull to get an unambiguous graph
outkey = (self.exprs[0]._name, index)
work = [outkey]
internal_tasks_culled = []
while work:
tkey = work.pop()
if tkey not in internal_tasks:
# External dependency
continue
t = internal_tasks[tkey]
internal_tasks_culled.append(t)
work.extend(t.dependencies)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this shouldn't be necessary if there weren't dead nodes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example with dead nodes? I am curious to see what's going on there

@fjetter
Copy link
Member Author

fjetter commented Nov 11, 2024

Adding an explicit culling step in advance fixes this so everything is OK. However, where is this redundant node coming from and can we get rid of this??

Note: The same is happening in dask/dask when trying to replace the SubgraphCallables :(

@phofl
Copy link
Collaborator

phofl commented Nov 13, 2024

I'ver removed the culling, things should be good to go now

@phofl phofl marked this pull request as ready for review November 13, 2024 16:03
@phofl phofl changed the title Taskspec fuse Use Taskspec fuse implementation Nov 13, 2024
@phofl phofl merged commit bb0aef0 into dask:main Nov 13, 2024
7 checks passed
@phofl
Copy link
Collaborator

phofl commented Nov 13, 2024

thanks

@fjetter fjetter deleted the taskspec_fuse branch November 13, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants