
Fix fusion calling things multiple times #1161

Merged (3 commits), Nov 11, 2024


fjetter (Member) commented Nov 11, 2024

fjetter marked this pull request as draft November 11, 2024 09:58
Comment on lines 3790 to 3800
t = Task(
    name,
    Fused._execute_internal_graph,
    # Wrap the actual subgraph as a data node so that the internal tasks
    # are not erroneously parsed. Otherwise the external task would carry
    # the internal keys as dependencies, which cannot be satisfied
    DataNode(None, internal_tasks),
    dependencies,
    (self.exprs[0]._name, index),
)
return t
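The code comment hinges on how a task's dependencies are discovered. If the outer task's arguments are walked and every key reference is collected, the internal keys of the fused subgraph are collected too, even though none of them exist in the outer graph. A minimal pure-Python sketch of that effect follows; `Ref`, `Opaque`, and `find_dependencies` are hypothetical stand-ins and do not reproduce dask's actual `Task`/`DataNode` parsing.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference to the output of another task, by key."""
    key: Any


@dataclass
class Opaque:
    """Hypothetical wrapper: its contents are literal data and are never scanned."""
    value: Any


def find_dependencies(arg: Any) -> set:
    """Collect the keys of every Ref reachable from arg, skipping Opaque contents."""
    if isinstance(arg, Opaque):
        return set()  # wrapped data is not inspected for references
    if isinstance(arg, Ref):
        return {arg.key}
    if isinstance(arg, dict):
        arg = list(arg.values())  # only values can hold references here
    if isinstance(arg, (list, tuple)):
        return set().union(*(find_dependencies(v) for v in arg))
    return set()


# Internal subgraph of a fused task: "b" reads the internal key "a",
# and "a" reads the external key ("x", 0).
internal = {
    "a": (abs, Ref(("x", 0))),
    "b": (str, Ref("a")),
}

# Outer task arguments without and with the wrapper; the external
# dependency is passed explicitly, mirroring `dependencies` in the snippet.
unwrapped = find_dependencies([internal, Ref(("x", 0))])
wrapped = find_dependencies([Opaque(internal), Ref(("x", 0))])

print("a" in unwrapped)  # True: the internal key leaks out as a dependency
print("a" in wrapped)    # False: only ("x", 0) remains
```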
Member Author

I've now implemented this (and similar patterns) two or three times. I'll probably factor it out into a classmethod, something like Task.fuse_tasks. If we end up rewriting the low-level fusion, that will come in handy as well. For now, we can keep this logic here to get a fix out quickly.
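The comment only proposes such a helper. As a rough, dask-independent sketch of one property a fusing helper would need to keep (the one named in the PR title: not calling things multiple times), here is a hypothetical `fuse_tasks` over a plain dict-of-tuples graph. The graph format, `fuse_tasks`, and the call counter are all assumptions for illustration, not dask's task spec or the code in this PR. Each internal node is evaluated at most once per call, even when it is shared.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical graph format: key -> (func, *arg_keys). Each arg key names
# either another internal task or an external input (no literal arguments).
Graph = Dict[str, Tuple[Any, ...]]


def fuse_tasks(internal: Graph, output: str) -> Callable[[Dict[str, Any]], Any]:
    """Hypothetical helper: collapse an internal graph into a single callable.

    Each internal task runs at most once per call, even when several other
    internal tasks depend on it.
    """

    def run(external: Dict[str, Any]) -> Any:
        cache: Dict[str, Any] = {}  # results of internal tasks for this call

        def get(key: str) -> Any:
            if key in external:
                return external[key]
            if key not in cache:
                func, *arg_keys = internal[key]
                cache[key] = func(*(get(k) for k in arg_keys))
            return cache[key]

        return get(output)

    return run


# Usage: "a" is shared by "b" and "c" but should only be computed once.
calls = {"load": 0}


def load(x):
    calls["load"] += 1
    return x * 10


graph: Graph = {
    "a": (load, "x"),
    "b": (lambda v: v + 1, "a"),
    "c": (lambda v: v + 2, "a"),
    "out": (lambda u, v: u + v, "b", "c"),
}

fused = fuse_tasks(graph, "out")
print(fused({"x": 1}))  # 23
print(calls["load"])    # 1
```

Without the per-call cache, `load` would run twice here, once for each of "b" and "c".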

fjetter marked this pull request as ready for review November 11, 2024 11:22

hendrikmakait (Member) left a comment

Thanks, @fjetter. This looks generally good to me; I have two non-blocking questions about comments.

dask_expr/_expr.py (outdated, resolved)
dask_expr/_expr.py (outdated, resolved)
fjetter merged commit ea970f1 into dask:main Nov 11, 2024
7 checks passed
fjetter deleted the fix_times_called branch November 11, 2024 15:48

Successfully merging this pull request may close these issues.

perf: big slowdown in tpch queries between 2024.10.0 and 2024.11.0
2 participants