[DMN][WIP] Experimental multi-GPU Polars testing #18335

Draft
wants to merge 226 commits into
base: branch-25.06
f0964a6
basic groupby-aggregation support
rjzamora Dec 4, 2024
1329cf1
Merge branch 'branch-25.02' into cudf-polars-multi-groupby
rjzamora Dec 4, 2024
11a03f8
Merge branch 'branch-25.02' into cudf-polars-multi-groupby
rjzamora Dec 4, 2024
154c138
add multi-partition join support
rjzamora Dec 4, 2024
b4cd727
Merge remote-tracking branch 'upstream/branch-25.02' into cudf-polars…
rjzamora Dec 4, 2024
03bed94
add broadcast join support
rjzamora Dec 4, 2024
2ce5e0d
Merge branch 'branch-25.02' into cudf-polars-multi-join
rjzamora Dec 4, 2024
a9fa486
Merge remote-tracking branch 'upstream/branch-25.02' into cudf-polars…
rjzamora Dec 4, 2024
b1224a0
remove GroupbyTree
rjzamora Dec 4, 2024
385f03a
simplify lower
rjzamora Dec 6, 2024
8956215
Merge remote-tracking branch 'upstream/branch-25.02' into cudf-polars…
rjzamora Dec 6, 2024
ae1019f
Merge branch 'branch-25.02' into cudf-polars-multi-join
rjzamora Dec 6, 2024
0456c2f
Merge branch 'branch-25.02' into cudf-polars-multi-join
rjzamora Dec 6, 2024
70b29b2
Merge remote-tracking branch 'upstream/branch-25.02' into cudf-polars…
rjzamora Dec 19, 2024
3f04eca
cleanup
rjzamora Dec 19, 2024
e090de5
no cover
rjzamora Dec 19, 2024
24b88f2
tweak error message
rjzamora Dec 19, 2024
161a53b
Merge branch 'branch-25.02' into cudf-polars-multi-groupby
rjzamora Jan 9, 2025
e876e14
Merge remote-tracking branch 'upstream/branch-25.02' into cudf-polars…
rjzamora Jan 9, 2025
b836334
update copyright dates
rjzamora Jan 9, 2025
69f6336
update copyright dates
rjzamora Jan 9, 2025
22cebeb
add test coverage for single-partition
rjzamora Jan 11, 2025
45ac8ec
Merge branch 'branch-25.02' into cudf-polars-multi-groupby
rjzamora Jan 11, 2025
0786d3d
Merge remote-tracking branch 'upstream/branch-25.02' into cudf-polars…
rjzamora Jan 24, 2025
357e65e
align join with shuffle
rjzamora Jan 24, 2025
0448b1d
Merge remote-tracking branch 'origin/cudf-polars-multi-join' into cud…
rjzamora Jan 24, 2025
bbf6988
Merge remote-tracking branch 'origin/cudf-polars-multi-groupby' into …
rjzamora Jan 24, 2025
f5205bd
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Jan 27, 2025
f795fa9
add temporary serialization changes
rjzamora Jan 27, 2025
f32b046
support sum aggregation
rjzamora Jan 28, 2025
a3e4c50
centralize serialization workaround
rjzamora Jan 29, 2025
a7cd29f
Merge branch 'branch-25.04' into cudf-polars-multi-groupby
rjzamora Jan 29, 2025
5b906c4
use managed memory with dask-cuda for now
rjzamora Jan 29, 2025
ce86f26
fix join
rjzamora Jan 29, 2025
523f0ef
add basic aggregation support
rjzamora Feb 6, 2025
4b6f180
roll back change to literal.py
rjzamora Feb 6, 2025
920c361
make get_expr_partition_count more efficient
rjzamora Feb 6, 2025
71bbe53
make get_expr_partition_count more efficient
rjzamora Feb 6, 2025
a6b05a9
fix copyright changes
rjzamora Feb 6, 2025
3092c58
use traversal
rjzamora Feb 6, 2025
c4ed2a6
roll back unnecessary date change
rjzamora Feb 6, 2025
5af267b
move fuse_expr_graph
rjzamora Feb 6, 2025
c13e916
cleanup
rjzamora Feb 6, 2025
663db89
add mean support
rjzamora Feb 6, 2025
062c322
update some comments
rjzamora Feb 6, 2025
5f7d73c
add todo comment
rjzamora Feb 6, 2025
fd3cf5a
update join.py
rjzamora Feb 11, 2025
1c7ea8a
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 11, 2025
8cac6ab
align with expression changes
rjzamora Feb 11, 2025
3a4e83b
align with proposed agg updates
rjzamora Feb 12, 2025
1e2fa82
stop using 'serialized-type'
rjzamora Feb 12, 2025
64dfc4c
add Column serialization support
rjzamora Feb 12, 2025
4b7de09
remove debug statement
rjzamora Feb 12, 2025
37ebf45
Merge remote-tracking branch 'upstream/branch-25.04' into serialize-c…
rjzamora Feb 12, 2025
fbf0528
address code review
rjzamora Feb 12, 2025
30793d1
use TypedDict for serialization headers
rjzamora Feb 12, 2025
a9bd141
Merge branch 'serialize-columns' into cudf-polars-multi-combined-agg-…
rjzamora Feb 12, 2025
1dd2977
add escape hatch for non_child_args serialization
rjzamora Feb 12, 2025
e888aa9
fix csv test
rjzamora Feb 12, 2025
eb6a1c7
use Peter's GroupbyOptions class
rjzamora Feb 13, 2025
8d727d4
remove eval_signature dispatch in favor of simpler PredicateWrapper c…
rjzamora Feb 13, 2025
8c3df10
remove unused dispatch
rjzamora Feb 13, 2025
e6efc10
add serialization workarounds
rjzamora Feb 13, 2025
8b86f4e
add basic test coverage
rjzamora Feb 13, 2025
6bfaafd
address code review
rjzamora Feb 13, 2025
296979b
Merge remote-tracking branch 'upstream/branch-25.04' into serializabl…
rjzamora Feb 13, 2025
71c853d
move pl-to-pa conversion from LiteralColumn init to translation
rjzamora Feb 13, 2025
93bc9ae
Merge branch 'serializable-nodes' into cudf-polars-multi-combined-agg…
rjzamora Feb 13, 2025
bc1292c
adjust large-graph-warning-threshold config
rjzamora Feb 13, 2025
958bad5
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 13, 2025
d8dd1a1
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 13, 2025
7450abd
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 21, 2025
3543e84
use upstream ir
rjzamora Feb 21, 2025
4a980e7
add rapidsmp shuffle - not working yet
rjzamora Feb 22, 2025
e52a9fc
try using rapidsmp shuffle when available
rjzamora Feb 24, 2025
b6bee93
update dask integration
rjzamora Feb 24, 2025
af32360
update dask integration
rjzamora Feb 24, 2025
e0faee3
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 25, 2025
ef79e90
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 25, 2025
b8a20e6
formatting
rjzamora Feb 25, 2025
a34a275
ruff
rjzamora Feb 25, 2025
f4a2db4
add unary op support within simple groupby aggregation
rjzamora Feb 26, 2025
79f5a81
tweak broadcast-merge criteria
rjzamora Feb 26, 2025
1645d59
fix serialize/deserialize
rjzamora Feb 26, 2025
177defd
add shuffle_method and bcast_join_limit config options
rjzamora Feb 26, 2025
3071792
add shuffle_method and bcast_join_limit config options
rjzamora Feb 26, 2025
fde4231
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 27, 2025
7d18e7b
add shuffle-based groupby
rjzamora Feb 27, 2025
f21e1cd
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 28, 2025
ee47bd9
Add `pylibcudf.gpumemoryview` support for `len()`/`nbytes`
pentschev Feb 28, 2025
ccd1029
improve test coverage
rjzamora Feb 28, 2025
33bf65c
Add gpumemoryview tests for `len()`/`nbytes`
pentschev Feb 28, 2025
ac73bab
Add `gpumemoryview.__cuda_array_interface__` tests
pentschev Feb 28, 2025
41844b7
Update stubs
pentschev Feb 28, 2025
df902aa
Merge branch 'branch-25.04' into pylibcudf-gpumemoryview-len
pentschev Feb 28, 2025
6dfb397
add ConfigOptions class
rjzamora Feb 28, 2025
3f00203
Fix typo in `__cuda_array_interface__` name
pentschev Feb 28, 2025
6a13323
Merge remote-tracking branch 'origin/pylibcudf-gpumemoryview-len' int…
pentschev Feb 28, 2025
6e088f0
Merge remote-tracking branch 'pentschev/pylibcudf-gpumemoryview-len' …
rjzamora Feb 28, 2025
9f3ff3d
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Feb 28, 2025
519a7ca
Merge remote-tracking branch 'pentschev/pylibcudf-gpumemoryview-len' …
rjzamora Feb 28, 2025
26f784a
Support CUDA deserializing to `pylibcudf.gpumemoryview`
pentschev Feb 28, 2025
5649f73
Add tests serializing RMM headers
pentschev Feb 28, 2025
af82051
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 3, 2025
0a13145
check for periods
rjzamora Mar 3, 2025
a977df6
roll back unnecessary change
rjzamora Mar 3, 2025
7331a90
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 3, 2025
5bbbbde
Merge remote-tracking branch 'pentschev/cudf-polars-serialize-gpumemo…
rjzamora Mar 3, 2025
309757c
Merge branch 'branch-25.04' into cudf-polars-multi-groupby
rjzamora Mar 3, 2025
9b33515
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 4, 2025
ac6ae31
update tpch example to use bootstrap_dask_cluster ahead of time
rjzamora Mar 4, 2025
0bb7211
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 4, 2025
e445e37
remove copy API and make ConfigOptions immutable
rjzamora Mar 4, 2025
320d6fb
Merge branch 'branch-25.04' into cudf-polars-config-options
rjzamora Mar 4, 2025
75a7257
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 4, 2025
e9abe33
use typing_extensions for older python versions
rjzamora Mar 4, 2025
eb7a79a
formatting
rjzamora Mar 4, 2025
385c68a
break out the decomposition of a single groupby request into a stand-…
rjzamora Mar 4, 2025
7c96482
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 4, 2025
fe3ca7c
break out the decomposition of a single groupby request into a stand-…
rjzamora Mar 4, 2025
fa64610
Merge remote-tracking branch 'upstream/branch-25.04' into complex-agg…
rjzamora Mar 5, 2025
9690d6e
use replace
rjzamora Mar 5, 2025
ad362b6
avoid passing through options to renamed aggs unless the new options …
rjzamora Mar 5, 2025
1a96b6e
remove unused func
rjzamora Mar 5, 2025
5514889
address partial code review
rjzamora Mar 5, 2025
9b351d9
remove strict=False everywhere
rjzamora Mar 5, 2025
ddce78f
address review in select.py
rjzamora Mar 5, 2025
82f8d10
use NamedExprs to better keep track of output column names
rjzamora Mar 5, 2025
b6a9ef2
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 5, 2025
edc5a0e
Merge branch 'complex-aggregations' into cudf-polars-multi-combined-2…
rjzamora Mar 5, 2025
8ba8cc4
Merge remote-tracking branch 'origin/cudf-polars-multi-groupby' into …
rjzamora Mar 5, 2025
9b6234c
update callback.py
rjzamora Mar 5, 2025
9461510
add KVIKIO_COMPAT_MODE setting to tpch.py
rjzamora Mar 5, 2025
a4596a1
tweak env setting
rjzamora Mar 5, 2025
16cf883
Merge branch 'branch-25.04' into cudf-polars-multi-groupby
rjzamora Mar 7, 2025
6a93417
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 7, 2025
2dc1ca0
roll back obsolete container changes
rjzamora Mar 7, 2025
0a3b70b
remove frames_to_gpumemoryview
rjzamora Mar 7, 2025
e9c413e
Merge branch 'branch-25.04' into complex-aggregations
rjzamora Mar 7, 2025
e04a7ed
Merge branch 'branch-25.04' into cudf-polars-config-options
rjzamora Mar 7, 2025
576bade
add temporary Sort optimization for tpch queries
rjzamora Mar 7, 2025
0d6cbd5
change polars-cpu config
rjzamora Mar 7, 2025
a1cb27b
Merge branch 'branch-25.04' into complex-aggregations
rjzamora Mar 10, 2025
5e7defa
add rapidsmp integration
rjzamora Mar 10, 2025
66183c4
Merge remote-tracking branch 'upstream/branch-25.04' into rapidsmp-sh…
rjzamora Mar 10, 2025
1916f6c
add test
rjzamora Mar 10, 2025
0c4d368
add test coverage
rjzamora Mar 11, 2025
9f9c097
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 11, 2025
f959988
address schema and maintain_order issues
rjzamora Mar 11, 2025
bca71d6
use lawrences suggestions
rjzamora Mar 11, 2025
568a3f0
Merge branch 'branch-25.04' into cudf-polars-multi-groupby
rjzamora Mar 11, 2025
07265e7
Merge remote-tracking branch 'upstream/branch-25.04' into complex-agg…
rjzamora Mar 11, 2025
7b1252b
modify coverage
rjzamora Mar 11, 2025
ef10e25
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 11, 2025
7f90ded
address small code-review comments
rjzamora Mar 11, 2025
23f5b63
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 11, 2025
ffb69c7
align with 17503
rjzamora Mar 11, 2025
cb13d63
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 11, 2025
22582fd
Merge remote-tracking branch 'origin/cudf-polars-config-options' into…
rjzamora Mar 11, 2025
e802263
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 11, 2025
e477e50
handle q18
rjzamora Mar 12, 2025
1e51461
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 12, 2025
3225121
support more tpch queries
rjzamora Mar 12, 2025
1a3348a
fix cardinality_factor config
rjzamora Mar 12, 2025
15e66d8
add query 4
rjzamora Mar 12, 2025
76b2aab
support query 7
rjzamora Mar 12, 2025
1b9e940
add support for query 8 (sort of)
rjzamora Mar 12, 2025
81d2257
add more queries
rjzamora Mar 12, 2025
e9b2d09
add notes on unsupported queries
rjzamora Mar 12, 2025
b143863
more notes
rjzamora Mar 12, 2025
d7e67df
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 13, 2025
ed8667e
fix shuffle default
rjzamora Mar 13, 2025
f6f8fa9
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 13, 2025
793278b
Merge remote-tracking branch 'upstream/branch-25.04' into rapidsmp-sh…
rjzamora Mar 13, 2025
6acded9
fix configs
rjzamora Mar 13, 2025
c37ad58
fix imports
rjzamora Mar 13, 2025
57a8ce3
Merge remote-tracking branch 'upstream/branch-25.04' into rapidsmp-sh…
rjzamora Mar 13, 2025
27507cd
simplify
rjzamora Mar 13, 2025
71f7085
cleanup
rjzamora Mar 13, 2025
4af6170
add comments to test
rjzamora Mar 13, 2025
3d83ea6
Merge branch 'branch-25.04' into complex-aggregations
rjzamora Mar 13, 2025
e4e98d9
bump test coverage
rjzamora Mar 13, 2025
31eec5f
Merge remote-tracking branch 'upstream/branch-25.04' into rapidsmp-sh…
rjzamora Mar 14, 2025
c1b47ac
test broadcast_join_limit
rjzamora Mar 14, 2025
615ead1
Merge remote-tracking branch 'upstream/branch-25.04' into rapidsmp-sh…
rjzamora Mar 14, 2025
cba0f1a
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 14, 2025
e92603a
Merge branch 'rapidsmp-shuffle' into cudf-polars-multi-combined
rjzamora Mar 14, 2025
255e2e2
Merge remote-tracking branch 'upstream/branch-25.04' into complex-agg…
rjzamora Mar 14, 2025
d1abd8d
partial code review
rjzamora Mar 14, 2025
0c3828c
remove unused code
rjzamora Mar 14, 2025
fd5b3e1
simplify _replace
rjzamora Mar 14, 2025
27e2184
update names
rjzamora Mar 14, 2025
58da5bf
fix sort error
rjzamora Mar 15, 2025
85597cf
Merge branch 'branch-25.04' into rapidsmp-shuffle
rjzamora Mar 15, 2025
f6c6a8f
Merge remote-tracking branch 'upstream/branch-25.04' into rapidsmp-sh…
rjzamora Mar 17, 2025
781b999
update testing to use LocalCUDACluster
rjzamora Mar 17, 2025
720c344
Merge remote-tracking branch 'upstream/branch-25.04' into cudf-polars…
rjzamora Mar 17, 2025
dd39d4b
Merge branch 'branch-25.04' into complex-aggregations
rjzamora Mar 17, 2025
a21f8de
add n_unique support
rjzamora Mar 18, 2025
84a20ab
Merge remote-tracking branch 'upstream/branch-25.04' into complex-agg…
rjzamora Mar 18, 2025
60a99d9
refactor shuffle component of 'n_unique'
rjzamora Mar 18, 2025
7b7f834
Merge remote-tracking branch 'upstream/branch-25.04' into complex-agg…
rjzamora Mar 18, 2025
3860e9d
Merge branch 'complex-aggregations' into cudf-polars-multi-combined
rjzamora Mar 18, 2025
5676199
temporarily drop coverage
rjzamora Mar 18, 2025
aab0d83
improve test coverage
rjzamora Mar 19, 2025
b6bce28
Merge remote-tracking branch 'upstream/branch-25.04' into complex-agg…
rjzamora Mar 19, 2025
64cc5e5
remove temporary hacks from tpch file
rjzamora Mar 19, 2025
2cb3b6d
Merge remote-tracking branch 'upstream/branch-25.06' into rapidsmp-sh…
rjzamora Mar 19, 2025
f6e5852
Apply suggestions from code review
rjzamora Mar 19, 2025
2802418
Merge remote-tracking branch 'upstream/branch-25.06' into complex-agg…
rjzamora Mar 19, 2025
442a96f
Merge remote-tracking branch 'upstream/branch-25.06' into cudf-polars…
rjzamora Mar 19, 2025
12768f0
Merge branch 'complex-aggregations' into cudf-polars-multi-combined
rjzamora Mar 19, 2025
a1cb8b1
Merge branch 'rapidsmp-shuffle' into cudf-polars-multi-combined
rjzamora Mar 19, 2025
6229730
type annotations
rjzamora Mar 19, 2025
52ac1e2
Merge branch 'complex-aggregations' into cudf-polars-multi-combined
rjzamora Mar 19, 2025
dfe6355
avoid using LocalRMPCluster when shuffle option is set to tasks
rjzamora Mar 20, 2025
026fa02
Merge remote-tracking branch 'upstream/branch-25.06' into cudf-polars…
rjzamora Mar 20, 2025
049ae7a
Merge branch 'branch-25.06' into cudf-polars-multi-combined
rjzamora Mar 21, 2025
0d4bc4f
drop localrmpcluster
rjzamora Mar 25, 2025
df4e30e
Merge remote-tracking branch 'upstream/branch-25.06' into cudf-polars…
rjzamora Mar 25, 2025
4980ad4
Merge remote-tracking branch 'upstream/branch-25.06' into cudf-polars…
rjzamora Mar 26, 2025
54de2a7
add trials
rjzamora Mar 26, 2025
44b6194
align with rapidsmp-172
rjzamora Mar 26, 2025
40aa64d
tweak default pool size
rjzamora Mar 26, 2025
0827eac
Merge remote-tracking branch 'upstream/branch-25.06' into cudf-polars…
rjzamora Mar 26, 2025
bcc2b24
remove unused changes
rjzamora Mar 26, 2025
22 changes: 21 additions & 1 deletion python/cudf_polars/cudf_polars/dsl/expressions/base.py
@@ -1,4 +1,4 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES.
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES.
# SPDX-License-Identifier: Apache-2.0
# TODO: remove need for this
# ruff: noqa: D101
@@ -18,6 +18,8 @@
if TYPE_CHECKING:
    from collections.abc import Mapping

    from typing_extensions import Self

    from cudf_polars.containers import Column, DataFrame

__all__ = ["AggInfo", "Col", "ColRef", "ExecutionContext", "Expr", "NamedExpr"]
@@ -237,6 +239,24 @@ def collect_agg(self, *, depth: int) -> AggInfo:
        """Collect information about aggregations in groupbys."""
        return self.value.collect_agg(depth=depth)

    def reconstruct(self, expr: Expr) -> Self:
        """
        Rebuild with a new `Expr` value.

        Parameters
        ----------
        expr
            New `Expr` value

        Returns
        -------
        New `NamedExpr` with `expr` as the underlying expression.
        The name of the original `NamedExpr` is preserved.
        """
        if expr is self.value:
            return self
        return type(self)(self.name, expr)


class Col(Expr):
__slots__ = ("name",)
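
Note on the `reconstruct` method added in this hunk: it returns `self` unchanged when handed the expression it already wraps, and otherwise builds a new object of the same type under the same name. A minimal sketch of that pattern follows. It uses hypothetical stand-in `Expr` and `NamedExpr` classes rather than the real cudf_polars ones; the `label` attribute and the example values are assumptions made only for illustration.

```python
from __future__ import annotations


class Expr:
    """Hypothetical stand-in for an expression node (not the cudf_polars class)."""

    def __init__(self, label: str) -> None:
        self.label = label  # illustrative attribute, not part of the PR


class NamedExpr:
    """Hypothetical stand-in that pairs an output name with an expression."""

    def __init__(self, name: str, value: Expr) -> None:
        self.name = name
        self.value = value

    def reconstruct(self, expr: Expr) -> NamedExpr:
        # Same logic as the diff: skip the allocation when nothing changed.
        if expr is self.value:
            return self
        # type(self) keeps subclasses intact; the original name is preserved.
        return type(self)(self.name, expr)


original = NamedExpr("total", Expr("sum(a)"))
same = original.reconstruct(original.value)        # identical expression object
rebuilt = original.reconstruct(Expr("sum(a) + 1"))  # different expression object
```

Here `same` is the very same object as `original`, while `rebuilt` is a new object that keeps the name `"total"`. The check uses identity (`is`), not equality, so a structurally equal but distinct expression would still produce a new object in this sketch.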
@@ -0,0 +1,4 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
# SPDX-License-Identifier: Apache-2.0

"""Experimental benchmarks."""