Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose stream-ordering in scalar and avro APIs #17766

Merged
merged 16 commits into from
Jan 29, 2025

Conversation

shrshi
Copy link
Contributor

@shrshi shrshi commented Jan 17, 2025

Description

Contributes to #13744

Replaces conversion operators in derived classes of cudf::scalar with stream-ordered get_value(stream) member function.
Adds stream parameter to cudf::io::read_avro

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Jan 17, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 17, 2025
@shrshi shrshi added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 17, 2025
@github-actions github-actions bot added the CMake CMake build issue label Jan 18, 2025
@shrshi shrshi marked this pull request as ready for review January 18, 2025 00:22
@shrshi shrshi requested review from a team as code owners January 18, 2025 00:22
@shrshi shrshi requested review from vyasr and davidwendt January 18, 2025 00:22
*/
explicit operator value_type() const;
T get_value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RMM just calls this value. We often try to avoid “get” in our function names.

Suggested change
T get_value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;
T value(rmm::cuda_stream_view stream = cudf::get_default_stream()) const;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, doesn’t value already exist just below here? We may not need get_value at all. Maybe we just need to delete the conversion operator since it does not take a stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I chose to introduce another function here because value can return either host or device data depending on the type. string_scalar::value returns a cudf::string_view on the device, while fixed_point::value and fixed_width::value copy to host. I was also unsure of using fixed_point::value directly since the conversion operator returns the fixed_point_value instead of the unscaled value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to modify string_scalar::value to return a host string view since it is used in several places that directly access the scalar on device. One option is to rename this function to string_scalar::d_value and introduce another string_scalar::value that returns a std::string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string_scalar::value() should return a cudf::string_view which is possible from host or device since it just wraps a pointer and a size held by the string_scalar. No device code is needed. Actually the stream parameter is not used at all.
To return a host std::string one would use the string_scalar::to_string() function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Bradley we should not have the get_value() functions and just keep the value() ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David, Bradley. I've retained the value functions and got rid of the conversion operators.

cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/scalar/scalar.hpp Outdated Show resolved Hide resolved
@@ -81,7 +81,7 @@ void ASSERT_BINOP(cudf::column_view const& out,
TypeOp&& op,
ValueComparator const& value_comparator = ValueComparator())
{
auto lhs_h = static_cast<ScalarType const&>(lhs).operator TypeLhs();
auto lhs_h = static_cast<ScalarType const&>(lhs).get_value(cudf::get_default_stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor the tests like this:

Suggested change
auto lhs_h = static_cast<ScalarType const&>(lhs).get_value(cudf::get_default_stream());
auto lhs_h = static_cast<ScalarType const&>(lhs).value(cudf::get_default_stream());

Copy link
Contributor Author

@shrshi shrshi Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a scalar_host_value function that checks which scalar class is passed since the conversion operator in string_scalar and fixed_point_scalar called to_string and fixed_point_value respectively.

@shrshi shrshi requested a review from bdice January 24, 2025 20:09
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One style request, otherwise LGTM!

cpp/tests/binaryop/assert-binops.h Outdated Show resolved Hide resolved
@vuule vuule self-requested a review January 27, 2025 18:09
@shrshi shrshi added breaking Breaking change and removed non-breaking Non-breaking change labels Jan 27, 2025
@galipremsagar
Copy link
Contributor

@shrshi I cancelled the most recent workflow to free up resources to unblock all of cudf CI for this PR: #17771

I'll rerun once #17771 is merged.

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just one typo to fix

cpp/tests/streams/scalar_test.cpp Outdated Show resolved Hide resolved
@vyasr vyasr added the DO NOT MERGE Hold off on merging; see PR for details label Jan 28, 2025
@vyasr
Copy link
Contributor

vyasr commented Jan 28, 2025

Marking as DO NOT MERGE until we've had some discussion with users.

@nvdbaranec
Copy link
Contributor

nvdbaranec commented Jan 29, 2025

I'm having issues building spark-rapids-jni with this branch - cmake related stuff. Would it be possible to get an up to date merge of 25.02 pulled into here?

@nvdbaranec
Copy link
Contributor

All spark tests pass. Good to go.

@shrshi shrshi removed the DO NOT MERGE Hold off on merging; see PR for details label Jan 29, 2025
@shrshi
Copy link
Contributor Author

shrshi commented Jan 29, 2025

/merge

@rapids-bot rapids-bot bot merged commit aa80d45 into rapidsai:branch-25.02 Jan 29, 2025
107 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change CMake CMake build issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code.
Projects
Status: Landed
Development

Successfully merging this pull request may close these issues.

7 participants