feat: Remote UDF, only serialize and process selected rows #12277
This pull request was exported from Phabricator. Differential Revision: D69231717
…ncubator#12277)
Summary:
- This diff adds an optimization to Remote UDF so that only the selected rows are serialized and processed.
- This could help greatly in cases where the vector is heavily filtered.
- The previous approach could also lead to bugs where the function throws an exception because it does not expect to process unselected rows (e.g., null conditions, unexpected arguments, etc.).
- XStream is starting to adopt this model for the CodeX project. This change is meant to unblock dogfooding and productionizing.
Differential Revision: D69231717
Nice, thanks for looking into this! Instead of wrapping the entire vector in a dictionary, a more efficient way would be to keep the vector as-is and only serialize the rows that are active in the rowSet. The existing API doesn't do that today, so it would be better to add a new version of rowVectorToIOBuf() in VectorStream.h that takes the rowSet and only serializes the active rows. Inside that function, you would only have to convert the rowSet to a vector of IndexRange and call the appropriate code, so it would actually be less code. There may already be a utility in the codebase to transform a rowSet into a vector of ranges.
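To make the rowSet-to-ranges step concrete, here is a minimal standalone sketch of the conversion. It is not the Velox implementation: `IndexRange` is a local stand-in struct (the `begin`/`size` layout is an assumption), `std::vector<bool>` stands in for the rowSet, and `selectedRowsToRanges` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <vector>

// Local stand-in for an index range: first row and number of rows.
// The field layout is an assumption made for this sketch.
struct IndexRange {
  int32_t begin;
  int32_t size;
};

// Hypothetical helper: collapses the selected rows into contiguous ranges.
// selected[i] == true means row i is active; std::vector<bool> stands in
// for the rowSet here.
std::vector<IndexRange> selectedRowsToRanges(const std::vector<bool>& selected) {
  std::vector<IndexRange> ranges;
  const auto numRows = static_cast<int32_t>(selected.size());
  int32_t row = 0;
  while (row < numRows) {
    if (!selected[row]) {
      ++row;  // Skip rows that were filtered out.
      continue;
    }
    const int32_t begin = row;
    // Extend the range while consecutive rows stay selected.
    while (row < numRows && selected[row]) {
      ++row;
    }
    ranges.push_back({begin, row - begin});
  }
  return ranges;
}
```

With a heavily filtered input, the number of ranges stays small relative to the row count, so a serializer that accepts ranges would only touch the active rows.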