Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYSTEMDS-3421] Adds missing read/writeExternal to ColumnEncoderUDF. #1681

Closed
wants to merge 1 commit into from

Conversation

paginabianca
Copy link
Contributor

The read/writeExternal functions for ColumnEncoderUDF where missing.
This also adds the necessary switch case in the EncoderFactory
and a test that currently does nothing.

I encountered this bug when running the FTBench T2 in federated mode with two actual workers. The main issue is that the ColumnEncoderUDF objects on the workers end up with null as _fName because the function name is not sent.

@mboehm7

The read/writeExternal functions for ColumnEncoderUDF where missing.
This also adds the necessary switch case in the EncoderFactory
and a test that currently does nothing.
@phaniarnab
Copy link
Contributor

There are two versions of T2 (KDD dataset) -- one with UDF-based scaling and another with scaling outside of transformencode. The UDF task implementation is still in the experimental phase. Internally it uses our eval mechanism. I am not very sure how would that fit in the federated setup.
But the other implementation should work out of the box in federated.

@mboehm7
Copy link
Contributor

mboehm7 commented Aug 18, 2022

That is a good point @phaniarnab - the UDF column encoder would not be valid for federated row-partitions we currently work with for the other benchmark use cases. However, for column-partitioned federated data it should work.

Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Oct 31, 2022
The read/writeExternal functions for ColumnEncoderUDF where missing.
This also adds the necessary switch case in the EncoderFactory
and a test that currently does nothing.

Closes apache#1681
@Baunsgaard
Copy link
Contributor

Fixed the syntax error and opened in new PR.
#1716

@Baunsgaard Baunsgaard closed this Oct 31, 2022
Baunsgaard pushed a commit to Baunsgaard/systemds that referenced this pull request Nov 11, 2022
The read/writeExternal functions for ColumnEncoderUDF where missing.
This also adds the necessary switch case in the EncoderFactory
and a test that currently does nothing.

Closes apache#1681
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants