Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(proto): add write relation to support multiple outputs #239

Closed
wants to merge 1 commit into from

Conversation

rtpsw
Copy link

@rtpsw rtpsw commented Jul 3, 2022

See #238

@rtpsw
Copy link
Author

rtpsw commented Jul 3, 2022

cc @icexelloss

@jacques-n
Copy link
Contributor

#236 is focused on write rel. Let's get that merged first. Would be good to get your feedback/thoughts on the direction there.

@rtpsw
Copy link
Author

rtpsw commented Jul 4, 2022

#236 is focused on write rel. Let's get that merged first. Would be good to get your feedback/thoughts on the direction there.

@jacques-n, thanks for pointing me to this PR. I'll post there.

@icexelloss
Copy link
Contributor

@rtpsw Can you remind me why we need this again?

@rtpsw
Copy link
Author

rtpsw commented Jul 5, 2022

@rtpsw Can you remind me why we need this again?

See issue description. Though it's not on our critical path, it has been useful in my local work, and I expect it would be useful more generally. It's possible this issue would get merged into #236 .

@jvanstraten jvanstraten changed the title feat(proto): Add Write relation to support multiple outputs feat(proto): add write relation to support multiple outputs Jul 26, 2022
@cpcloud
Copy link
Contributor

cpcloud commented Aug 11, 2022

@rtpsw Is this superseded by #252?

@jacques-n
Copy link
Contributor

jacques-n commented Aug 11, 2022

This is superseded by #284 + #252. Closing. Let us know if we missed anything.

Thanks!

@jacques-n jacques-n closed this Aug 11, 2022
@rtpsw
Copy link
Author

rtpsw commented Sep 13, 2022

Sorry for the late response. I think #284 + #252 should conceptually work.

@jacques-n, could you explain how the use case I'm targeting here would map to Substrait messages, in particular using ReferenceRel from #284?

@rtpsw rtpsw deleted the GH-238 branch September 13, 2022 07:58
@jvanstraten
Copy link
Contributor

I'm not sure, but isn't the current idea that a plan should only have one root relation? Nothing in the protos prevents multiple root relations and I don't think the website says anything about it, but if that's the case then that technically disallows the use case you put forward, and a minor spec change would still be required. Personally I don't see why this should be disallowed; if a consumer doesn't want to support multiple roots then it can just reject a plan with multiple in it.

Otherwise, a WriteRel that passes all data through is not necessary, because you could write the hypothetical sink(WriteRel(source)) like this:

  • relation 0: source
  • relation 1 (root?): WriteRel(ReferenceRel(0))
  • relation 2 (root): sink(ReferenceRel(0))

The actual new relations are not part of the oneof in Rel yet, that's a bug that's being addressed in #288. Also, WriteRel as currently defined isn't actually the complementary operation of ReadRel. You mention "check-point and debug output," so I imagine you're also interested in file sinks, which currently fundamentally don't work because WriteRel is more of a ReadModifyWriteRel or UpdateRel than a strict write relation, and thus file output was conveniently left out.

@rtpsw
Copy link
Author

rtpsw commented Sep 14, 2022

We should distinguish between the use case, which is additional outputs for check-point and debug purposes, from the draft proposal in this PR. While I'm no longer requesting the current PR, due to the other PRs described above, I'm still interested in the use case (though not at a high priority). I think the use case is not handled until a way to get check-point and debug output is spelled out. So, my question stands and #238 should be left open.

I imagine you're also interested in file sinks, which currently fundamentally don't work because WriteRel is more of a ReadModifyWriteRel or UpdateRel than a strict write relation, and thus file output was conveniently left out.

I agree a solution for check-point and debug purposes could be based on files. However, I don't think the file has to be a sink. The update or read-modify-write operation fits a random-access/re-writable (relational) file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants