Is your feature request related to a problem or challenge?
Add a mode that outputs selection vectors (for now let's use dense boolean arrays so it can be added to RecordBatch) in RepartitionExec. The array outputs true for each row that has hash % total_partition == current_partition (and false if not).
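The rule above can be sketched in a few lines of plain Rust. This is only an illustration under stated assumptions, not the RepartitionExec implementation: `selection_mask` is a hypothetical helper, the row hashes are assumed to be precomputed as `u64` values, and a `Vec<bool>` stands in for the dense boolean array.

```rust
/// Build a dense boolean selection vector for one output partition.
/// Hypothetical helper: `hashes` holds one precomputed hash per row.
fn selection_mask(hashes: &[u64], total_partitions: u64, current_partition: u64) -> Vec<bool> {
    assert!(total_partitions > 0, "total_partitions must be non-zero");
    hashes
        .iter()
        // A row is selected when its hash maps to the current partition.
        .map(|h| h % total_partitions == current_partition)
        .collect()
}

fn main() {
    // Made-up hashes; modulo 3 they map to partitions 2, 1, 2, 0.
    let hashes = [5u64, 4, 8, 3];
    for p in 0..3 {
        println!("partition {p}: {:?}", selection_mask(&hashes, 3, p));
    }
}
```

Every partition sees all rows and differs only in its mask, so across partitions each row is selected exactly once.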
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
After reviewing the related issues, I'm very excited about these features. I'd like to keep a close eye on their implementation, as I feel I can learn a lot from them 🥰
> The array outputs true for each row that has hash % partition == 0 (and false if not).
I don't understand why the formula is hash % partition == 0. As I understand it, hash % total_partition gives the index of the partition a row belongs to, so maybe the formula should be hash % total_partition == current_partition?
Given the following data:
col1 | col2 | ... | hash % total_partition
-------------------------
data | data | ... | 2
data | data | ... | 1
data | data | ... | 2
data | data | ... | 0
Partition 0 will get:
col1 | col2 | ... | selection
-------------------------
data | data | ... | false
data | data | ... | false
data | data | ... | false
data | data | ... | true
Partition 1 will get:
col1 | col2 | ... | selection
-------------------------
data | data | ... | false
data | data | ... | true
data | data | ... | false
data | data | ... | false
Partition 2 will get:
col1 | col2 | ... | selection
-------------------------
data | data | ... | true
data | data | ... | false
data | data | ... | true
data | data | ... | false
Then the downstream operators in the plan can aggregate or join only the records whose selection is true.
Does it make sense?
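To make the downstream step concrete, here is a minimal sketch of an aggregation that honors the selection column. It assumes plain slices instead of Arrow arrays, and `sum_selected` is a hypothetical name used only for illustration; the column values are made up.

```rust
/// Sum only the rows whose selection flag is true.
/// Hypothetical helper: `values` is one column, `selection` its mask.
fn sum_selected(values: &[i64], selection: &[bool]) -> i64 {
    assert_eq!(values.len(), selection.len(), "column and mask must have equal length");
    values
        .iter()
        .zip(selection)
        // Skip rows that belong to other partitions.
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .sum()
}

fn main() {
    // Made-up column values for the four rows in the example above.
    let values = [10i64, 20, 30, 45];
    // Mask that partition 2 receives in the example above.
    let partition_2 = [true, false, true, false];
    println!("partition 2 sum: {}", sum_selected(&values, &partition_2));
}
```

The point of the design is that the batch itself is not copied or split; each partition reads the same rows and ignores those whose flag is false.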