Skip to content

Commit

Permalink
feat: introducing a ReferenceRel operator to express DAG/Spool. (subs…
Browse files Browse the repository at this point in the history
  • Loading branch information
curino committed Aug 9, 2022
1 parent 6461d30 commit 0dcbe8a
Show file tree
Hide file tree
Showing 2 changed files with 36 additions and 1 deletion.
6 changes: 6 additions & 0 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -1052,4 +1052,10 @@ message AggregateFunction {
// Use only distinct values in the aggregation calculation.
AGGREGATION_INVOCATION_DISTINCT = 2;
}

// This rel is used to create references,
// in case we refer to a RelRoot field names will be ignored
message ReferenceRel {
int32 subtree_ordinal = 1;
}
}
31 changes: 30 additions & 1 deletion site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,36 @@ If at least one grouping expression is present, the aggregation is allowed to no
%%% proto.algebra.AggregateRel %%%
```

## Reference Operator

The reference operator is used to construct DAGs of operations. In a Plan we can have multiple Rel representing various
computations with potentially multiple outputs. The ReferenceRel is used to express the fact that multiple Rel might be
sharing subtrees of computation. This can be used to express arbitrary DAGs as well as represent multi-query optimizations.

As a concrete example think about two queries SELECT * FROM A JOIN B JOIN C and SELECT * FROM A JOIN B JOIN D,
We could use the ReferenceRel to highlight the shared A JOIN B between the two queries, by creating a plan with 3 Rel.
One expressing <A JOIN B> (in position 0 in the plan), one using reference as follows: <ReferenceRel(0) JOIN C> and a third one
doing <ReferenceRel(0) JOIN D>. This allows to avoid the redundancy of A JOIN B.

| Signature | Value |
| -------------------- | --------------- |
| Inputs | 1 |
| Outputs | 0 |
| Property Maintenance | N/A (no output) |
| Direct Output Order | N/A (no output) |


### Reference Properties

| Property | Description | Required |
|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| --------------------------- |
| Referred Rel | A positional reference to a Rel defined within the same Plan. | Required |

=== "ReferenceRel Message"

```proto
%%% proto.algebra.ReferenceRel %%%
```

## Write Operator

Expand Down Expand Up @@ -353,7 +383,6 @@ Write definition types are built by the community and added to the specification
| Format | Enumeration of available formats. Only current option is PARQUET. | Required |



## Discussion Points

* How to handle correlated operations?

0 comments on commit 0dcbe8a

Please sign in to comment.