Skip to content

Commit

Permalink
Implementation for ReferenceRel (#284)
Browse files Browse the repository at this point in the history
* feat: introducing a ReferenceRel operator to express DAG/Spool. (#283)

Co-authored-by: Jacques Nadeau <[email protected]>
  • Loading branch information
curino and jacques-n authored Aug 11, 2022
1 parent 8af3fe0 commit 70736ee
Show file tree
Hide file tree
Showing 2 changed files with 37 additions and 1 deletion.
7 changes: 6 additions & 1 deletion proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -368,7 +368,6 @@ message Rel {
message NamedObjectWrite {
// The list of string is used to represent namespacing (e.g., mydb.mytable).
// This assumes shared catalog between systems exchanging a message.
//
repeated string names = 1;
substrait.extensions.AdvancedExtension advanced_extension = 10;
}
Expand Down Expand Up @@ -1164,4 +1163,10 @@ message AggregateFunction {
// Use only distinct values in the aggregation calculation.
AGGREGATION_INVOCATION_DISTINCT = 2;
}

// This rel is used to create references,
// in case we refer to a RelRoot field names will be ignored
message ReferenceRel {
int32 subtree_ordinal = 1;
}
}
31 changes: 31 additions & 0 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -310,6 +310,37 @@ If at least one grouping expression is present, the aggregation is allowed to no
%%% proto.algebra.AggregateRel %%%
```

## Reference Operator

The reference operator is used to construct DAGs of operations. In a `Plan` we can have multiple Rel representing various
computations with potentially multiple outputs. The `ReferenceRel` is used to express the fact that multiple `Rel` might be
sharing subtrees of computation. This can be used to express arbitrary DAGs as well as represent multi-query optimizations.

As a concrete example think about two queries `SELECT * FROM A JOIN B JOIN C` and `SELECT * FROM A JOIN B JOIN D`,
We could use the `ReferenceRel` to highlight the shared `A JOIN B` between the two queries, by creating a plan with 3 `Rel`.
One expressing `A JOIN B` (in position 0 in the plan), one using reference as follows: `ReferenceRel(0) JOIN C` and a third one
doing `ReferenceRel(0) JOIN D`. This allows to avoid the redundancy of `A JOIN B`.

| Signature | Value |
| -------------------- |---------------------------------------|
| Inputs | 1 |
| Outputs | 1 |
| Property Maintenance | Maintains all properties of the input |
| Direct Output Order | Maintains order |


### Reference Properties

| Property | Description | Required |
|-----------------------------|--------------------------------------------------------------------------------| --------------------------- |
| Referred Rel | A zero-indexed positional reference to a `Rel` defined within the same `Plan`. | Required |

=== "ReferenceRel Message"

```proto
%%% proto.algebra.ReferenceRel %%%
```

## Write Operator

The write operator is an operator that consumes one output and writes it to storage. This can range from writing to a Parquet file, to INSERT/DELETE/UPDATE in a database.
Expand Down

0 comments on commit 70736ee

Please sign in to comment.