Skip to content

Reference doc for ASOF JOIN TOLERANCE #195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
134 changes: 73 additions & 61 deletions documentation/concept/sql-optimizer-hints.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,58 +5,67 @@ description:
This document describes available hints and when to use them.
---

QuestDB's query optimizer automatically selects execution plans for SQL queries based on heuristics. The default
execution strategy should be the fastest for most datasets. You may use hints to select a specific execution strategy
which may (or may not) outperform the default strategy. SQL hints influence the execution strategy of queries without
changing their semantics.
QuestDB's query optimizer automatically selects execution plans for SQL queries based on heuristics. While the default
execution strategy should be the fastest for most scenarios, you can use hints to select a specific strategy that may
better suit your data's characteristics. SQL hints influence the execution strategy of queries without changing their
semantics.

## Hint Syntax

In QuestDB, SQL hints are specified as SQL block comments with a plus sign after the opening comment marker. Hints must
be placed immediately after the SELECT keyword:
be placed immediately after the `SELECT` keyword:

```questdb-sql title="SQL hint syntax"
SELECT /*+ HINT_NAME(parameter1 parameter2) */ columns FROM table;
```

Hints are entirely optional and designed to be a safe optimization mechanism:
Hints are designed to be a safe optimization mechanism:

- The database will use default optimization strategies when no hints are provided
- Syntax errors inside a hint block won't fail the entire SQL query
- The database safely ignores unknown hints
- Only block comment hints (`/*+ HINT */`) are supported, not line comment hints (`--+ HINT`)
- The database uses default optimization strategies when no hints are provided.
- Syntax errors inside a hint block won't fail the entire SQL query.
- The database safely ignores unknown hints.
- Only block comment hints (`/*+ HINT */`) are supported, not line comment hints (`--+ HINT`).

## Available Hints
-----

### USE_ASOF_BINARY_SEARCH
## Binary Search Optimizations and Hints

The `USE_ASOF_BINARY_SEARCH` hint enables a specialized binary search optimization for
non-keyed [ASOF joins](/reference/sql/asof-join/) when filtering is applied to the joined table. This hint requires two
parameters that specify the table aliases participating in the join.
Since version 9.0, QuestDB's optimizer defaults to using a binary search-based strategy for **`ASOF JOIN`** and
**`LT JOIN`** (Less Than Join) queries that have a filter on the right-hand side (the joined or lookup table). This
approach is generally faster as it avoids a full table scan.

```questdb-sql title="Optimizing ASOF join with binary search"
SELECT /*+ USE_ASOF_BINARY_SEARCH(orders md) */
However, for some specific data distributions and filter conditions, the previous strategy of performing a parallel full
table scan can be more performant. For these cases, QuestDB provides hints to *avoid* the default binary search.

### AVOID\_ASOF\_BINARY\_SEARCH and AVOID\_LT\_BINARY\_SEARCH

These hints instruct the optimizer to revert to the pre-9.0 execution strategy for `ASOF JOIN` and `LT JOIN` queries,
respectively. This older strategy involves performing a full parallel scan on the joined table to apply filters *before*
executing the join.

- `AVOID_ASOF_BINARY_SEARCH(left_table_alias right_table_alias)`: Use for **`ASOF JOIN`** queries.
- `AVOID_LT_BINARY_SEARCH(table_alias)`: Use for **`LT JOIN`** queries.

<!-- end list -->

```questdb-sql title="Avoiding binary search for an ASOF join"
SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */
orders.ts, orders.price, md.md_ts, md.bid, md.ask
FROM orders
ASOF JOIN (
SELECT ts as md_ts, bid, ask FROM market_data
WHERE state = 'VALID' --filter on the joined table
WHERE state = 'INVALID' -- Highly selective filter
) md;
```

#### How it works

By default (without this hint), QuestDB processes ASOF joins by:

1. Applying filters to the joined table in parallel
2. Joining the filtered results to the main table

With the `USE_ASOF_BINARY_SEARCH` hint, QuestDB changes the execution strategy:
The **default strategy (binary search)** works as follows:

1. For each record in the main table, it uses [binary search](https://en.wikipedia.org/wiki/Binary_search) to locate
a record with a matching timestamp in the joined table
2. Starting from this located timestamp match, it then iterates backward through rows in the joined table, in a single
thread, until finding a row that matches the filter condition
1. For each record in the main table, it uses a binary search to quickly locate a record with a matching timestamp in
the joined table.
2. Starting from this located timestamp, it then iterates backward through rows in the joined table, in a single thread,
evaluating the filter condition until a match is found.

<Screenshot
alt="Diagram showing execution of the USE_ASOF_BINARY_SEARCH hint"
Expand All @@ -65,36 +74,39 @@ src="images/docs/concepts/asof-join-binary-search-strategy.svg"
width={745}
/>

#### When to use
The **hinted strategy (`AVOID_..._BINARY_SEARCH`)** forces this plan:

This optimization is particularly beneficial when:
1. Apply the filter to the *entire* joined table in parallel.
2. Join the filtered (and now much smaller) result set to the main table.

- The joined table is significantly larger than the main table
- The filter on the joined table has low selectivity (meaning it doesn't eliminate many rows)
- The joined table data is likely to be "cold" (not cached in memory)
#### When to use the AVOID hints

When joined table data is cold, the default strategy must read all rows from disk to evaluate the filter. This becomes
especially expensive on slower I/O systems like EBS (Elastic Block Storage). The binary search approach significantly
reduces I/O operations by reading only the specific portions of data needed for each join operation.
You should only need these hints in a specific scenario: when the filter on your joined table is **highly selective**.

However, when a filter is highly selective (eliminates most rows), the binary search strategy may be less efficient.
This happens because after finding a timestamp match, the strategy must iterate backward in a single thread, evaluating
the filter condition at each step until it finds a matching row. With highly selective filters, this sequential search
may need to examine many rows before finding a match.
A filter is considered highly selective if it eliminates a very large percentage of rows (e.g., more than 95%). In this
situation, the hinted strategy can be faster because:

As a rule of thumb, the binary search strategy tends to outperform the default strategy when the filter eliminates less
than 5% of rows from the joined table. However, optimal performance also depends on other factors such as the ratio
between main and joined table sizes, available hardware resources, disk I/O performance, and data distribution.
- The parallel pre-filtering step rapidly reduces the joined table to a very small size.
- The subsequent join operation is then very fast.

In contrast, the default strategy processes and filters the joined table in parallel, which can be much faster for
highly selective filters despite requiring an initial full table scan.
Conversely, the default binary search can be slower with highly selective filters because its single-threaded backward
scan may have to check many rows before finding one that satisfies the filter condition.

#### Execution Plan Observation
You can verify how QuestDB executes your query by examining its execution plan
with the [`EXPLAIN` statement](/reference/sql/explain/):
For most other cases, especially with filters that have low selectivity or when the joined table data is not in
memory ("cold"), the default binary search is significantly faster as it minimizes I/O operations.

```questdb-sql title="Observing execution plan with USE_ASOF_BINARY_SEARCH"
EXPLAIN SELECT /*+ USE_ASOF_BINARY_SEARCH(orders md) */
-----

### Execution Plan Observation

You can verify how QuestDB executes your query by examining its execution plan with the `EXPLAIN` statement.

#### Default Execution Plan (Binary Search)

Without any hints, a filtered `ASOF JOIN` will use the binary search strategy.

```questdb-sql title="Observing the default execution plan"
EXPLAIN SELECT
orders.ts, orders.price, md.md_ts, md.bid, md.ask
FROM orders
ASOF JOIN (
Expand All @@ -103,18 +115,20 @@ ASOF JOIN (
) md;
```

When the hint is applied successfully, the execution plan will show a Filtered AsOf Join Fast Scan operator,
confirming that the binary search strategy is being used:
The execution plan will show a `Filtered AsOf Join Fast Scan` operator, confirming the binary search strategy is being
used.

<Screenshot
alt="Screen capture of the EXPLAIN output for USE_ASOF_BINARY_SEARCH"
alt="Screen capture of the EXPLAIN output showing the default Filtered AsOf Join Fast Scan"
src="images/docs/concepts/filtered-asof-plan-example.png"
/>

For comparison, here's what happens without the hint:
#### Hinted Execution Plan (Full Scan)

When you use the `AVOID_ASOF_BINARY_SEARCH` hint, the plan changes.

```questdb-sql title="Observing execution plan without USE_ASOF_BINARY_SEARCH"
EXPLAIN SELECT
```questdb-sql title="Observing execution plan with the AVOID hint"
EXPLAIN SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */
orders.ts, orders.price, md.md_ts, md.bid, md.ask
FROM orders
ASOF JOIN (
Expand All @@ -123,12 +137,10 @@ ASOF JOIN (
) md;
```

The execution plan will show:

- A standard `AsOf Join` operator instead of `Filtered AsOf Join Fast Scan`
- A separate filtering step that processes the joined table in parallel first
The execution plan will now show a standard `AsOf Join` operator and a separate, preceding filtering step on the joined
table.

<Screenshot
alt="Screen capture of the EXPLAIN output for default ASOF join"
alt="Screen capture of the EXPLAIN output for the hinted ASOF join, showing a separate filter"
src="images/docs/concepts/default-asof-plan-example.png"
/>
82 changes: 78 additions & 4 deletions documentation/reference/sql/asof-join.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,14 +34,14 @@ Visualized, a JOIN operation looks like this:
for more information.

- `joinClause` `ASOF JOIN` with an optional `ON` clause which allows only the
`=` predicate:
`=` predicate and an optional `TOLERANCE` clause:

![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofLtSpliceJoin.svg)
![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofJoin.svg)

- `whereClause` - see the [WHERE](/docs/reference/sql/where/) reference docs for
more information.

In addition, the following are items of import:
In addition, the following are items of importance:

- Columns from joined tables are combined in a single row.

Expand All @@ -67,6 +67,9 @@ logic: for each row in the first time-series,
1. consider all timestamps in the second time-series **earlier or equal to**
the first one
2. choose **the latest** such timestamp
3. If the optional `TOLERANCE` clause is specified, an additional condition applies:
the chosen record from t2 must satisfy `t1.ts - t2.ts <= tolerance_value`. If no record
from t2 meets this condition (along with `t2.ts <= t1.ts`), then the row from t1 will not have a match.

### Example

Expand Down Expand Up @@ -151,7 +154,7 @@ Let's use an example with two tables:
We want to join each trade event to the relevant order book snapshot. All
we have to write is

```questdb-sql title="A basic ASOF JOIN example" demo
```questdb-sql title="A basic ASOF JOIN example"
trades ASOF JOIN order_book
```

Expand Down Expand Up @@ -327,6 +330,77 @@ Result:

</div>

### TOLERANCE clause

The `TOLERANCE` clause enhances ASOF and LT JOINs by limiting how far back in time the join should look for a match in the right
table. The `TOLERANCE` parameter accepts a time interval value (e.g., 2s, 100ms, 1d).

When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at
t2.ts if both conditions are met: `t2.ts <= t1.ts` and `t1.ts - t2.ts <= tolerance_value`

This ensures that the matched record from the right table is not only the latest one on or before t1.ts, but also within
the specified time window.

```questdb-sql title="ASOF JOIN with a TOLERANCE parameter"
SELECT ...
FROM table1
ASOF JOIN table2 TOLERANCE 10s
[WHERE ...]
```

TOLERANCE also works together with the ON clause:
```questdb-sql title="ASOF JOIN with keys and a TOLERANCE parameter"
SELECT ...
FROM table1
ASOF JOIN table2 ON (key_column) TOLERANCE 1m
[WHERE ...]
```

The interval_literal must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds), '2m' (
2 minutes), '3h' (3 hours), or '1d' (1 day).


#### Example using TOLERANCE:

Consider the `trades` and `order_book` tables from the previous examples. If we want to join trades to order book snapshots
that occurred no more than 1 second before the trade:

```questdb-sql title="TOLERANCE example"
SELECT t.timestamp, t.price, t.size, ob.timestamp AS ob_ts, ob.bid_price, ob.bid_size
FROM trades t
ASOF JOIN order_book ob TOLERANCE 1s;
```

Let's analyze a specific trade: trades at `08:00:01.146931`.
Without `TOLERANCE`, it joins with `order_book` at `08:00:01`. The time difference is 0.146931s.
If we set `TOLERANCE` '100ms', this trade would not find a match, because 0.146931s (146.931ms) is greater than 100ms. The
previous `order_book` entry at `08:00:00` would be even further away (1.146931s).

Another trade: trades at `08:00:00.007140`.
Without `TOLERANCE`, it joins with order_book at `08:00:00`. The time difference is 0.007140s (7.14ms).
If we set `TOLERANCE` '5ms', this trade would not find a match because 7.14ms > 5ms.

#### Supported Units for interval_literal
The `TOLERANCE` interval literal supports the following time unit qualifiers:
- U: Microseconds
- T: Milliseconds
- s: Seconds
- m: Minutes
- h: Hours
- d: Days
- w: Weeks

For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes,
'1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Please note that months (M) and years (Y) are not supported as
units for the `TOLERANCE` clause.

#### Performance impact of TOLERANCE

Specifying `TOLERANCE` can also improve performance. `ASOF JOIN` execution plans often scan backward in time on the right
table to find a matching entry for each left-table row. `TOLERANCE` allows these scans to terminate early - once a
right-table record is older than the left-table record by more than the specified tolerance - thus avoiding unnecessary
processing of more distant records.

### Timestamp considerations

`ASOF` join can be performed only on tables or result sets that are ordered by
Expand Down
41 changes: 39 additions & 2 deletions documentation/reference/sql/join.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,8 @@ High-level overview:
![Flow chart showing the syntax of the INNER, LEFT JOIN keyword](/images/docs/diagrams/InnerLeftJoin.svg)

- `ASOF`, `LT`, and `SPLICE` `JOIN` has optional `ON` clause allowing only the
`=` predicate:
`=` predicate.
- `ASOF` and `LT` join additionally allows an optional `TOLERANCE` clause:

![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofLtSpliceJoin.svg)

Expand Down Expand Up @@ -311,7 +312,7 @@ WHERE t.timestamp < t2.timestamp

## LT JOIN

Similar to `ASOF JOIN`, `LT JOIN` joins two different time-series measured. For
Similar to [`ASOF JOIN`](/docs/reference/sql/asof-join/), `LT JOIN` joins two different time-series measured. For
each row in the first time-series, the `LT JOIN` takes from the second
time-series a timestamp that meets both of the following criteria:

Expand Down Expand Up @@ -394,6 +395,42 @@ order to get preceding values for every row.
The `ON` clause can also be used in combination with `LT JOIN` to join both by
timestamp and column values.

### TOLERANCE clause
The `TOLERANCE` clause enhances LT JOIN by limiting how far back in time the join should look for a match in the right
table. The `TOLERANCE` parameter accepts a time interval value (e.g., 2s, 100ms, 1d).

When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at
t2.ts if both conditions are met: `t2.ts < t1.ts` and `t1.ts - t2.ts <= tolerance_value`

This ensures that the matched record from the right table is not only the latest one on or before t1.ts, but also within
the specified time window.

```questdb-sql title="LT JOIN with a TOLERANCE parameter"
SELECT ...
FROM table1
LT JOIN table2 TOLERANCE 10s
[WHERE ...]
```

The interval_literal must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds),
'2m' ( 2 minutes), '3h' (3 hours), or '1d' (1 day).

#### Supported Units for interval_literal
The `TOLERANCE` interval literal supports the following time unit qualifiers:
- U: Microseconds
- T: Milliseconds
- s: Seconds
- m: Minutes
- h: Hours
- d: Days
- w: Weeks

For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes,
'1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Please note that months (M) and years (Y) are not supported as
units for the `TOLERANCE` clause.

See [`ASOF JOIN documentation`](/docs/reference/sql/asof-join#tolerance-clause) for more examples with the `TOLERANCE` clause.

## SPLICE JOIN

`SPLICE JOIN` is a full `ASOF JOIN`. It will return all the records from both
Expand Down
9 changes: 7 additions & 2 deletions static/images/docs/diagrams/.railroad
Original file line number Diff line number Diff line change
Expand Up @@ -238,8 +238,13 @@ joinOverview
innerLeftJoin
::= ( 'INNER' | 'LEFT' )? 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( ( column operator anotherColumn ) ( 'AND' ( column operator anotherColumn ) )* | '(' column ( ',' column )* ')' ) )?

aAsofLtSpliceJoin
::= ( 'ASOF' | 'LT' | 'SPLICE' ) 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )?
AsofLtSpliceJoin
::= ( 'ASOF' | 'LT' ) 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )? ( 'TOLERANCE' interval_literal )?
| 'SPLICE' 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )?

AsofJoin
::= 'ASOF' 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )? ( 'TOLERANCE' interval_literal )?


crossJoin
::= 'CROSS' 'JOIN' ( table | '(' sub-query ')' )
Expand Down
Loading