diff --git a/documentation/concept/sql-optimizer-hints.md b/documentation/concept/sql-optimizer-hints.md index 2ee46281..dc32a4ae 100644 --- a/documentation/concept/sql-optimizer-hints.md +++ b/documentation/concept/sql-optimizer-hints.md @@ -5,58 +5,67 @@ description: This document describes available hints and when to use them. --- -QuestDB's query optimizer automatically selects execution plans for SQL queries based on heuristics. The default -execution strategy should be the fastest for most datasets. You may use hints to select a specific execution strategy -which may (or may not) outperform the default strategy. SQL hints influence the execution strategy of queries without -changing their semantics. +QuestDB's query optimizer automatically selects execution plans for SQL queries based on heuristics. While the default +execution strategy should be the fastest for most scenarios, you can use hints to select a specific strategy that may +better suit your data's characteristics. SQL hints influence the execution strategy of queries without changing their +semantics. ## Hint Syntax In QuestDB, SQL hints are specified as SQL block comments with a plus sign after the opening comment marker. Hints must -be placed immediately after the SELECT keyword: +be placed immediately after the `SELECT` keyword: ```questdb-sql title="SQL hint syntax" SELECT /*+ HINT_NAME(parameter1 parameter2) */ columns FROM table; ``` -Hints are entirely optional and designed to be a safe optimization mechanism: +Hints are designed to be a safe optimization mechanism: -- The database will use default optimization strategies when no hints are provided -- Syntax errors inside a hint block won't fail the entire SQL query -- The database safely ignores unknown hints -- Only block comment hints (`/*+ HINT */`) are supported, not line comment hints (`--+ HINT`) +- The database uses default optimization strategies when no hints are provided. +- Syntax errors inside a hint block won't fail the entire SQL query. +- The database safely ignores unknown hints. +- Only block comment hints (`/*+ HINT */`) are supported, not line comment hints (`--+ HINT`). -## Available Hints +----- -### USE_ASOF_BINARY_SEARCH +## Binary Search Optimizations and Hints -The `USE_ASOF_BINARY_SEARCH` hint enables a specialized binary search optimization for -non-keyed [ASOF joins](/reference/sql/asof-join/) when filtering is applied to the joined table. This hint requires two -parameters that specify the table aliases participating in the join. +Since version 9.0, QuestDB's optimizer defaults to using a binary search-based strategy for **`ASOF JOIN`** and +**`LT JOIN`** (Less Than Join) queries that have a filter on the right-hand side (the joined or lookup table). This +approach is generally faster as it avoids a full table scan. -```questdb-sql title="Optimizing ASOF join with binary search" -SELECT /*+ USE_ASOF_BINARY_SEARCH(orders md) */ +However, for some specific data distributions and filter conditions, the previous strategy of performing a parallel full +table scan can be more performant. For these cases, QuestDB provides hints to *avoid* the default binary search. + +### AVOID\_ASOF\_BINARY\_SEARCH and AVOID\_LT\_BINARY\_SEARCH + +These hints instruct the optimizer to revert to the pre-9.0 execution strategy for `ASOF JOIN` and `LT JOIN` queries, +respectively. This older strategy involves performing a full parallel scan on the joined table to apply filters *before* +executing the join. + +- `AVOID_ASOF_BINARY_SEARCH(left_table_alias right_table_alias)`: Use for **`ASOF JOIN`** queries. +- `AVOID_LT_BINARY_SEARCH(table_alias)`: Use for **`LT JOIN`** queries. + + + +```questdb-sql title="Avoiding binary search for an ASOF join" +SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */ orders.ts, orders.price, md.md_ts, md.bid, md.ask FROM orders ASOF JOIN ( SELECT ts as md_ts, bid, ask FROM market_data - WHERE state = 'VALID' --filter on the joined table + WHERE state = 'INVALID' -- Highly selective filter ) md; ``` #### How it works -By default (without this hint), QuestDB processes ASOF joins by: - -1. Applying filters to the joined table in parallel -2. Joining the filtered results to the main table - -With the `USE_ASOF_BINARY_SEARCH` hint, QuestDB changes the execution strategy: +The **default strategy (binary search)** works as follows: -1. For each record in the main table, it uses [binary search](https://en.wikipedia.org/wiki/Binary_search) to locate - a record with a matching timestamp in the joined table -2. Starting from this located timestamp match, it then iterates backward through rows in the joined table, in a single - thread, until finding a row that matches the filter condition +1. For each record in the main table, it uses a binary search to quickly locate a record with a matching timestamp in + the joined table. +2. Starting from this located timestamp, it then iterates backward through rows in the joined table, in a single thread, + evaluating the filter condition until a match is found. -#### When to use +The **hinted strategy (`AVOID_..._BINARY_SEARCH`)** forces this plan: -This optimization is particularly beneficial when: +1. Apply the filter to the *entire* joined table in parallel. +2. Join the filtered (and now much smaller) result set to the main table. -- The joined table is significantly larger than the main table -- The filter on the joined table has low selectivity (meaning it doesn't eliminate many rows) -- The joined table data is likely to be "cold" (not cached in memory) +#### When to use the AVOID hints -When joined table data is cold, the default strategy must read all rows from disk to evaluate the filter. This becomes -especially expensive on slower I/O systems like EBS (Elastic Block Storage). The binary search approach significantly -reduces I/O operations by reading only the specific portions of data needed for each join operation. +You should only need these hints in a specific scenario: when the filter on your joined table is **highly selective**. -However, when a filter is highly selective (eliminates most rows), the binary search strategy may be less efficient. -This happens because after finding a timestamp match, the strategy must iterate backward in a single thread, evaluating -the filter condition at each step until it finds a matching row. With highly selective filters, this sequential search -may need to examine many rows before finding a match. +A filter is considered highly selective if it eliminates a very large percentage of rows (e.g., more than 95%). In this +situation, the hinted strategy can be faster because: -As a rule of thumb, the binary search strategy tends to outperform the default strategy when the filter eliminates less -than 5% of rows from the joined table. However, optimal performance also depends on other factors such as the ratio -between main and joined table sizes, available hardware resources, disk I/O performance, and data distribution. +- The parallel pre-filtering step rapidly reduces the joined table to a very small size. +- The subsequent join operation is then very fast. -In contrast, the default strategy processes and filters the joined table in parallel, which can be much faster for -highly selective filters despite requiring an initial full table scan. +Conversely, the default binary search can be slower with highly selective filters because its single-threaded backward +scan may have to check many rows before finding one that satisfies the filter condition. -#### Execution Plan Observation -You can verify how QuestDB executes your query by examining its execution plan -with the [`EXPLAIN` statement](/reference/sql/explain/): +For most other cases, especially with filters that have low selectivity or when the joined table data is not in +memory ("cold"), the default binary search is significantly faster as it minimizes I/O operations. -```questdb-sql title="Observing execution plan with USE_ASOF_BINARY_SEARCH" -EXPLAIN SELECT /*+ USE_ASOF_BINARY_SEARCH(orders md) */ +----- + +### Execution Plan Observation + +You can verify how QuestDB executes your query by examining its execution plan with the `EXPLAIN` statement. + +#### Default Execution Plan (Binary Search) + +Without any hints, a filtered `ASOF JOIN` will use the binary search strategy. + +```questdb-sql title="Observing the default execution plan" +EXPLAIN SELECT orders.ts, orders.price, md.md_ts, md.bid, md.ask FROM orders ASOF JOIN ( @@ -103,18 +115,20 @@ ASOF JOIN ( ) md; ``` -When the hint is applied successfully, the execution plan will show a Filtered AsOf Join Fast Scan operator, -confirming that the binary search strategy is being used: +The execution plan will show a `Filtered AsOf Join Fast Scan` operator, confirming the binary search strategy is being +used. -For comparison, here's what happens without the hint: +#### Hinted Execution Plan (Full Scan) + +When you use the `AVOID_ASOF_BINARY_SEARCH` hint, the plan changes. -```questdb-sql title="Observing execution plan without USE_ASOF_BINARY_SEARCH" -EXPLAIN SELECT +```questdb-sql title="Observing execution plan with the AVOID hint" +EXPLAIN SELECT /*+ AVOID_ASOF_BINARY_SEARCH(orders md) */ orders.ts, orders.price, md.md_ts, md.bid, md.ask FROM orders ASOF JOIN ( @@ -123,12 +137,10 @@ ASOF JOIN ( ) md; ``` -The execution plan will show: - -- A standard `AsOf Join` operator instead of `Filtered AsOf Join Fast Scan` -- A separate filtering step that processes the joined table in parallel first +The execution plan will now show a standard `AsOf Join` operator and a separate, preceding filtering step on the joined +table. \ No newline at end of file diff --git a/documentation/reference/sql/asof-join.md b/documentation/reference/sql/asof-join.md index e7ddade9..01b3111d 100644 --- a/documentation/reference/sql/asof-join.md +++ b/documentation/reference/sql/asof-join.md @@ -34,14 +34,14 @@ Visualized, a JOIN operation looks like this: for more information. - `joinClause` `ASOF JOIN` with an optional `ON` clause which allows only the - `=` predicate: + `=` predicate and an optional `TOLERANCE` clause: - ![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofLtSpliceJoin.svg) + ![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofJoin.svg) - `whereClause` - see the [WHERE](/docs/reference/sql/where/) reference docs for more information. -In addition, the following are items of import: +In addition, the following are items of importance: - Columns from joined tables are combined in a single row. @@ -67,6 +67,9 @@ logic: for each row in the first time-series, 1. consider all timestamps in the second time-series **earlier or equal to** the first one 2. choose **the latest** such timestamp +3. If the optional `TOLERANCE` clause is specified, an additional condition applies: + the chosen record from t2 must satisfy `t1.ts - t2.ts <= tolerance_value`. If no record + from t2 meets this condition (along with `t2.ts <= t1.ts`), then the row from t1 will not have a match. ### Example @@ -151,7 +154,7 @@ Let's use an example with two tables: We want to join each trade event to the relevant order book snapshot. All we have to write is -```questdb-sql title="A basic ASOF JOIN example" demo +```questdb-sql title="A basic ASOF JOIN example" trades ASOF JOIN order_book ``` @@ -327,6 +330,77 @@ Result: +### TOLERANCE clause + +The `TOLERANCE` clause enhances ASOF and LT JOINs by limiting how far back in time the join should look for a match in the right +table. The `TOLERANCE` parameter accepts a time interval value (e.g., 2s, 100ms, 1d). + +When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at +t2.ts if both conditions are met: `t2.ts <= t1.ts` and `t1.ts - t2.ts <= tolerance_value` + +This ensures that the matched record from the right table is not only the latest one on or before t1.ts, but also within +the specified time window. + +```questdb-sql title="ASOF JOIN with a TOLERANCE parameter" +SELECT ... +FROM table1 +ASOF JOIN table2 TOLERANCE 10s +[WHERE ...] +``` + +TOLERANCE also works together with the ON clause: +```questdb-sql title="ASOF JOIN with keys and a TOLERANCE parameter" +SELECT ... +FROM table1 +ASOF JOIN table2 ON (key_column) TOLERANCE 1m +[WHERE ...] +``` + +The interval_literal must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds), '2m' ( +2 minutes), '3h' (3 hours), or '1d' (1 day). + + +#### Example using TOLERANCE: + +Consider the `trades` and `order_book` tables from the previous examples. If we want to join trades to order book snapshots +that occurred no more than 1 second before the trade: + +```questdb-sql title="TOLERANCE example" +SELECT t.timestamp, t.price, t.size, ob.timestamp AS ob_ts, ob.bid_price, ob.bid_size +FROM trades t +ASOF JOIN order_book ob TOLERANCE 1s; +``` + +Let's analyze a specific trade: trades at `08:00:01.146931`. +Without `TOLERANCE`, it joins with `order_book` at `08:00:01`. The time difference is 0.146931s. +If we set `TOLERANCE` '100ms', this trade would not find a match, because 0.146931s (146.931ms) is greater than 100ms. The +previous `order_book` entry at `08:00:00` would be even further away (1.146931s). + +Another trade: trades at `08:00:00.007140`. +Without `TOLERANCE`, it joins with order_book at `08:00:00`. The time difference is 0.007140s (7.14ms). +If we set `TOLERANCE` '5ms', this trade would not find a match because 7.14ms > 5ms. + +#### Supported Units for interval_literal +The `TOLERANCE` interval literal supports the following time unit qualifiers: +- U: Microseconds +- T: Milliseconds +- s: Seconds +- m: Minutes +- h: Hours +- d: Days +- w: Weeks + +For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes, +'1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Please note that months (M) and years (Y) are not supported as +units for the `TOLERANCE` clause. + +#### Performance impact of TOLERANCE + +Specifying `TOLERANCE` can also improve performance. `ASOF JOIN` execution plans often scan backward in time on the right +table to find a matching entry for each left-table row. `TOLERANCE` allows these scans to terminate early - once a +right-table record is older than the left-table record by more than the specified tolerance - thus avoiding unnecessary +processing of more distant records. + ### Timestamp considerations `ASOF` join can be performed only on tables or result sets that are ordered by diff --git a/documentation/reference/sql/join.md b/documentation/reference/sql/join.md index 4edf188b..e63d89e9 100644 --- a/documentation/reference/sql/join.md +++ b/documentation/reference/sql/join.md @@ -33,7 +33,8 @@ High-level overview: ![Flow chart showing the syntax of the INNER, LEFT JOIN keyword](/images/docs/diagrams/InnerLeftJoin.svg) - `ASOF`, `LT`, and `SPLICE` `JOIN` has optional `ON` clause allowing only the - `=` predicate: + `=` predicate. + - `ASOF` and `LT` join additionally allows an optional `TOLERANCE` clause: ![Flow chart showing the syntax of the ASOF, LT, and SPLICE JOIN keyword](/images/docs/diagrams/AsofLtSpliceJoin.svg) @@ -311,7 +312,7 @@ WHERE t.timestamp < t2.timestamp ## LT JOIN -Similar to `ASOF JOIN`, `LT JOIN` joins two different time-series measured. For +Similar to [`ASOF JOIN`](/docs/reference/sql/asof-join/), `LT JOIN` joins two different time-series measured. For each row in the first time-series, the `LT JOIN` takes from the second time-series a timestamp that meets both of the following criteria: @@ -394,6 +395,42 @@ order to get preceding values for every row. The `ON` clause can also be used in combination with `LT JOIN` to join both by timestamp and column values. +### TOLERANCE clause +The `TOLERANCE` clause enhances LT JOIN by limiting how far back in time the join should look for a match in the right +table. The `TOLERANCE` parameter accepts a time interval value (e.g., 2s, 100ms, 1d). + +When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at +t2.ts if both conditions are met: `t2.ts < t1.ts` and `t1.ts - t2.ts <= tolerance_value` + +This ensures that the matched record from the right table is not only the latest one on or before t1.ts, but also within +the specified time window. + +```questdb-sql title="LT JOIN with a TOLERANCE parameter" +SELECT ... +FROM table1 +LT JOIN table2 TOLERANCE 10s +[WHERE ...] +``` + +The interval_literal must be a valid QuestDB interval string, like '5s' (5 seconds), '100ms' (100 milliseconds), +'2m' ( 2 minutes), '3h' (3 hours), or '1d' (1 day). + +#### Supported Units for interval_literal +The `TOLERANCE` interval literal supports the following time unit qualifiers: +- U: Microseconds +- T: Milliseconds +- s: Seconds +- m: Minutes +- h: Hours +- d: Days +- w: Weeks + +For example, '100U' is 100 microseconds, '50T' is 50 milliseconds, '2s' is 2 seconds, '30m' is 30 minutes, +'1h' is 1 hour, '7d' is 7 days, and '2w' is 2 weeks. Please note that months (M) and years (Y) are not supported as +units for the `TOLERANCE` clause. + +See [`ASOF JOIN documentation`](/docs/reference/sql/asof-join#tolerance-clause) for more examples with the `TOLERANCE` clause. + ## SPLICE JOIN `SPLICE JOIN` is a full `ASOF JOIN`. It will return all the records from both diff --git a/static/images/docs/diagrams/.railroad b/static/images/docs/diagrams/.railroad index e88bae5b..b3683340 100644 --- a/static/images/docs/diagrams/.railroad +++ b/static/images/docs/diagrams/.railroad @@ -238,8 +238,13 @@ joinOverview innerLeftJoin ::= ( 'INNER' | 'LEFT' )? 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( ( column operator anotherColumn ) ( 'AND' ( column operator anotherColumn ) )* | '(' column ( ',' column )* ')' ) )? -aAsofLtSpliceJoin - ::= ( 'ASOF' | 'LT' | 'SPLICE' ) 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )? +AsofLtSpliceJoin + ::= ( 'ASOF' | 'LT' ) 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )? ( 'TOLERANCE' interval_literal )? + | 'SPLICE' 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )? + +AsofJoin + ::= 'ASOF' 'JOIN' ( table | '(' sub-query ')' ) ( 'ON' ( column '=' anotherColumn ( 'AND' column '=' anotherColumn )* | '(' column ( ',' column )* ')' ) )? ( 'TOLERANCE' interval_literal )? + crossJoin ::= 'CROSS' 'JOIN' ( table | '(' sub-query ')' ) diff --git a/static/images/docs/diagrams/AsofJoin.svg b/static/images/docs/diagrams/AsofJoin.svg new file mode 100644 index 00000000..279969d3 --- /dev/null +++ b/static/images/docs/diagrams/AsofJoin.svg @@ -0,0 +1,81 @@ + + + + + + + + + ASOF + + + JOIN + + + table + + ( + + + sub-query + + ) + + + ON + + + column + + = + + + anotherColumn + + AND + + + ( + + + column + + , + + + ) + + + TOLERANCE + + + interval_literal + + + \ No newline at end of file diff --git a/static/images/docs/diagrams/AsofLtSpliceJoin.svg b/static/images/docs/diagrams/AsofLtSpliceJoin.svg index 84ca809e..1b91bbe2 100644 --- a/static/images/docs/diagrams/AsofLtSpliceJoin.svg +++ b/static/images/docs/diagrams/AsofLtSpliceJoin.svg @@ -1,81 +1,124 @@ - - - - - - - - - ASOF - - - LT - - - SPLICE - - - JOIN - - - table - - ( - - - sub-query - - ) - - - ON - - - column - - = - - - anotherColumn - - AND - - - ( - - - column - - , - - - ) - - - \ No newline at end of file + + + + + + + + + ASOF + + + LT + + + JOIN + + + table + + ( + + + sub-query + + ) + + + ON + + + column + + = + + + anotherColumn + + AND + + + ( + + + column + + , + + + ) + + + TOLERANCE + + + interval_literal + + SPLICE + + + JOIN + + + table + + ( + + + sub-query + + ) + + + ON + + + column + + = + + + anotherColumn + + AND + + + ( + + + column + + , + + + ) + + + + \ No newline at end of file