Skip to content

[EPIC] Support limit pruning #18860

@xudong963

Description

@xudong963

Let's see an example:

SELECT * FROM t1 WHERE ticker = 'ABT' AND window_start <= (1761364799999999999 AT TIME ZONE 'America/New_York') AND window_start >= (1698465600000000000 AT TIME ZONE 'America/New_York')  LIMIT 1

After the kinds of pruning inside DF, we can get some matched row groups.

Assume that there are still four row groups remaining after row group pruning by ticker = 'ABT' AND window_start <= (1761364799999999999 AT TIME ZONE 'America/New_York') AND window_start >= (1698465600000000000 AT TIME ZONE 'America/New_York') and related statistics in row group metadata. And one (row group 3) of the four is fully matched with the predicates, and the others are not, which are only partially matched.

The worst case is that we need to scan rg1, rg2, then find a matched row in rg3. But suppose we know the rg3 is fully matched (currently, DF only supports partial-matched). In that case, we can only check it if the total row count across all the rg3 meets or exceeds the threshold defined by k in the LIMIT clause, then we definitely can reduce the scanned data and skip other rows.

If there are no fully matched rg, we can extend limit pruning to the page level, to find the fully matched pages.


I plan to support row group level limit pruning first, then explore the page level

Sub-issues

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions