-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Let's see an example:
SELECT * FROM t1 WHERE ticker = 'ABT' AND window_start <= (1761364799999999999 AT TIME ZONE 'America/New_York') AND window_start >= (1698465600000000000 AT TIME ZONE 'America/New_York') LIMIT 1
After the kinds of pruning inside DF, we can get some matched row groups.
Assume that there are still four row groups remaining after row group pruning by ticker = 'ABT' AND window_start <= (1761364799999999999 AT TIME ZONE 'America/New_York') AND window_start >= (1698465600000000000 AT TIME ZONE 'America/New_York') and related statistics in row group metadata. And one (row group 3) of the four is fully matched with the predicates, and the others are not, which are only partially matched.
The worst case is that we need to scan rg1, rg2, then find a matched row in rg3. But suppose we know the rg3 is fully matched (currently, DF only supports partial-matched). In that case, we can only check it if the total row count across all the rg3 meets or exceeds the threshold defined by k in the LIMIT clause, then we definitely can reduce the scanned data and skip other rows.
If there are no fully matched rg, we can extend limit pruning to the page level, to find the fully matched pages.
I plan to support row group level limit pruning first, then explore the page level