Storages: support build inverted index query info from RSOperator #9998

Merged 10 commits on Mar 21, 2025

Lloyd-Pottiger
Contributor

@Lloyd-Pottiger Lloyd-Pottiger commented Mar 14, 2025

What problem does this PR solve?

Issue Number: ref #9843

Problem Summary:

What is changed and how it works?

Part 2 of the inverted index work: introduce `IntegerSet` and `ColumnRange`.

`IntegerSet` is a collection of discrete integer values, like `{0, 1, 8}`, `[10, 11)` and `{0, 1, 8} U [10, 11)`.
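
To make the notation concrete, here is a minimal sketch of one way such a set could be stored: as a sorted list of disjoint closed intervals. `ToyIntegerSet` and all of its members are hypothetical names for illustration and are not the PR's actual `IntegerSet` API.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for IntegerSet: sorted, disjoint closed intervals [lo, hi].
struct ToyIntegerSet
{
    std::vector<std::pair<int64_t, int64_t>> intervals;

    // Build from discrete values, e.g. {0, 1, 8}.
    static ToyIntegerSet fromValues(const std::vector<int64_t> & values)
    {
        ToyIntegerSet s;
        for (auto v : values)
            s.intervals.emplace_back(v, v);
        s.normalize();
        return s;
    }

    // Build from a half-open range [lo, hi), e.g. [10, 11).
    static ToyIntegerSet fromRange(int64_t lo, int64_t hi)
    {
        ToyIntegerSet s;
        if (lo < hi)
            s.intervals.emplace_back(lo, hi - 1);
        return s;
    }

    // Set union: concatenate both interval lists, then re-normalize.
    ToyIntegerSet unionWith(const ToyIntegerSet & other) const
    {
        ToyIntegerSet s = *this;
        s.intervals.insert(s.intervals.end(), other.intervals.begin(), other.intervals.end());
        s.normalize();
        return s;
    }

    bool contains(int64_t v) const
    {
        for (const auto & iv : intervals)
            if (iv.first <= v && v <= iv.second)
                return true;
        return false;
    }

private:
    // Sort, then merge intervals that overlap or touch (e.g. [0, 0] and [1, 1]).
    void normalize()
    {
        std::sort(intervals.begin(), intervals.end());
        std::vector<std::pair<int64_t, int64_t>> merged;
        for (const auto & iv : intervals)
        {
            const bool at_max = !merged.empty() && merged.back().second == std::numeric_limits<int64_t>::max();
            if (!merged.empty() && (at_max || iv.first <= merged.back().second + 1))
                merged.back().second = std::max(merged.back().second, iv.second);
            else
                merged.push_back(iv);
        }
        intervals = std::move(merged);
    }
};
```

With this representation, `{0, 1, 8} U [10, 11)` is stored as the three intervals `[0, 1]`, `[8, 8]` and `[10, 10]`.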

`ColumnRange` represents the value ranges of columns: the range of a single column is represented by a `ColumnID` and an `IntegerSet`. It supports all set-algebra operations.

This PR also supports converting an `RSOperator` to a `ColumnRange`.

For example, the `RSOperator` `a > 0 and a < 10 and b < 10 or c > 0` will be converted to `col_id_a: [1, 9] && col_id_b: [-inf, 9] || col_id_c: [1, inf]`.
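
The per-column part of that conversion can be sketched as follows: on integers, each strict comparison becomes a closed bound, and an AND of predicates on the same column becomes an interval intersection. `ToyRange`, `CmpOp`, `rangeOf` and `intersect` are hypothetical names used only for this illustration; the PR's real conversion works on `RSOperator` trees and is not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical closed interval [lo, hi]; the int64_t limits stand in for -inf / inf.
struct ToyRange
{
    int64_t lo = std::numeric_limits<int64_t>::min();
    int64_t hi = std::numeric_limits<int64_t>::max();

    bool empty() const { return lo > hi; }
};

enum class CmpOp { Less, LessEqual, Greater, GreaterEqual, Equal };

// Map `col <op> value` to a closed range over the integers.
ToyRange rangeOf(CmpOp op, int64_t value)
{
    const int64_t min_v = std::numeric_limits<int64_t>::min();
    const int64_t max_v = std::numeric_limits<int64_t>::max();
    ToyRange r;
    switch (op)
    {
    case CmpOp::Less:
        // x < v is x <= v - 1; nothing is below the minimum, so return an empty range there.
        if (value == min_v) { r.lo = max_v; r.hi = min_v; }
        else r.hi = value - 1;
        break;
    case CmpOp::LessEqual:
        r.hi = value;
        break;
    case CmpOp::Greater:
        // x > v is x >= v + 1; nothing is above the maximum.
        if (value == max_v) { r.lo = max_v; r.hi = min_v; }
        else r.lo = value + 1;
        break;
    case CmpOp::GreaterEqual:
        r.lo = value;
        break;
    case CmpOp::Equal:
        r.lo = value;
        r.hi = value;
        break;
    }
    return r;
}

// AND of two predicates on the same column: intersect their ranges.
ToyRange intersect(const ToyRange & a, const ToyRange & b)
{
    ToyRange r;
    r.lo = std::max(a.lo, b.lo);
    r.hi = std::min(a.hi, b.hi);
    return r;
}
```

Under these assumptions, `intersect(rangeOf(CmpOp::Greater, 0), rangeOf(CmpOp::Less, 10))` yields the closed range `[1, 9]`, matching the `col_id_a` part of the example.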

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 14, 2025
Comment on lines +57 to +58
virtual BitmapFilterPtr check(std::function<BitmapFilterPtr(const ColumnRangePtr &, size_t)> search, size_t size)
    = 0;
Contributor

Add some comments about the function `check` and its parameters.

Contributor Author

Will do this in the next PR, #10006.
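
Until those comments land, one plausible reading of the `check` signature is that `search` resolves a single column's range against its index and returns a bitmap of `size` rows, while composite nodes combine the bitmaps of their children. The sketch below illustrates that reading only. Every `Toy*` name is hypothetical, `std::vector<bool>` stands in for `BitmapFilter`, and an integer column id stands in for `ColumnRangePtr`.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using ToyBitmap = std::vector<bool>; // stand-in for BitmapFilter
// Assumed contract: the callback returns exactly `size` bits for the given column.
using ToySearch = std::function<ToyBitmap(int column_id, std::size_t size)>;

struct ToyNode
{
    virtual ~ToyNode() = default;
    // `search`: per-column index lookup; `size`: number of rows the bitmap covers.
    virtual ToyBitmap check(const ToySearch & search, std::size_t size) const = 0;
};
using ToyNodePtr = std::shared_ptr<ToyNode>;

// Leaf: a range on one column, answered directly by the index lookup.
struct ToySingle : ToyNode
{
    int column_id;
    explicit ToySingle(int id) : column_id(id) {}

    ToyBitmap check(const ToySearch & search, std::size_t size) const override
    {
        return search(column_id, size);
    }
};

// AND node: a row passes only if it passes every child.
struct ToyAnd : ToyNode
{
    std::vector<ToyNodePtr> children;
    explicit ToyAnd(std::vector<ToyNodePtr> c) : children(std::move(c)) {}

    ToyBitmap check(const ToySearch & search, std::size_t size) const override
    {
        ToyBitmap result(size, true);
        for (const auto & child : children)
        {
            ToyBitmap b = child->check(search, size);
            for (std::size_t i = 0; i < size; ++i)
                result[i] = result[i] && b[i];
        }
        return result;
    }
};
```

An OR node would follow the same pattern, starting from an all-false bitmap and combining with `||`.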

@JaySon-Huang
Contributor

Some refinements: Lloyd-Pottiger#23

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 20, 2025
Comment on lines +65 to +72
if (auto set = IntegerSet::createValueSet(attr.type->getTypeId(), values); set)
{
    auto iter = std::find_if(index_info->begin(), index_info->end(), [&](const auto & info) {
        return info.column_id == attr.col_id && info.kind == TiDB::ColumnarIndexKind::Inverted;
    });
    if (iter != index_info->end())
        return SingleColumnRange::create(iter->column_id, iter->index_id, set);
}
Contributor

If the user writes thousands of values or more in the value list of `in (...)`, would it cause an OOM issue or a performance regression? I think this is something you should verify later.

Contributor Author

For OOM, in this case, the bottleneck would not be the ColumnRange.

For performance, we will decide in the optimizer whether to use the inverted index, based on the selectivity.
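
For readers following this thread, the snippet under review can be mirrored in a small standalone sketch: find the inverted index for a column with `std::find_if`, and turn an `in (...)` list into a value set. The `Toy*` types, `findInvertedIndex` and `makeValueSet` are hypothetical. In particular, deduplicating the list is an assumption of this sketch, since the thread does not say how `IntegerSet::createValueSet` stores its values.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class ToyIndexKind { Vector, Inverted };

// Hypothetical, simplified index description.
struct ToyIndexInfo
{
    int64_t column_id;
    int64_t index_id;
    ToyIndexKind kind;
};

// Returns the index_id of the inverted index on `column_id`, or -1 if there is none.
int64_t findInvertedIndex(const std::vector<ToyIndexInfo> & infos, int64_t column_id)
{
    auto iter = std::find_if(infos.begin(), infos.end(), [&](const ToyIndexInfo & info) {
        return info.column_id == column_id && info.kind == ToyIndexKind::Inverted;
    });
    return iter != infos.end() ? iter->index_id : -1;
}

// Sort and deduplicate an IN (...) list, so memory scales with the number of
// distinct values rather than with the length of the list as written.
std::vector<int64_t> makeValueSet(std::vector<int64_t> values)
{
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```

In this sketch a list with many repeated values collapses to its distinct values, but a list of thousands of distinct values still costs memory proportional to its length, which is the case the reviewer asks to verify.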

Lloyd-Pottiger and others added 9 commits March 21, 2025 16:58
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: JaySon-Huang <[email protected]>
Co-authored-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Contributor

ti-chi-bot bot commented Mar 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,JinheLin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 21, 2025
Contributor

ti-chi-bot bot commented Mar 21, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-20 04:49:43.540821121 +0000 UTC m=+503877.225057218: ☑️ agreed by JaySon-Huang.
  • 2025-03-21 09:13:28.260439604 +0000 UTC m=+606101.944675700: ☑️ agreed by JinheLin.

Signed-off-by: Lloyd-Pottiger <[email protected]>
@ti-chi-bot ti-chi-bot bot merged commit 400923b into pingcap:master Mar 21, 2025
5 checks passed
@Lloyd-Pottiger Lloyd-Pottiger deleted the parse-query-info branch March 21, 2025 11:51