
[Enhancement] Route range-distribution OLAP tables by per-index distribution expressions (backport #74753)#75013

Merged: wanpengfei-git merged 2 commits into branch-4.1 from mergify/bp/branch-4.1/pr-74753 on Jun 18, 2026

@mergify mergify Bot commented Jun 18, 2026

Why I'm doing:

Range-distribution (shared-data) tables route rows to tablets by per-tablet boundaries stored in sort-key space, but the OLAP table sink could only carry a single partition-level distribution-column set. It therefore could not route different rows to materialized indexes that live in different key spaces. This is the missing sink piece for two future features — the K-tablet shadow-index rewrite job (key-column schema change) and range-distribution rollup — both of which need a base index and a new-key index to coexist in one partition and be routed independently.

What I'm doing:

Add per-index distribution routing to the sink:

  • thrift: new TOlapTableIndexSchema.distributed_exprs (field 9) carrying per-index routing expression trees, evaluated at the sink sender. Sender-only: POlapTableIndexSchema (proto) is unchanged, so remote write channels never route by it.
  • FE: OlapTableSink.createSchema fills distributed_exprs for range-distribution tables with slot-refs over each index's range sort-key columns, gated to the OLAP write-sink path (dictionary / non-write callers do not emit it). For today's base-only range tables this resolves to exactly the columns the partition-level path already used, so routing is behavior-preserving. Also adds an optional targetWriteIndexId filter (write only one index; schema, partition and loaded-index lists stay 1:1 by meta id).
  • BE: OlapTableSchemaParam parses distributed_exprs into per-index ExprContexts (prepare/open/close lifecycle); the range sink sender evaluates them once per chunk per index and routes via RangeRouter. RangeRouter::init validates routing-key types against the boundary types; a new route_chunk_rows overload routes from pre-evaluated columns; an empty distributed_exprs (K=1) routes to the single tablet. When an index has no distributed_exprs, routing falls back to the partition-level path unchanged.
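The per-index routing described above can be illustrated with a toy model. This is a minimal sketch, not the PR's code: `ToyRangeRouter`, `ToyIndex`, `Row`, and `route_chunk` are hypothetical names, the routing expression is reduced to a single integer key, and the boundary convention (sorted, exclusive upper bounds, with the last tablet unbounded) is an assumption made for illustration. It shows the same chunk being routed independently for two indexes that live in different key spaces, and how an empty boundary list degenerates to a single tablet.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a range router (not a StarRocks API).
// Assumed convention: tablet i owns keys below upper_bounds[i] and at or
// above upper_bounds[i-1]; the last tablet has no upper bound.
struct ToyRangeRouter {
    std::vector<int64_t> upper_bounds;  // sorted; size == tablet count - 1

    // Returns the tablet ordinal for one already-evaluated routing key.
    std::size_t route(int64_t key) const {
        // The first boundary strictly greater than the key identifies the
        // tablet; with no boundaries every key maps to tablet 0.
        auto it = std::upper_bound(upper_bounds.begin(), upper_bounds.end(), key);
        return static_cast<std::size_t>(it - upper_bounds.begin());
    }

    // Routes a pre-evaluated key column (one value per row).
    std::vector<std::size_t> route_rows(const std::vector<int64_t>& keys) const {
        std::vector<std::size_t> tablets;
        tablets.reserve(keys.size());
        for (int64_t key : keys) tablets.push_back(route(key));
        return tablets;
    }
};

// Hypothetical row with two candidate key columns.
struct Row {
    int64_t k1;
    int64_t k2;
};

// Hypothetical per-index state: a routing expression (standing in for the
// index's distributed_exprs) plus the boundaries of that index's tablets.
struct ToyIndex {
    std::function<int64_t(const Row&)> routing_expr;
    ToyRangeRouter router;
};

// Evaluates the index's routing expression once over the chunk, then routes
// from the resulting key column.
std::vector<std::size_t> route_chunk(const ToyIndex& index, const std::vector<Row>& chunk) {
    std::vector<int64_t> keys;
    keys.reserve(chunk.size());
    for (const Row& row : chunk) keys.push_back(index.routing_expr(row));
    return index.router.route_rows(keys);
}
```

In this sketch a base index keyed on `k1` and a second index keyed on `k2` each get their own `ToyIndex`, so the same row can land in different tablet ordinals per index. Evaluating the expression into a key column first mirrors the "route from pre-evaluated columns" shape of the description, though the real overload's signature is not shown in this conversation.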

No version gate is needed: StarRocks upgrades BE/CN before FE, so a newly-upgraded FE (the only one that emits the field) never runs against a BE that does not understand it, and an old FE never emits it. Non-range tables and any unset field are byte-for-byte unchanged. This is prerequisite-only: the new capability is dormant for existing tables and consumed by future work.
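The compatibility argument rests on the new field being optional on the wire. The sketch below models that with `std::optional`; `ToyIndexSchema`, `RoutePath`, and `choose_route_path` are hypothetical names, not StarRocks types. It also assumes one particular reading of the description: "no distributed_exprs" means the field is unset, while "empty" means the field is set with zero expressions. That distinction is an interpretation and is not confirmed by anything in this conversation.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical labels for the three routing outcomes named in the description.
enum class RoutePath { PartitionLevel, SingleTablet, PerIndexExprs };

// Hypothetical model of an index schema with an optional field.
// std::nullopt stands for "the sender never set the field", which is what an
// old FE or a non-range table would produce.
struct ToyIndexSchema {
    std::optional<std::vector<std::string>> distributed_exprs;
};

// Picks the routing path for one index under the assumed unset/empty reading.
RoutePath choose_route_path(const ToyIndexSchema& schema) {
    if (!schema.distributed_exprs.has_value()) {
        // Field unset: behave exactly as before the change.
        return RoutePath::PartitionLevel;
    }
    if (schema.distributed_exprs->empty()) {
        // Field set but empty (K=1): there is only one tablet to route to.
        return RoutePath::SingleTablet;
    }
    // Field set with expressions: route by this index's own expressions.
    return RoutePath::PerIndexExprs;
}
```

Under this model a receiver that understands the field handles all three cases, and a sender that never sets it always lands on the unchanged partition-level path, which is why the upgrade order alone can stand in for a version gate.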

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5

🤖 Generated with Claude Code


This is an automatic backport of pull request #74753 done by Mergify.

@mergify mergify Bot added the conflicts label Jun 18, 2026

mergify Bot commented Jun 18, 2026

Cherry-pick of 60a751b has failed:

On branch mergify/bp/branch-4.1/pr-74753
Your branch is up to date with 'origin/branch-4.1'.

You are currently cherry-picking commit 60a751bdaa.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/exec/range_router.cpp
	modified:   be/src/exec/range_router.h
	modified:   be/src/exec/range_tablet_sink_sender.cpp
	modified:   be/src/exec/range_tablet_sink_sender.h
	modified:   be/src/exec/tablet_info.h
	modified:   be/test/exec/range_router_test.cpp
	modified:   be/test/exec/tablet_info_test.cpp
	modified:   fe/fe-core/src/main/java/com/starrocks/planner/DictionaryCacheSink.java
	modified:   fe/fe-core/src/main/java/com/starrocks/planner/OlapTableSink.java
	modified:   fe/fe-core/src/main/java/com/starrocks/service/FrontendServiceImpl.java
	modified:   fe/fe-core/src/test/java/com/starrocks/planner/OlapTableSinkTest.java
	modified:   fe/fe-core/src/test/java/com/starrocks/planner/OlapTableSinkTest2.java
	modified:   gensrc/thrift/Descriptors.thrift

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   be/src/exec/tablet_info.cpp
	both modified:   be/test/exec/tablet_sink_sender_range_test.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@wanpengfei-git wanpengfei-git enabled auto-merge (squash) June 18, 2026 10:07
@mergify mergify Bot closed this Jun 18, 2026
auto-merge was automatically disabled June 18, 2026 10:07

Pull request was closed

mergify Bot commented Jun 18, 2026

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

@xiangguangyxg xiangguangyxg reopened this Jun 18, 2026
@wanpengfei-git wanpengfei-git enabled auto-merge (squash) June 18, 2026 11:27
@xiangguangyxg xiangguangyxg force-pushed the mergify/bp/branch-4.1/pr-74753 branch from b122aae to 5463a84 Compare June 18, 2026 11:43
@xiangguangyxg

Resolved the backport conflict and force-pushed the clean cherry-pick (5463a84).

Cause: branch-4.1 predates two main-only refactors that the cherry-pick of 60a751b collided with:

  1. be/src/exec/tablet_info.cpp: main renamed the static expr lifecycle helpers from Expr::prepare/open/close to ExprExecutor::prepare/open/close. 4.1 still uses Expr::. The P1a hunk that adds the per-index distributed_expr_ctxs prepare/open/close loop landed right next to those renamed calls, so the surrounding context conflicted.
  2. be/test/exec/tablet_sink_sender_range_test.cpp — on main this test lives under be/test/exec/data_sinks/ and uses the post-reorg header paths (types/type_descriptor.h, types/datum.h, base/testutil/assert.h). 4.1 has it flat under be/test/exec/ with the older paths (runtime/types.h, column/datum.h, testutil/assert.h), producing a rename + include conflict.

Resolution (the only two changed regions, verified against b122aae):

  • tablet_info.cpp: kept 4.1's Expr::prepare/open/close API and added the P1a per-index loop on top — no ExprExecutor:: references remain.
  • test includes: mapped to 4.1's header layout (runtime/types.h for TypeDescriptor, column/datum.h already present for Datum); added gen_cpp/descriptors.pb.h, runtime/descriptor_helper.h, runtime/exec_env.h, runtime/runtime_state.h. The test body is byte-identical to main.

Confirmed all P1a dependencies exist on 4.1 (MetaUtils.getRangeDistributionColumns(table, indexMetaId), ExprToThrift.treesToThrift, OlapTable.isRangeDistribution, getLatestMaterializedIndices(IndexExtState)) and that TOlapTableIndexSchema thrift field 9 is free on 4.1. Leaving the rest of the cherry-pick to CI for the full BE/FE compile.

…ibution expressions (#74753)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: wanpengfei-git <wanpengfei91@163.com>
(cherry picked from commit 60a751b)
@xiangguangyxg xiangguangyxg force-pushed the mergify/bp/branch-4.1/pr-74753 branch from 5463a84 to 3a3b681 Compare June 18, 2026 12:53
@wanpengfei-git wanpengfei-git merged commit 93022df into branch-4.1 Jun 18, 2026
29 of 30 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-4.1/pr-74753 branch June 18, 2026 13:35