[Enhancement] Route range-distribution OLAP tables by per-index distribution expressions #74753
…ibution expressions

Range-distribution tables route rows to tablets by per-tablet boundaries in sort-key space, but the sink could only carry one partition-level distribution-column set, so it could not route different rows to indexes that live in different key spaces. This blocks the future K-tablet shadow-index rewrite job and range-distribution rollup, both of which need a base index and a new-key index to coexist in one partition.

Add per-index distribution routing:

- thrift: TOlapTableIndexSchema.distributed_exprs (field 9) carries per-index routing expression trees, evaluated at the sink sender. Sender-only: POlapTableIndexSchema is unchanged, so remote write channels never route by it.
- FE: OlapTableSink.createSchema fills distributed_exprs for range-distribution tables (slot-refs over each index's range sort-key columns), gated to the OLAP write-sink path (dictionary/non-write callers pass emitDistributedExprs=false). For today's base-only range tables this resolves to the same columns the partition-level path used, so routing is behavior-preserving. Adds an optional targetWriteIndexId filter (write only one index; schema, partition, and loaded-index lists stay 1:1 by meta id).
- BE: OlapTableSchemaParam parses distributed_exprs into per-index ExprContexts (prepare/open/close lifecycle); the range sink sender evaluates them once per chunk per index and routes via RangeRouter. RangeRouter::init validates the routing-key types against the boundary types; a new route_chunk_rows overload routes from pre-evaluated columns; an empty distributed_exprs (K=1) routes to the single tablet. When an index has no distributed_exprs, routing falls back to the partition-level path unchanged.

No version gate is needed: StarRocks upgrades BE/CN before FE, so a new FE never runs against a BE that does not understand the field, and an old FE never emits it. Non-range tables and any unset field are byte-for-byte unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codex review

Codex Review: Didn't find any major issues. Nice work!
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)

[FE Incremental Coverage Report] ✅ pass : 42 / 45 (93.33%)

[BE Incremental Coverage Report] ✅ pass : 75 / 80 (93.75%)
Real-cluster E2E validation (shared-data)

Built this PR on our internal test platform and validated on a fresh shared-data cluster (1 FE + 3 CN, this PR's build).

What's validated: the per-index

Each "post-split write" exercises the new sink per-index routing into multiple range tablets.

Example (PK): table split at boundaries

Range-distribution routing is behavior-preserving for base-only tables; these results confirm loads / queries / splits stay correct with the new per-index routing path active (no regression).

Guard sanity (existing range sort-key guards still hold on this build):

Unit tests: FE
Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code
@Mergifyio backport branch-4.1

✅ Backports have been created

Cherry-pick of 60a751b has failed:

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Why I'm doing:
Range-distribution (shared-data) tables route rows to tablets by per-tablet boundaries stored in sort-key space, but the OLAP table sink could only carry a single partition-level distribution-column set. It therefore could not route different rows to materialized indexes that live in different key spaces. This is the missing sink piece for two future features — the K-tablet shadow-index rewrite job (key-column schema change) and range-distribution rollup — both of which need a base index and a new-key index to coexist in one partition and be routed independently.
What I'm doing:
Add per-index distribution routing to the sink:
- Thrift: `TOlapTableIndexSchema.distributed_exprs` (field 9) carries per-index routing expression trees, evaluated at the sink sender. Sender-only: `POlapTableIndexSchema` (proto) is unchanged, so remote write channels never route by it.
- FE: `OlapTableSink.createSchema` fills `distributed_exprs` for range-distribution tables with slot-refs over each index's range sort-key columns, gated to the OLAP write-sink path (dictionary / non-write callers do not emit it). For today's base-only range tables this resolves to exactly the columns the partition-level path already used, so routing is behavior-preserving. Also adds an optional `targetWriteIndexId` filter (write only one index; schema, partition and loaded-index lists stay 1:1 by meta id).
- BE: `OlapTableSchemaParam` parses `distributed_exprs` into per-index `ExprContext`s (prepare/open/close lifecycle); the range sink sender evaluates them once per chunk per index and routes via `RangeRouter`. `RangeRouter::init` validates routing-key types against the boundary types; a new `route_chunk_rows` overload routes from pre-evaluated columns; an empty `distributed_exprs` (K=1) routes to the single tablet. When an index has no `distributed_exprs`, routing falls back to the partition-level path unchanged.

No version gate is needed: StarRocks upgrades BE/CN before FE, so a newly upgraded FE (the only one that emits the field) never runs against a BE that does not understand it, and an old FE never emits it. Non-range tables and any unset field are byte-for-byte unchanged. This is prerequisite-only: the new capability is dormant for existing tables and consumed by future work.
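To make the routing rules above concrete, here is a minimal, self-contained C++ sketch of per-index range routing. It is not the PR's code and does not use the real `RangeRouter` API: `IndexRouting`, `project_keys`, `route_key` and `route_chunk` are hypothetical names, keys are simplified to `int64_t` tuples, slot-ref expressions are modeled as plain column positions, and tablet ranges are assumed to be lower-inclusive (a key equal to a boundary goes to the tablet on the right). It only illustrates three behaviors described in the PR: evaluate the routing key once per chunk per index, route an empty expression list (K=1) to the single tablet, and fall back to the partition-level columns when an index carries no expressions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified row / key type: a tuple of int64 values (assumption for illustration).
using Row = std::vector<std::int64_t>;

// Hypothetical stand-in for one index's routing metadata.
struct IndexRouting {
    // Column positions read by the index's routing expressions (slot-refs).
    // nullopt models "this index has no distributed_exprs".
    std::optional<std::vector<std::size_t>> distributed_cols;
    // Sorted split boundaries in this index's sort-key space.
    // K tablets are separated by K-1 boundaries; empty means a single tablet.
    std::vector<Row> boundaries;
};

// Evaluate the routing key for every row of the chunk, once per index.
std::vector<Row> project_keys(const std::vector<Row>& chunk,
                              const std::vector<std::size_t>& cols) {
    std::vector<Row> keys;
    keys.reserve(chunk.size());
    for (const Row& row : chunk) {
        Row key;
        key.reserve(cols.size());
        for (std::size_t c : cols) key.push_back(row.at(c));
        keys.push_back(std::move(key));
    }
    return keys;
}

// Tablet ordinal = number of boundaries that are <= key (lexicographic compare).
std::size_t route_key(const Row& key, const std::vector<Row>& boundaries) {
    auto it = std::upper_bound(boundaries.begin(), boundaries.end(), key);
    return static_cast<std::size_t>(it - boundaries.begin());
}

// Route every row of a chunk to a tablet ordinal of the given index.
std::vector<std::size_t> route_chunk(const std::vector<Row>& chunk,
                                     const IndexRouting& index,
                                     const std::vector<std::size_t>& partition_cols) {
    // Fallback: an index without its own expressions uses the partition-level columns.
    const std::vector<std::size_t>& cols =
            index.distributed_cols.has_value() ? *index.distributed_cols : partition_cols;

    // Empty expression list (K=1): every row goes to the single tablet.
    if (cols.empty()) return std::vector<std::size_t>(chunk.size(), 0);

    // Route from the pre-evaluated key columns.
    const std::vector<Row> keys = project_keys(chunk, cols);
    std::vector<std::size_t> tablets;
    tablets.reserve(keys.size());
    for (const Row& key : keys) tablets.push_back(route_key(key, index.boundaries));
    return tablets;
}
```

Under these assumptions, a chunk with rows `(5, 100)`, `(10, 7)`, `(42, 1)` routes to tablets `0, 1, 2` for a base index keyed on column 0 with boundaries `10` and `20`, and to tablets `1, 0, 0` for a second index keyed on column 1 with a single boundary `50`. The same rows land in different tablets per index, which is the case a single partition-level column set cannot express.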
What type of PR is this:
Does this PR entail a change in behavior?
Checklist:
Bugfix cherry-pick branch check:
🤖 Generated with Claude Code