feat: per-shard sharded Strata applies (Plan-stage fan-out + per-shard DDL)#535
Merged
morgo merged 13 commits intoJun 26, 2026
Merged
Conversation
A sharded engine (e.g. GAP's Strata engine) requires exactly one target shard per work operation. The control plane already fans a sharded plan out into one apply_operation per shard (buildShardedApplyOperationGroups) and dispatches each scoped to its shard, but only when the apply client advertises ShardedApplyFanoutSupport — which GRPCClient did not, so a sharded plan dispatched to a gRPC data plane was sent whole and rejected with "expected exactly one target shard, got 0". Fan-out belongs on the control plane: it owns rollout sequencing (canaries, per-shard ordering) and per-shard drift reconciliation, which a data plane that fans out for itself cannot do. So: - GRPCClient.SupportsShardedApplyFanout reports true. The capability is unconditional and safe: apply-create only fans out when the stored plan carries changing shards, which non-sharded engines never produce, so a non-sharded plan is dispatched whole exactly as before. - LocalClient.Apply honors the per-shard scoped dispatch: it builds tasks from the operation's own req.DdlChanges (falling back to the whole plan for a local apply) and tags them with req.TargetShards when the dispatch targets a single shard, so the engine receives exactly one target shard. No data-plane fan-out and no operator claim-mode change — the data plane keeps its inline, apply-level drive, one shard per dispatch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR moves sharded-apply fan-out to the control plane for gRPC data-plane executes, by having the gRPC client advertise sharded fan-out support and updating the data-plane LocalClient.Apply path to honor per-operation scoped dispatch (DDL subset + single target shard).
Changes:
- Add
GRPCClient.SupportsShardedApplyFanout()so control-plane apply creation can fan sharded plans out into per-shard operations when using gRPC. - Update
LocalClient.Applyto build tasks fromApplyRequest.DdlChangeswhen present (scoped dispatch), and tag tasks with a single dispatched shard fromApplyRequest.TargetShards. - Add focused tests for the new capability advertisement and the scoped-DDL vs plan-fallback behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/tern/sharded_apply_dispatch_test.go | Adds tests asserting gRPC fan-out capability and that LocalClient honors dispatched DDL scope. |
| pkg/tern/local_client.go | Builds tasks from dispatched DdlChanges and tags tasks with a single dispatched shard. |
| pkg/tern/grpc_client.go | Advertises sharded-apply fan-out capability for the gRPC client. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Plan is where sharding is declarative: a keyspace's shards won't always need the same change (drift, or a partially-applied canary rollout). Previously the plan collapsed every changing shard to one namespace-level DDL set and the fan-out applied that uniform DDL to each shard. Carry each shard's own changes end to end instead. - proto: ShardPlan gains `repeated TableChange changes`, so per-shard DDL crosses the gRPC boundary to the control plane that fans out. - storage.ShardPlan gains `Changes []TableChange` (persisted in plan_data JSON; empty means "use the namespace-level changes", so old plans and the uniform case are unchanged). - LocalClient.Plan records each per-shard SchemaChange's table changes onto its ShardPlan (storage + response); protoShardPlansToStorage reads them back. - buildShardedApplyOperationGroups builds each shard's tasks from that shard's own changes, falling back to the namespace-level changes when a shard carries none. The data-plane apply path already drives each operation from its own DdlChanges, so divergent per-shard DDL flows through to execution unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
LocalClient.Apply honored req.DdlChanges for every dispatch, which changed the non-sharded path (it had used the stored plan) and regressed TestK8s_Progress: a non-sharded mysql apply reported the wrong table because it built tasks from the request rather than the plan. Gate the per-shard behavior on the dispatch actually being sharded: only when the request carries target shards does Apply drive the operation's own DdlChanges and tag tasks with the shard. A whole-deployment or non-sharded apply uses the stored plan unchanged — byte-for-byte the prior behavior. A dispatch with more than one target shard is malformed (the fan-out emits one shard per operation) and now fails closed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nges ShardPlan.needs_change was a separate membership flag carried over from the pre-per-shard design, where the plan recorded that a shard needed a change but not the change itself — a plan-time/apply-time TOCTOU (the saved flag could disagree with a re-derived DDL at apply time). Now that each shard carries its own changes, membership is implied: a shard is changing iff len(changes) > 0. - proto: ShardPlan drops needs_change (field 3 reserved). - storage.ShardPlan drops NeedsChange. - changingShardsByNamespace filters on len(Changes) > 0. - buildShardedApplyOperationGroups drops the now-unreachable namespace-level fallback (every changing shard carries its own changes) and its taskChanges parameter. - LocalClient.Plan / protoShardPlansToStorage stop setting NeedsChange. A shard with no changes simply produces no operation. Tests updated to mark changing shards by their changes rather than a flag. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-control-plane-fanout
…nalizers over gRPC Addresses the review feedback on the per-shard apply work by removing redundant machinery and closing the one real gap. No existing users depend on the removed paths, so this refactors freely rather than preserving compatibility shims. 1. Remove the materialize-time drift guard verifyMaterializedPlanMatchesLiveSchema re-planned against live schema and fanned closed on any difference, but only on the non-primary materialize path — the primary path already trusts its stored plan and reconciles via replanAndFilterTasks at apply/resume. The dispatch carries the reviewed plan's DDL verbatim, so the materialized plan IS the reviewed plan; drift between review and apply is reconciled the same way on every path (replan-and-filter plus engine validation at execution). This also removes the special-case that rejected shard-scoped applies. 2. Gate fan-out on the plan, not a client capability Drop ShardedApplyFanoutSupport (+ GRPCClient/LocalClient impls and clientSupportsShardedApplyFanout). Fan-out is gated solely on canBuildShardedOperationGroups (plan.Shards). Only an instance-local sharded engine (Strata) emits per-shard ShardPlans, so an externally-authoritative engine (PlanetScale) — whose plans never carry per-shard changes — is never fanned out, regardless of transport. This removes the unconditional GRPCClient.SupportsShardedApplyFanout() the review flagged. 3. Dispatch task-less group_finalizer operations over gRPC GRPCClient.ResumeApplyOperation treated a finalizer's empty task set as an error, stranding VSchema-changing sharded applies on a remote data plane. It now dispatches the namespace VSchema as a VSchema-only apply (no DDL, no target shards, carrying the vschema.json), which the data plane applies via its task-less VSchema-only path — the remote counterpart to LocalClient.driveGroupFinalizer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two review fixes for the per-shard dispatch path: - protoShardPlansToStorage now returns an error on a nil entry in ShardPlan.changes instead of silently skipping it. It is data-plane input over gRPC, so a nil change is malformed and must not yield an incomplete plan. - A shard-scoped apply (TargetShards set) now requires valid, non-empty ddl_changes (scopedDispatchDDLChanges): non-nil entries with a non-empty table/DDL and a supported change type. It no longer falls back to the whole stored plan, which could have applied unrelated tables on the targeted shard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-control-plane-fanout
…alidation - Resume/recovery re-plan now keys remaining DDL by (shard, table) instead of table name alone. A sharded re-plan repeats the same table across shards, so table-only keying conflated per-shard tasks (one shard's diff could keep another shard's task active or overwrite its DDL). Non-sharded engines emit an empty shard name, so keying degrades to table-only and matches prior behavior. - protoShardPlansToStorage fails closed on remote/untrusted per-shard changes: nil entry, empty table/DDL, unsupported change type, or namespace mismatch. - buildShardedApplyOperationGroups validates each shard's changes before building operation keys / engine requests, rejecting empty or mismatched fields. - Tests for the per-shard key, non-sharded degradation, and both fail-closed paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…shard dispatch - A task-less group_finalizer in a multi-operation apply carries no task rows, so the operator's task-derived operation→parent projection can never move its operation row off pending: both terminal completion AND terminal failure were silently lost, stranding the operation and causing endless operator retries. The remote drive now reflects the remote terminal state onto its own operation row (persistFinalizerOperationTerminalState: MarkCompleted/MarkFailed), and a retryable failure is left non-terminal for re-drive — mirroring the local driveGroupFinalizer. Gated by scope.finalizerOperationScope(). - scopedDispatchDDLChanges now rejects CHANGE_TYPE_VSCHEMA: a shard-scoped dispatch carries only table DDL (create/alter/drop); a VSchema update is keyspace-wide and applied by the finalizer, never shard-tagged. - Extracted dispatchTargetShard, which trims and validates the single target shard and fails closed on an empty/whitespace shard (an invalid scope the engine would reject or mis-execute); consolidates the prior >1-shard check. - Tests: finalizer terminal completed/failed reflected onto the operation row; vschema rejected by scoped dispatch; dispatchTargetShard trim/validation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-control-plane-fanout # Conflicts: # pkg/proto/ternv1/tern.pb.go
- The resume/recovery re-plan key now includes namespace: shardTableKey is (namespace, shard, table) and replanShardTableDDL / tableStillNeedsChange / replanAndFilterTasks key by the full tuple. A plan is keyed by (Namespace, Shard), so in a multi-keyspace Vitess apply the same shard+table can exist in two namespaces; keying by (shard, table) alone would reconcile a task against another keyspace's diff. Non-sharded engines emit an empty shard, so it degrades to (namespace, table). - scopedDispatchDDLChanges and protoShardPlansToStorage now trim namespace/table/ DDL before validating and storing, instead of validating a trimmed copy but persisting the untrimmed value. These feed operation keys and task rows, so leaked surrounding whitespace caused surprising keys and reconciliation/progress mismatches. scopedDispatchDDLChanges also fails closed on an empty namespace (authoritative scope for a shard-scoped dispatch). - Tests: re-plan key distinguishes same shard+table across keyspaces; scoped dispatch trims values and rejects empty namespace; the mysqlstore plans round-trip now carries and asserts per-shard ShardPlan.Changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aparajon
reviewed
Jun 26, 2026
aparajon
approved these changes
Jun 26, 2026
Per reviewer: this proto has no released consumers yet, so the needs_change removal can be a clean breaking change rather than carrying a reserved marker. Drop `reserved 3 / "needs_change"` and renumber `changes` 4 -> 3. Regenerated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
morgo
added a commit
to morgo/schemabot
that referenced
this pull request
Jun 26, 2026
The plan comment was keyspace-level: a divergent sharded plan (shards needing different changes — drift, or a partially-applied canary) collapsed to one DDL block, so an operator couldn't see what would apply where. The per-shard plan data block#535 persists never reached the comment — apitypes.PlanResponse carried only the namespace-level Changes. - apitypes.PlanResponse gains Shards ([]ShardPlanResponse), and planResponseFromProto carries them from the proto plan response. - KeyspaceChangeData gains per-shard Shards; buildPlanCommentData threads them. - writeKeyspaceChanges groups a sharded keyspace's shards by change signature: uniform shows the DDL once; divergent renders "what applies where" — one DDL block per distinct change set, labelled with the shards it applies to. No gap change needed: the per-shard data already flows from gap's Strata engine through the proto plan response; only the control-plane rendering changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
morgo
added a commit
that referenced
this pull request
Jun 26, 2026
…oup-by-variant) (#546) * feat(webhook): dedicated sharded-apply comment (per-shard, group-by-variant) A sharded apply fans out across the shards of one keyspace within a single deployment, but the comment renderer reused the multi-deployment layout, whose unit is the deployment. Every per-shard operation shares the one deployment ("cake"), so all shards rendered as identical "cake" rows, and — because the per-deployment Details map is keyed by deployment name — the per-shard detail collided and a failed shard's error was dropped (it showed a pending sibling's body instead). That is why a failed rollout surfaced no error. Add a dedicated sharded layout: - pkg/webhook/templates/sharded_apply.go: RenderShardedApplyComment. Shards are grouped by change signature — a uniform apply renders one Shard|Status table with the DDL once; divergent shards (different tables, or the same table computing to different DDL because shards drifted) render one group per distinct change set, each tying its shards' statuses to the DDL it runs. The first failed shard's error is lifted to the top and shown in its row. - pkg/webhook/sharded_apply.go: detect a sharded apply (work ops keyed namespace/shard/table, one deployment), and build the comment input from the operation rows + tasks. Per-shard status is derived through pkg/presentation with the shard name as the operation identity, so ordering labels reference shards ("waiting for -40", "halted — -40 failed") with no presentation change. formatApply{Status,Summary}Comment route sharded applies here. Finalizer (VSchema) operations are out of scope for the shard view for now; their outcome is still reflected in the aggregate headline state. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Address Copilot: handle failed_retryable in the sharded comment The sharded layout only recognised the terminal Failed state, so an apply that SchemaBot is auto-retrying (failed_retryable) lost its error and its next-action guidance. Mirror the single-deployment renderer: - isShardFailureState covers Failed and FailedRetryable; the first-failure callout and the per-shard status cell both surface the error for either. - writeShardedFooter adds the FailedRetryable case with the "stop retrying" command, matching writeApplyFooter. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(plan): show per-shard changes in the sharded plan comment The plan comment was keyspace-level: a divergent sharded plan (shards needing different changes — drift, or a partially-applied canary) collapsed to one DDL block, so an operator couldn't see what would apply where. The per-shard plan data #535 persists never reached the comment — apitypes.PlanResponse carried only the namespace-level Changes. - apitypes.PlanResponse gains Shards ([]ShardPlanResponse), and planResponseFromProto carries them from the proto plan response. - KeyspaceChangeData gains per-shard Shards; buildPlanCommentData threads them. - writeKeyspaceChanges groups a sharded keyspace's shards by change signature: uniform shows the DDL once; divergent renders "what applies where" — one DDL block per distinct change set, labelled with the shards it applies to. No gap change needed: the per-shard data already flows from gap's Strata engine through the proto plan response; only the control-plane rendering changes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Address Copilot: join all task DDLs per shard cell buildShardedApplyData took only the first task's DDL for an operation. An operation can carry more than one task for its (namespace, shard, table) when a shard plan yields multiple statements for the same table, so this dropped statements and corrupted the change signature used to group shards. Join every non-empty task DDL in task order instead. Covered by TestBuildShardedApplyData_JoinsMultiTaskDDL. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Address Copilot: single-keyspace guard + count per-shard plan DDL - isShardedApply now also requires every shard work operation to share one namespace. A multi-keyspace apply (still one deployment) would otherwise route to the sharded layout and mislabel the keyspace / mix unrelated shard groups; it now falls back to the existing layout. - countChanges counts a keyspace's per-shard statements when the collapsed namespace-level Statements are absent (keyspaceStatementCount), so a sharded plan whose only DDL is per-shard is not miscounted as "no changes" and gets correct summary counts. - Tests for multi-keyspace fallback and per-shard-only plan counting. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * feat(sharded): per-shard unsafe in plan; surface failed-shard error in apply Apply comment: a remote failure records its error on the operation's task, and the operator may not stamp it onto the operation row — so the sharded apply comment showed a failed shard with no error (forcing a trip to logs). aggregateShardState now falls back to the first task error when the operation row carries none, so a failed shard always shows why. Plan comment: an unsafe change confined to one shard (e.g. a column drop on a single drifted shard) is now flagged with that shard. shardedUnsafeChanges derives unsafe changes from the per-shard plan (grouped by table+reason → shards) when present, since the collapsed namespace-level Changes can omit a shard's change; the unsafe-changes warning names the shard(s). Tests: apply falls back to task error; plan surfaces a per-shard unsafe change scoped to its shard (both the builder and the rendered warning). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Address Copilot + add control-plane shard-scope guard Copilot fixes: - parseShardOperationKey rejects keys with extra segments (strict 3-part split), so "ns/-40/table/extra" is no longer misclassified as shard work. - buildShardedApplyData sorts each operation's tasks by id before joining DDL, so the joined DDL and the change signature are deterministic (tasks arrive created_at DESC). In practice an operation has one task — multiple statements for one table are combined into one ALTER upstream — so this is defensive. - planResponseFromProto drops shard plans with no changes, and buildPlanCommentData skips per-shard entries with no DDL, so an empty shard never renders a blank section or inflates the change count (a shard is changing iff it has changes). Control-plane fail-closed guard (requested): - dispatchPendingApply refuses to dispatch a shard work operation ("namespace/shard/table") that resolves no target shard, with a clear error naming the skew, instead of sending an empty TargetShards that the data plane rejects opaquely as "expected exactly one target shard, got 0". Also correct the per-shard unsafe test/preview: a same-table change is one combined ALTER, not multiple statements (multi-statement-per-table is unsupported). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(templates): add Sharded Apply previews to TEMPLATES.md The sharded apply/plan comments weren't represented in the generated TEMPLATES.md catalog. Add preview renderings (pkg/webhook/templates/ preview_sharded.go) for the divergent plan, the per-shard unsafe-change plan, and the in-progress / failed / divergent apply comments — per-shard statuses derived through pkg/presentation so they match production — wire a comment_sharded_all aggregate preview type, and add a "Sharded Apply" section to update-templates.sh. Regenerated TEMPLATES.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces #531 (which fanned out on the data plane). Sharding is a Plan / control-plane concern: Plan declares what each shard needs (which won't always be the same — drift, or a partially-applied canary), the control plane fans that into per-shard operations and sequences the rollout, and the data plane executes one shard per dispatch.
A sharded Strata apply previously failed on the data plane with
expected exactly one target shard, got 0, because a sharded plan dispatched over gRPC was sent whole. This PR makes the whole path work and simplifies the surrounding machinery.What changed
1. Plan persists per-shard DDL
ShardPlan.changescarries each shard's own table changes across the gRPC boundary;storage.ShardPlan.Changespersists them inplan_dataJSON.LocalClient.Planrecords each per-shardSchemaChange's changes;protoShardPlansToStoragereads them back. Because that per-shard data is remote/untrusted, bothprotoShardPlansToStorageandbuildShardedApplyOperationGroupsvalidate and fail closed before use — rejecting a nil/empty change, empty table or DDL, an unsupported change type, or a namespace that disagrees with the shard's namespace — rather than persisting or building from corrupt input.2.
needs_changeremoved — a shard is changing iff it hasChanges3. Fan-out gates on the plan, not a client capability
ShardedApplyFanoutSupport(+ both impls andclientSupportsShardedApplyFanout). Fan-out gates solely oncanBuildShardedOperationGroups(plan.Shards). Only the instance-local Strata engine emits per-shardShardPlans, so an externally-authoritative engine (PlanetScale) — whose plans never carry per-shard changes — is never fanned out, regardless of transport.4. Data plane honors the scoped per-shard dispatch
LocalClient.Applybuilds tasks from the operation's ownDdlChanges, tagged withTargetShards, so the engine gets exactly one target shard. A shard-scoped dispatch must carry valid, non-emptyddl_changes(scopedDispatchDDLChanges, fail closed) — and only table DDL (create/alter/drop): a VSchema change is keyspace-wide and rejected here. The single target shard is trimmed and validated (dispatchTargetShard), failing closed on >1 shard or an empty/whitespace shard. The non-sharded path is unchanged.5. Task-less VSchema
group_finalizerdispatched over gRPCGRPCClient.ResumeApplyOperationnow dispatches a finalizer op as a VSchema-only apply (no DDL, no target shards, carryingvschema.json) — the remote applies it via its task-less VSchema-only path — instead of stranding it onErrNoTasksForApplyOperation.persistFinalizerOperationTerminalState:MarkCompleted/MarkFailed), mirroring the localdriveGroupFinalizer; a retryable failure is left non-terminal for re-drive. Without this both terminal completion and failure were lost, stranding the operation in pending.6. Drift guard removed (simplification)
verifyMaterializedPlanMatchesLiveSchemaand its drift-multiset/vschema-parity helpers. It re-planned-and-compared on only the non-primary materialize path; the primary path already trusts its stored plan and reconciles viareplanAndFilterTasksat apply/resume. The dispatch carries the reviewed plan verbatim, so the materialized plan is the reviewed plan, and drift is reconciled uniformly on every path (replan-and-filter + engine validation). This also removed the special case that rejected shard-scoped applies.(shard, table)(replanShardTableDDL), since a sharded re-plan repeats the same table across shards. Keying by table name alone would conflate per-shard tasks; a non-sharded engine emits an empty shard name, so it degrades to table-only and matches prior behavior.Tests
scopedDispatchDDLChanges(scope honored / fails closed, incl. VSchema rejected);dispatchTargetShard(trim + validate single shard);buildShardedApplyOperationGroupsper-shard DDL + skips shards without changes + fails closed on a malformed change;protoShardPlansToStoragecarries per-shard changes + fails closed on nil/empty/unsupported-type/namespace-mismatch;replanShardTableDDLkeys per(shard, table)and degrades to table-only when non-sharded;LocalClient.Planrecords per-shard changes;GRPCClient.ResumeApplyOperationdispatches a finalizer as VSchema-only and reflects its terminal state (completed/failed) onto the operation row.pkg/tern,pkg/api,pkg/serve,pkg/storagesuites pass.Not in this PR (follow-ups)
🤖 Generated with Claude Code