
feat(transport): TensorMeta segment views — selection without hydration#49

Draft
leviking98z-rgb wants to merge 1 commit into perf/04-dp-seqlen-balance from perf/07-tensormeta-segment-views



@leviking98z-rgb

Collaborator

Part of the verl performance-parity series tracked in #40. Root-cause follow-up to #45.

Summary

#45's DP balancing relied on hydrate_track, a driver-side full hydration of every TensorMeta field before Batch.select, because remote refs had no way to represent a selection: TensorMeta.select raised, and _slice_by_refs could only cut at ref boundaries. That workaround broke the transport's zero-copy premise (data bounced worker -> driver -> worker), mutated frozen dataclasses in place, left field types history-dependent, and buried a padding convention in a private helper.

This PR adds the missing primitive: segment views.

  • TensorMeta.view_plan — an ordered list of (ref_idx, start, end) segments in ref-local units (rows for CONCAT fields, tokens for PACKED fields).
  • select / select_units / select_segments build lazy views with zero data motion on the driver; a misaligned slice now degrades to a view instead of raising.
  • localize preserves plans through ref routing (with_refs, not from_handles).
  • Materialization is centralized in TensorMeta.assemble and wired into every input path (hydrate, plus get_batch in base / gpu_store / transfer_queue). The trailing-dim contract is documented there: segments that span refs padded to different widths are right-padded with zeros (the TextTokenCondition.concat convention), so consumers of 2D+ per-shard-padded fields must be mask-driven.
  • balance_track_for_dp now permutes via native track.select(perm): data stays worker-resident and materializes on the destination worker. hydrate_track remains a utility but is off the balance path.
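
For reviewers who want the idea in miniature, here is a rough sketch of how a row selection on a CONCAT field could be turned into a view plan. It is not the code in this PR: `plan_from_indices` and its `ref_sizes` argument are hypothetical names, and the half-open `(ref_idx, start, end)` convention and the coalescing of adjacent rows are assumptions made for illustration.

```python
import bisect
from typing import List, Sequence, Tuple

# (ref_idx, start, end) in ref-local rows; end is exclusive (assumed convention).
Segment = Tuple[int, int, int]


def plan_from_indices(ref_sizes: Sequence[int], indices: Sequence[int]) -> List[Segment]:
    """Map global row indices onto ordered ref-local segments (hypothetical helper)."""
    # offsets[i] is the global row at which ref i starts.
    offsets = [0]
    for size in ref_sizes:
        offsets.append(offsets[-1] + size)
    total = offsets[-1]

    plan: List[Segment] = []
    for idx in indices:
        if not 0 <= idx < total:
            raise IndexError(f"row {idx} out of range for {total} rows")
        # Last ref whose starting offset is <= idx; empty refs are skipped.
        ref_idx = bisect.bisect_right(offsets, idx) - 1
        local = idx - offsets[ref_idx]
        # Extend the previous segment when this row continues it in the same ref.
        if plan and plan[-1][0] == ref_idx and plan[-1][2] == local:
            plan[-1] = (ref_idx, plan[-1][1], local + 1)
        else:
            plan.append((ref_idx, local, local + 1))
    return plan


# A slice that is not aligned to ref boundaries becomes two segments.
misaligned = plan_from_indices([4, 4], range(2, 6))
# A permutation keeps its order; only adjacent rows of one ref are merged.
permuted = plan_from_indices([4, 4], [5, 0, 1, 6])
```

Under these assumptions the plan is pure metadata: nothing is fetched until the destination materializes it, which is the property the balance path depends on.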

Test Plan

  • tests/test_tensormeta_views.py: 7 CPU unit tests — permutation + ragged right-pad, view slicing, misaligned-slice degradation, packed token segments, plan survival through with_refs, empty selection, assemble parity.
  • 16-GPU e2e gate (viewbal_e2e, Qwen3-4B DRPO + balance on): 5 steps, ratio_mean 0.9995-1.0004, rank token spread 0.06%, rewards nominal.
  • Legacy paths regression-covered: plan-less metas keep the exact old backend.get(refs) / boundary-slice fast paths.
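
The "permutation + ragged right-pad" case can be illustrated with a toy model of the trailing-dim contract. This is a sketch only, using nested Python lists in place of tensors; `assemble_rows` is a hypothetical stand-in and does not reproduce the signature of TensorMeta.assemble.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, int]  # (ref_idx, start, end), end exclusive (assumed)
Rows = List[List[int]]


def assemble_rows(refs: Sequence[Rows], plan: Sequence[Segment], pad: int = 0) -> Rows:
    """Gather the planned rows, then right-pad them to a common width."""
    picked: Rows = []
    for ref_idx, start, end in plan:
        picked.extend(refs[ref_idx][start:end])
    # Refs may have been padded to different widths on their workers.
    width = max((len(row) for row in picked), default=0)
    return [list(row) + [pad] * (width - len(row)) for row in picked]


ref0 = [[1, 2, 0], [3, 4, 5]]              # padded to width 3 on its worker
ref1 = [[6, 7, 8, 9, 0], [1, 1, 1, 1, 1]]  # padded to width 5 on its worker
out = assemble_rows([ref0, ref1], [(1, 0, 1), (0, 0, 2)])
# out == [[6, 7, 8, 9, 0], [1, 2, 0, 0, 0], [3, 4, 5, 0, 0]]
```

The added zeros cannot be told apart from real zero values in the data, which is why consumers of such fields have to rely on a mask rather than on the padded width.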

Root-cause fix for the DP-balance hydration workaround. TensorMeta gains
view_plan: an ordered list of (ref_idx, start, end) segments in ref-local
units. select/select_units/select_segments build lazy VIEWS over the remote
refs (zero data motion on the driver); misaligned slices degrade to views
instead of raising; localize preserves plans through ref routing
(with_refs, not from_handles); materialization is centralized in
TensorMeta.assemble with a documented trailing-dim contract (segments
crossing refs padded to different widths are right-padded with zeros — the
TextTokenCondition.concat convention) and wired into every input path:
transport.hydrate, base/gpu_store/transfer_queue get_batch.

balance_track_for_dp now permutes via native track.select(perm): data stays
worker-resident and materializes on the destination worker. hydrate_track
remains as a utility but is off the balance path.

Verified: 7 CPU unit tests (permutation + ragged pad, view slicing,
misaligned-slice degradation, packed token segments, plan survival through
with_refs, empty selection, assemble parity) and a 16-GPU e2e gate
(viewbal_e2e: 5 steps, ratio_mean 0.9995-1.0004, rank token spread 0.06%).
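
For the packed token segments case, a sketch of how unit selection might translate into token-range segments. This assumes each ref stores its sequences back to back and that per-ref sequence lengths are known on the driver; `packed_plan` and `ref_seq_lens` are hypothetical names, not identifiers from this PR.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, int]  # (ref_idx, tok_start, tok_end), end exclusive (assumed)


def packed_plan(ref_seq_lens: Sequence[Sequence[int]], unit_indices: Sequence[int]) -> List[Segment]:
    """Turn selected units (sequences) into ref-local token segments."""
    # Token span of every unit, in global unit order.
    unit_spans: List[Segment] = []
    for ref_idx, lens in enumerate(ref_seq_lens):
        tok = 0
        for n in lens:
            unit_spans.append((ref_idx, tok, tok + n))
            tok += n

    plan: List[Segment] = []
    for u in unit_indices:
        if not 0 <= u < len(unit_spans):
            raise IndexError(f"unit {u} out of range for {len(unit_spans)} units")
        ref_idx, start, end = unit_spans[u]
        # Merge with the previous segment when the tokens are contiguous in one ref.
        if plan and plan[-1][0] == ref_idx and plan[-1][2] == start:
            plan[-1] = (ref_idx, plan[-1][1], end)
        else:
            plan.append((ref_idx, start, end))
    return plan


# Two refs holding sequences of 3 and 5 tokens, then 2 and 4 tokens.
reordered = packed_plan([[3, 5], [2, 4]], [1, 2, 0])
contiguous = packed_plan([[3, 5], [2, 4]], [0, 1])
```

Here the selection is expressed in units but the plan is expressed in tokens, matching the "tokens for PACKED fields" wording above.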