Non-record: Internal control port on the PR180 stack #311

Open
small-cactus wants to merge 1 commit into openai:main from small-cactus:codex/pr180_internal_control_nonrecord

@small-cactus

Summary

Non-record submission porting our late-attention internal-control method onto the PR180 stack.

This is not a record claim yet. The record run is still in development. This PR documents the port, the current training/eval path, and the runtime fixes needed to make the approach runnable on multi-GPU pods.

Key ideas

  1. Late-attention internal control on top of the PR180 recipe rather than replacing the existing stack.
  2. Detached gate-controlled attention scaling with gate_fuse_mode=k.
  3. EMA-zscore energy normalization with clipping to stabilize the control signal.
  4. Compile-path fixes so the control recurrence does not blow up graph size in the public script.
  5. Fast iteration support for public-track debugging: capped validation for smoke runs and eval progress logging.
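As a rough illustration of idea 3, the following is a minimal sketch of an EMA z-score normalizer with clipping. It is not taken from train_gpt.py: the class name `EmaZScore`, the `decay` and `eps` parameters, and the reading of the clip value as a symmetric bound on the normalized signal are all assumptions made for the example.

```python
import math


class EmaZScore:
    """Hypothetical EMA z-score normalizer with symmetric clipping."""

    def __init__(self, decay=0.99, clip=0.99, eps=1e-6):
        self.decay = decay      # assumed EMA decay, not from the PR
        self.clip = clip        # assumed to bound the output to [-clip, clip]
        self.eps = eps
        self.mean = 0.0
        self.var = 1.0
        self.initialized = False

    def __call__(self, energy):
        if not self.initialized:
            # Seed the running mean with the first sample.
            self.mean = energy
            self.initialized = True
            return 0.0
        # Normalize against the statistics seen so far.
        z = (energy - self.mean) / math.sqrt(self.var + self.eps)
        # Exponentially weighted mean and variance update.
        delta = energy - self.mean
        self.mean += (1.0 - self.decay) * delta
        self.var = self.decay * (self.var + (1.0 - self.decay) * delta * delta)
        # Clip so a single outlier cannot dominate the control signal.
        return max(-self.clip, min(self.clip, z))


norm = EmaZScore()
signal = [norm(e) for e in [1.0, 1.0, 100.0]]
```

In this sketch the spike in the third sample saturates at the clip bound instead of passing through at full magnitude, which is the stabilizing effect the list item describes.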

Files

  • records/track_10min_16mb/2026-03-20_10L_Int5MLP_MuonWD04_SWA50/train_gpt.py
    • Internal-control port
    • Compile isolation fixes for the recurrent control path
    • Multi-GPU grad accumulation override
    • Fastcheck validation cap and eval progress logging
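The last two items can be pictured with the sketch below. Both helpers are hypothetical and are not the code in this PR: `resolve_grad_accum_steps` assumes the base recipe derives accumulation steps by dividing a fixed step count by the world size, and `cap_val_tokens` assumes a non-positive cap means "use the full validation set".

```python
def resolve_grad_accum_steps(world_size, override=None, base_steps=8):
    """Pick gradient accumulation steps, letting an explicit override win.

    base_steps=8 is an assumed default, not a value confirmed by the PR.
    """
    if override is not None:
        if override < 1:
            raise ValueError("override must be >= 1")
        return override
    if base_steps % world_size != 0:
        raise ValueError(
            f"world_size={world_size} does not divide base_steps={base_steps}; "
            "pass an explicit override"
        )
    return base_steps // world_size


def cap_val_tokens(total_tokens, max_tokens, seq_len):
    """Limit validation to at most max_tokens, rounded down to whole sequences."""
    limit = total_tokens if max_tokens <= 0 else min(total_tokens, max_tokens)
    usable = (limit // seq_len) * seq_len
    if usable == 0:
        raise ValueError("validation cap is smaller than one sequence")
    return usable
```

Under these assumptions a 5-GPU launch needs an explicit override because 5 does not divide 8, and a cap of 4194304 tokens with 1024-token sequences keeps exactly 4096 sequences.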

Reproducibility

Fastcheck example:

CONTROL_ENABLED=1 \
CONTROL_DETACH_GATES=1 \
CONTROL_GATE_FUSE_MODE=k \
CONTROL_ENERGY_NORM=ema_zscore \
CONTROL_ENERGY_CLIP=0.99 \
CONTROL_APPLY_TO=attn \
CONTROL_LATE_LAYERS=3 \
CONTROL_WARMUP_STEPS=8 \
CONTROL_SCAN_BLOCK_SIZE=16 \
VAL_LOSS_EVERY=0 \
VAL_MAX_TOKENS=4194304 \
COMPILE_MODEL=0 \
COMPILE_ZERPOWER=1 \
COMPILE_CONTROL_SCAN=0 \
torchrun --standalone --nproc_per_node=5 train_gpt.py
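For readers who want to see how such environment variables might be consumed, here is a minimal sketch that collects the control settings into a typed config. The variable names come from the command above; the `ControlConfig` dataclass, the `load_control_config` helper, and every default value are assumptions for illustration and may not match what train_gpt.py does.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlConfig:
    enabled: bool
    detach_gates: bool
    gate_fuse_mode: str
    energy_norm: str
    energy_clip: float
    apply_to: str
    late_layers: int
    warmup_steps: int
    scan_block_size: int


def load_control_config(env=None):
    """Read CONTROL_* settings from a mapping (defaults are placeholders)."""
    env = os.environ if env is None else env
    return ControlConfig(
        enabled=env.get("CONTROL_ENABLED", "0") == "1",
        detach_gates=env.get("CONTROL_DETACH_GATES", "0") == "1",
        gate_fuse_mode=env.get("CONTROL_GATE_FUSE_MODE", "k"),
        energy_norm=env.get("CONTROL_ENERGY_NORM", "ema_zscore"),
        energy_clip=float(env.get("CONTROL_ENERGY_CLIP", "0.99")),
        apply_to=env.get("CONTROL_APPLY_TO", "attn"),
        late_layers=int(env.get("CONTROL_LATE_LAYERS", "0")),
        warmup_steps=int(env.get("CONTROL_WARMUP_STEPS", "0")),
        scan_block_size=int(env.get("CONTROL_SCAN_BLOCK_SIZE", "1")),
    )


fastcheck_env = {
    "CONTROL_ENABLED": "1",
    "CONTROL_DETACH_GATES": "1",
    "CONTROL_GATE_FUSE_MODE": "k",
    "CONTROL_ENERGY_NORM": "ema_zscore",
    "CONTROL_ENERGY_CLIP": "0.99",
    "CONTROL_APPLY_TO": "attn",
    "CONTROL_LATE_LAYERS": "3",
    "CONTROL_WARMUP_STEPS": "8",
    "CONTROL_SCAN_BLOCK_SIZE": "16",
}
cfg = load_control_config(fastcheck_env)
```

Passing a plain dict instead of `os.environ` keeps the parsing testable without touching the process environment.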

Status

  • Public-track port is running on multi-GPU hardware.
  • Fastcheck runs are working after fixing the control compile path and the silent full-validation stall.
  • Full record-size evaluation is still in development.
  • No record claim in this PR.