openai · liveyourday · Apr 13, 2026 · Apr 13, 2026 · Apr 13, 2026
diff --git a/records/track_non_record_16mb/2026-04-13_SaliencyGuidedLocal5090/README.md b/records/track_non_record_16mb/2026-04-13_SaliencyGuidedLocal5090/README.md
@@ -0,0 +1,79 @@
+# Non-Record Submission: Saliency-Guided Local 5090
+
+This record documents a local 1xRTX 5090 saliency-guided sweep around the 9x512 GQA baseline. The run sequence was:
+
+1. the original 5B-token full-eval run on March 26, 2026
+2. a 24-hour full-eval continuation on April 8, 2026
+3. a 1-hour cosine-schedule proxy run on April 9, 2026
+
+The folder is intentionally normalized to the standard record layout:
+
+- `README.md` for the write-up
+- `results.tsv` for the run table
+- `submission.json` for the best full-eval run metadata
+- `train_gpt.py` for the experiment-local code snapshot
+
+`results.tsv` is transcribed from the local `wandb/` run directories, not from ad hoc notes.
+
+## Best Result
+
+The best full-eval run in this sweep is the 24-hour continuation:
+
+- run: `saliency_24h_30gb_legacy_seed2025`
+- started: `2026-04-08 11:21:09Z` / `2026-04-08 20:21 KST`
+- pre-quant `val_bpb`: `1.21953852`
+- post-quant `val_bpb`: `1.22864374`
+- counted artifact bytes: `15,850,915`
+- stop step: `44,628`
+- GPU: `1xRTX5090`
+
+This is the run described in `submission.json`.
+
+## Sweep Summary
+
+All three recorded runs use the same core saliency recipe:
+
+- 9 layers, width 512, 8 heads, 4 KV heads, MLP mult 2
+- SentencePiece vocab 1024, sequence length 1024
+- saliency token prior on
+- saliency dynamic correction on
+- saliency phrase term on
+- saliency attention bias on
+- saliency bigram off
+- `TRAIN_BATCH_TOKENS=262144`
+
+The changes across the sweep were:
+
+- 5B run: fixed-token budget, `50` train shards, legacy flat-plus-tail warmdown schedule
+- 24h run: same legacy schedule and saliency settings, but extended to `125` train shards and a 24-hour wallclock cap
+- 1h run: same saliency settings, but reduced to `8` train shards and switched to cosine LR with `LR_WARMUP_STEPS=64`, `LR_DECAY_START_FRAC=0.65`, `LR_MIN_SCALE=0.15`
+
+## Run Chronology
+
+| Run | Start | Eval | Key result |
+| --- | --- | --- | --- |
+| 5B | 2026-03-26 18:01 KST | full | post-quant `1.24582822` |
+| 24h | 2026-04-08 20:21 KST | full | post-quant `1.22864374` |
+| 1h | 2026-04-09 23:26 KST | proxy (4,194,304 val tokens) | post-quant `1.35156821` |
+
+## Results Source
+
+The values in `results.tsv` were read from these local W&B run folders:
+
+- `wandb/run-20260326_180143-wzspgrg0`
+- `wandb/run-20260408_202109-vajvzlaj`
+- `wandb/run-20260409_232646-hgygstyg`
+
+The recorded numbers come from `wandb-summary.json`, `config.yaml`, and `output.log` in those directories.
+
+## Included Files
+
+- `train_gpt.py`: experiment-local trainer snapshot kept with this record
+- `results.tsv`: normalized run table from the local W&B artifacts
+- `submission.json`: metadata for the best full-eval run
+
+## Notes
+
+- This folder date, `2026-04-13`, is the record write-up date, not the original training date.
+- The included `train_gpt.py` is the experiment-local code snapshot kept for recordkeeping and reruns. The reported metrics themselves come from the recorded W&B runs listed above.
+- The 1-hour run is intentionally marked as proxy-only because it used `VAL_MAX_TOKENS=4194304` with `LOCAL_PROXY_EVAL=1`.
diff --git a/records/track_non_record_16mb/2026-04-13_SaliencyGuidedLocal5090/results.tsv b/records/track_non_record_16mb/2026-04-13_SaliencyGuidedLocal5090/results.tsv
@@ -0,0 +1,4 @@
+label	run_id	wandb_run_dir	started_at_utc	eval_split	train_shards	lr_schedule	step_stop	train_time_s	pre_quant_val_bpb	post_quant_val_bpb	bytes_total	notes
+5b_full_eval	saliency_5b_30gb_seed2025	wandb/run-20260326_180143-wzspgrg0	2026-03-26T09:01:43.917794Z	full	50	flat_warmdown	19073	39633.45	1.23881099	1.24582822	15858012	original 5B saliency winner
+24h_full_eval	saliency_24h_30gb_legacy_seed2025	wandb/run-20260408_202109-vajvzlaj	2026-04-08T11:21:09.972987Z	full	125	flat_warmdown	44628	86401.05	1.21953852	1.22864374	15850915	best full-eval continuation
+1h_proxy_eval	saliency_1h_30gb_legacy_seed2025	wandb/run-20260409_232646-hgygstyg	2026-04-09T14:26:46.332733Z	proxy_4194304_tokens	8	cosine	1844	3601.39	1.34998024	1.35156821	15303442	1-hour scheduler probe
diff --git a/records/track_non_record_16mb/2026-04-13_SaliencyGuidedLocal5090/submission.json b/records/track_non_record_16mb/2026-04-13_SaliencyGuidedLocal5090/submission.json
@@ -0,0 +1,18 @@
+{
+  "author": "",
+  "github_id": "",
+  "name": "Saliency-Guided Local 5090",
+  "blurb": "Single-RTX5090 non-record saliency-guided sweep documenting the original 5B run, a 24-hour continuation, and a 1-hour cosine-schedule probe. Best full-eval post-quant score is 1.22864374 val_bpb at 15,850,915 counted bytes.",
+  "date": "2026-04-08T11:21:09.972987Z",
+  "track": "non-record-16mb",
+  "val_loss": 2.074513482238166,
+  "val_bpb": 1.2286437358795264,
+  "pre_quant_val_loss": 2.0591396984249735,
+  "pre_quant_val_bpb": 1.2195385151419558,
+  "step_stop": 44628,
+  "wallclock_seconds": 86400,
+  "bytes_total": 15850915,
+  "bytes_model_int8_zlib": 15781469,
+  "bytes_code": 69446,
+  "gpu": "1xRTX5090"
+}