notebooks: add FAST_MODE and active overfit-mode controls to catalog benchmark notebook by charlesmartin14 · Pull Request #50 · CalculatedContent/xgboost2ww

charlesmartin14 · 2026-03-03T19:15:57Z

Make the catalog-driven XGBoost benchmark faster and more configurable for iteration by adding a fast profiling mode and explicit control over which overfit cases are applied per dataset.
Provide runtime guards and caps to avoid long runs on Colab/Drive while preserving the ability to run full exhaustive experiments when desired.

Introduce FAST_MODE and related runtime tuning variables (RANDOM_SAMPLE_SIZE, MAX_OVERFIT_MODELS, MAX_OVERFIT_PAIRS_PER_RUN, INCLUDE_CONTROL_CASE, CHECKPOINT_SAVE_EVERY) to reduce work in quick runs and print warnings when caps are active.
Add ACTIVE_OVERFIT_MODES (restricted to a selection of strong overfit modes when FAST_MODE is on) and make MAX_OVERFIT_CASES conditional on FAST_MODE, then use ACTIVE_OVERFIT_MODES throughout the training and plotting logic instead of slicing OVERFIT_MODES directly.
Reduce training effort in FAST_MODE by changing training_schedule to use smaller num_boost_round values and lower early_stopping_rounds values, and expose WeightWatcher parameters (WW_T_POINTS, WW_NFOLDS, WW_RANDOMIZE) for faster analysis.
Add early-stop/capping logic when creating overfit pairs (MAX_OVERFIT_PAIRS_PER_RUN), write checkpoints more sparsely (CHECKPOINT_SAVE_EVERY), update aggregation/checkpointing to save intermediate aggregated CSVs, and add a FAST_MODE validation warning that fires when overfit signals are weak.
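
For reviewers who want a picture of how these controls fit together, here is a minimal, standard-library-only sketch. It is not the notebook's code: the variable names FAST_MODE, OVERFIT_MODES, ACTIVE_OVERFIT_MODES, MAX_OVERFIT_CASES, MAX_OVERFIT_PAIRS_PER_RUN, CHECKPOINT_SAVE_EVERY, and training_schedule come from this description, but every value, the mode names, the STRONG_MODES list, and the helpers build_overfit_pairs, write_checkpoint, and run are assumptions made for illustration. Model training and WeightWatcher analysis are elided.

```python
import csv
import warnings

FAST_MODE = True  # flag named in the PR; every value below is an assumed placeholder

# Hypothetical mode names; the notebook's real OVERFIT_MODES are not listed in the PR.
OVERFIT_MODES = ["deep_trees", "no_regularization", "high_learning_rate", "tiny_train_set"]
STRONG_MODES = ["deep_trees", "no_regularization"]  # assumed "strong" subset

MAX_OVERFIT_CASES = 2 if FAST_MODE else len(OVERFIT_MODES)
MAX_OVERFIT_PAIRS_PER_RUN = 3 if FAST_MODE else None  # None means "no cap"
CHECKPOINT_SAVE_EVERY = 2 if FAST_MODE else 1

# Computed once and reused everywhere, instead of slicing OVERFIT_MODES at each call site.
ACTIVE_OVERFIT_MODES = (STRONG_MODES if FAST_MODE else OVERFIT_MODES)[:MAX_OVERFIT_CASES]


def training_schedule(fast_mode):
    """Return (num_boost_round, early_stopping_rounds); the numbers are illustrative."""
    return (50, 5) if fast_mode else (1000, 50)


def build_overfit_pairs(datasets, modes, max_pairs):
    """Pair every dataset with every active mode, stopping early once the cap is hit."""
    pairs = []
    for dataset in datasets:
        for mode in modes:
            if max_pairs is not None and len(pairs) >= max_pairs:
                warnings.warn(f"MAX_OVERFIT_PAIRS_PER_RUN={max_pairs} reached; remaining pairs skipped")
                return pairs
            pairs.append((dataset, mode))
    return pairs


def write_checkpoint(rows, path):
    """Rewrite the checkpoint CSV with all rows collected so far."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["dataset", "mode", "num_boost_round"])
        writer.writeheader()
        writer.writerows(rows)


def run(datasets, checkpoint_path):
    """Walk the capped pair list and checkpoint every CHECKPOINT_SAVE_EVERY results."""
    num_boost_round, _early_stopping_rounds = training_schedule(FAST_MODE)
    pairs = build_overfit_pairs(datasets, ACTIVE_OVERFIT_MODES, MAX_OVERFIT_PAIRS_PER_RUN)
    rows = []
    for index, (dataset, mode) in enumerate(pairs, start=1):
        # Training and analysis would happen here; this sketch only records the plan.
        rows.append({"dataset": dataset, "mode": mode, "num_boost_round": num_boost_round})
        if index % CHECKPOINT_SAVE_EVERY == 0 or index == len(pairs):
            write_checkpoint(rows, checkpoint_path)
    return rows
```

Deriving ACTIVE_OVERFIT_MODES once at configuration time keeps the training and plotting paths in agreement about which modes ran, which is the stated reason for no longer slicing OVERFIT_MODES directly. Setting FAST_MODE to False in this sketch removes the pair cap and restores the full mode list.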

Executed the notebook end-to-end in FAST_MODE on a reduced sample (caps active). The run produced console logs of dataset/overfit progress, saved checkpoint files such as checkpoint_results.csv and checkpoint_results_good_plus_overfit.csv, and completed without runtime exceptions.
Confirmed overfit runs and aggregated results were written to CHECKPOINT_AGGREGATED_CSV and results_per_dataset.csv in the experiment checkpoint directory during the interactive run.
No separate unit tests were added; validation was performed via the interactive FAST_MODE notebook execution shown in the run output.

Prioritize strong overfit modes in FAST_MODE and add signal checks

ac88354

charlesmartin14 added the codex label Mar 3, 2026 — with ChatGPT Codex Connector
