notebooks: add FAST_MODE and active overfit-mode controls to catalog benchmark notebook #50

Draft
charlesmartin14 wants to merge 1 commit into main from codex/address-weightwatcher-index-out-of-bounds-warning-4zjrau

@charlesmartin14
Member

Motivation

  • Make the catalog-driven XGBoost benchmark faster and more configurable for iteration by adding a fast profiling mode and explicit control over which overfit cases are applied per dataset.
  • Provide runtime guards and caps to avoid long runs on Colab/Drive while preserving the ability to run full exhaustive experiments when desired.

Description

  • Introduce FAST_MODE and related runtime tuning variables (RANDOM_SAMPLE_SIZE, MAX_OVERFIT_MODELS, MAX_OVERFIT_PAIRS_PER_RUN, INCLUDE_CONTROL_CASE, CHECKPOINT_SAVE_EVERY) to reduce work in quick runs and print warnings when caps are active.
  • Add ACTIVE_OVERFIT_MODES (a selected subset of strong overfit modes when FAST_MODE is on) and make MAX_OVERFIT_CASES conditional on FAST_MODE; training and plotting logic now use ACTIVE_OVERFIT_MODES instead of slicing OVERFIT_MODES directly.
  • Reduce training effort in FAST_MODE by giving training_schedule smaller num_boost_round values and lower early_stopping_rounds values, and expose WeightWatcher parameters (WW_T_POINTS, WW_NFOLDS, WW_RANDOMIZE) for faster analysis.
  • Add early-stop/capping logic when creating overfit pairs (MAX_OVERFIT_PAIRS_PER_RUN), write checkpoints less often (CHECKPOINT_SAVE_EVERY), update aggregation/checkpointing to save intermediate aggregated CSVs, and add a FAST_MODE validation warning about weak overfit signals.
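
For reviewers, the controls listed above could fit together roughly as in the sketch below. It is not the notebook's code: every numeric value, the mode names, the dataset names, and the helpers `training_schedule`, `plan_overfit_pairs`, and `checkpoint_steps` are hypothetical, and applying MAX_OVERFIT_CASES as a slice when building ACTIVE_OVERFIT_MODES is an assumption. Only the variable names and the `num_boost_round` / `early_stopping_rounds` keys come from the description.

```python
from itertools import islice, product

# All values below are placeholders; the PR names these variables
# but does not state the settings used in the notebook.
FAST_MODE = True

OVERFIT_MODES = ["deep_trees", "no_regularization", "many_rounds", "tiny_sample"]  # hypothetical names
STRONG_OVERFIT_MODES = ["deep_trees", "many_rounds"]  # hypothetical fast-mode subset

MAX_OVERFIT_CASES = 2 if FAST_MODE else len(OVERFIT_MODES)
MAX_OVERFIT_PAIRS_PER_RUN = 3 if FAST_MODE else None  # None means no cap
CHECKPOINT_SAVE_EVERY = 2 if FAST_MODE else 1

# Built once, then used everywhere instead of slicing OVERFIT_MODES at each call site.
# Assumption: the case cap is applied here.
ACTIVE_OVERFIT_MODES = (STRONG_OVERFIT_MODES if FAST_MODE else OVERFIT_MODES)[:MAX_OVERFIT_CASES]


def training_schedule(fast_mode):
    """Return training keyword settings; smaller budgets in fast mode (placeholder numbers)."""
    if fast_mode:
        return {"num_boost_round": 100, "early_stopping_rounds": 10}
    return {"num_boost_round": 2000, "early_stopping_rounds": 50}


def plan_overfit_pairs(datasets, modes, max_pairs=None):
    """Return (dataset, mode) pairs, truncated to max_pairs when a cap is set."""
    pairs = product(datasets, modes)
    return list(pairs if max_pairs is None else islice(pairs, max_pairs))


def checkpoint_steps(n_items, save_every):
    """Return the 1-based steps at which a checkpoint would be written."""
    # The final step always saves, so a sparse cadence never drops the tail of a run.
    return [i for i in range(1, n_items + 1) if i % save_every == 0 or i == n_items]


pairs = plan_overfit_pairs(["ds_a", "ds_b"], ACTIVE_OVERFIT_MODES, MAX_OVERFIT_PAIRS_PER_RUN)
if MAX_OVERFIT_PAIRS_PER_RUN is not None:
    print(f"WARNING: overfit pairs capped at {MAX_OVERFIT_PAIRS_PER_RUN}")
print(pairs)
print(checkpoint_steps(len(pairs), CHECKPOINT_SAVE_EVERY))
```

Deriving ACTIVE_OVERFIT_MODES in one place means training and plotting cannot disagree about which modes were run, which is the main risk when each call site slices OVERFIT_MODES on its own.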

Testing

  • Executed the notebook end-to-end in FAST_MODE on a reduced sample (caps active). The run printed dataset/overfit progress to the console, saved checkpoint files such as checkpoint_results.csv and checkpoint_results_good_plus_overfit.csv, and completed without runtime exceptions.
  • Confirmed that overfit runs and aggregated results were written to CHECKPOINT_AGGREGATED_CSV and results_per_dataset.csv in the experiment checkpoint directory during the interactive run.
  • No separate unit tests were added; validation was performed via the interactive FAST_MODE notebook execution shown in the run output.

Codex Task
