
Add overfitting modes, GPU detection, and overfit-generation plotting to XGBoost notebook #26

Open
charlesmartin14 wants to merge 1 commit into main from codex/plot-histograms-for-overfit-metrics


Motivation

  • Introduce deterministic overfitting scenarios so models can be systematically driven into overfit regimes for analysis and visualization.
  • Prefer GPU compute when available to speed up XGBoost training while keeping a safe CPU fallback.

Description

  • Added OVERFIT_MODES and apply_overfit_mode to provide named overfit perturbations to base XGBoost parameters (deep trees, high learning rate, no regularization, full sampling, long training).
  • Implemented detect_xgb_compute_params(), which probes XGBoost to choose GPU or CPU settings for XGB_COMPUTE_PARAMS, exposes the choice as XGB_COMPUTE_BACKEND, and prints the detected status.
  • Extended fit_and_score() to accept case_type, overfit_mode, and seed_offset. It now derives a run_seed that is used for both data splits and model seeds, applies the overfit parameter modifications and fixed round mappings on overfit runs, and adds accuracy_gap, case_type, and overfit_mode to the results.
  • Updated the WeightWatcher invocation to use run_seed, and added a plotting cell that generates overfit runs from the base (good) models and plots histograms of alpha, ERG_gap, and num_traps, controlled by OVERFIT_REPEATS_PER_MODEL and PLOT_METRICS.
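
For readers who have not opened the notebook, the named-mode mechanism described above might look roughly like the sketch below. This is not the PR's code: the override values, the OVERFIT_ROUNDS mapping, and the derive_run_seed helper are hypothetical illustrations, and only the names OVERFIT_MODES and apply_overfit_mode come from the description. The parameter keys are standard XGBoost training parameters.

```python
# Hypothetical override values; the notebook's actual settings are not
# shown in this PR description.
OVERFIT_MODES = {
    "deep_trees": {"max_depth": 16, "min_child_weight": 0},
    "high_learning_rate": {"eta": 1.0},
    "no_regularization": {"lambda": 0.0, "alpha": 0.0, "gamma": 0.0},
    "full_sampling": {"subsample": 1.0, "colsample_bytree": 1.0},
    # Long training changes the number of boosting rounds, not the params.
    "long_training": {},
}

# Hypothetical fixed round mapping for overfit runs.
OVERFIT_ROUNDS = {"long_training": 2000}


def apply_overfit_mode(base_params, mode):
    """Return a copy of base_params with the named overfit overrides applied."""
    if mode not in OVERFIT_MODES:
        raise ValueError(f"unknown overfit mode: {mode!r}")
    params = dict(base_params)  # leave the caller's dict untouched
    params.update(OVERFIT_MODES[mode])
    return params


def derive_run_seed(base_seed, seed_offset):
    """Hypothetical helper: one seed per run, reused for splits and the model."""
    return base_seed + seed_offset


base = {"max_depth": 4, "eta": 0.1, "subsample": 0.8}
overfit = apply_overfit_mode(base, "deep_trees")
```

Returning a copy keeps the base (good) parameters reusable across repeated overfit runs, and a single derived seed per run keeps each run reproducible.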

Testing

  • Executed the GPU detection probe, which completed and reported the detected backend.
  • Performed a smoke fit_and_score run on a small dataset end-to-end (train → convert → WeightWatcher analyze) which completed without error.
  • Long-running cross-validation on full datasets did not run to completion: earlier xgb.cv runs ended in a KeyboardInterrupt, so large CV jobs may still be too slow for this environment.
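
As a rough illustration of the GPU detection probe exercised above, a minimal sketch follows. It is an assumption about how such a probe could work, not the notebook's implementation: it assumes XGBoost 2.0 or later (where the `device` parameter selects the backend), trains a single boosting round on a tiny random dataset, and falls back to CPU on any exception, including XGBoost not being installed. Some XGBoost builds warn and fall back to CPU instead of raising, so a stricter probe may be needed in practice.

```python
def detect_xgb_compute_params():
    """Return (params, backend), where backend is "gpu" or "cpu".

    Sketch only: assumes XGBoost >= 2.0, where `device` selects the backend.
    """
    cpu_params = {"tree_method": "hist", "device": "cpu"}
    gpu_params = {"tree_method": "hist", "device": "cuda"}
    try:
        import numpy as np
        import xgboost as xgb

        # Tiny synthetic problem: enough to force one real training step.
        X = np.random.default_rng(0).random((32, 4))
        y = (X[:, 0] > 0.5).astype(int)
        dtrain = xgb.DMatrix(X, label=y)
        probe_params = {**gpu_params, "objective": "binary:logistic"}
        xgb.train(probe_params, dtrain, num_boost_round=1)
        return gpu_params, "gpu"
    except Exception:
        # Missing library, missing CUDA support, or no visible device.
        return cpu_params, "cpu"


XGB_COMPUTE_PARAMS, XGB_COMPUTE_BACKEND = detect_xgb_compute_params()
print(f"XGBoost compute backend: {XGB_COMPUTE_BACKEND}")
```

The returned dictionary can be merged into the training parameters, for example `{**base_params, **XGB_COMPUTE_PARAMS}`, so the rest of the notebook does not need to know which backend was chosen.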

Codex Task
