A framework for modular, testable, and reproducible ML research. GPT-Lab helps researchers build experiments with strong reproducibility guarantees while enabling rapid iteration.
This repo is in early alpha and undergoes frequent major restructuring.
I believe the overall structure is close to final, but the implementation is definitely not as clean as it could be.
At the very least, you can rely on the structure of experiments inside experiments/<experiment_name>/ staying consistent; the same cannot be said for experiments kept in separate repos or git submodules.
- Modular Catalogs: Composable components for models, optimizers, train loops, and data sources
- Namespace Bootstrapping: Flexible catalog activation across experiments, packs, and core
- Reproducibility: Git tracking, RNG state management (sketched below), and experiment restoration
- Testing: Automated discovery-based tests for all catalog items
- Interactive Tools: Marimo notebooks for analysis and benchmarking
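The Reproducibility bullet above centers on RNG state management. The snippet below is a minimal sketch of that idea using the standard library, NumPy, and PyTorch; GPT-Lab's ReproducibilityManager may capture more (for example git metadata and CUDA RNG streams), so treat the function names here as illustrative rather than as the package's API.

```python
# Minimal sketch of RNG state capture/restore (illustrative, not GPT-Lab's API).
import random

import numpy as np
import torch


def capture_rng_state() -> dict:
    # Snapshot every RNG stream a training run typically touches.
    return {
        "python": random.getstate(),
        "numpy": np.random.get_state(),
        "torch": torch.get_rng_state(),
    }


def restore_rng_state(state: dict) -> None:
    # Restoring the snapshot makes a resumed run draw the same random numbers.
    random.setstate(state["python"])
    np.random.set_state(state["numpy"])
    torch.set_rng_state(state["torch"])
```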
# Install with all development dependencies
pip install -e '.[dev]'
# Or install specific extras
pip install -e '.[nlp]' # NLP pack
pip install -e '.[cv]' # CV pack (planned)

Run the test suite with pytest. See docs/testing.md for details.
python CLIs/scaffold_experiment.py my_experiment
cd experiments/my_experiment
python main.py

Full documentation is available in the docs/ directory.
View locally with MkDocs:
pip install mkdocs
mkdocs serve

Then open http://127.0.0.1:8000
- Concepts: Architecture | Configuration | Testing
- Components: Logger | Checkpointer | Reproducibility | Device | Distributed | Configuration
- CLIs: Scaffold | Test All | Multi-Run | Print Paths | Validate
- Notebooks: Marimo Intro | Experiment Comparison | Benchmarks
- Catalogs: Train Loops | Modules | Optimizers | Models
- Packs: Core | NLP | CV
import argparse

from gpt_lab.configuration import compose_config
from gpt_lab.reproducibility import ReproducibilityManager
from gpt_lab.distributed import DistributedManager
from gpt_lab.logger import setup_experiment_logging
from gpt_lab.train_loops import smart_train


def main():
    parser = argparse.ArgumentParser()
    config = compose_config(parser)

    with DistributedManager() as dist:
        dist.set_seed(config['seed'])

        with ReproducibilityManager(
            output_dir=config['output_dir'],
            is_main_process=dist.is_main_process
        ) as repro:
            setup_experiment_logging(
                log_dir=f"{repro.output_dir}/logs",
                rank=dist.rank,
                is_main_process=dist.is_main_process
            )

            # Your training code: build model, train_loader, and optimizer here,
            # then hand them to the train loop.
            smart_train(
                model=model,
                train_loader=train_loader,
                optimizer=optimizer,
                num_epochs=config['num_epochs']
            )


if __name__ == "__main__":
    main()

Via environment variables:
export GPT_LAB_CURRENT_EXPERIMENT=nano_gpt
export GPT_LAB_ACTIVE_PACKS=nlp

Via YAML files:
# experiments/my_exp/gpt_lab.yaml
include_experiments: []
include_packs: ['nlp']
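How the environment variables and the YAML file might combine is easiest to see in code. The sketch below is an assumption about the merge semantics (the real bootstrapping logic lives in the package): it reads GPT_LAB_ACTIVE_PACKS and the experiment's gpt_lab.yaml and returns a deduplicated pack list. The active_packs helper is hypothetical.

```python
# Hypothetical sketch of how pack activation settings could be combined.
import os

import yaml  # pip install pyyaml


def active_packs(experiment_dir: str) -> list[str]:
    packs: list[str] = []

    # Packs named via the environment (comma-separated), e.g. GPT_LAB_ACTIVE_PACKS=nlp
    env = os.environ.get("GPT_LAB_ACTIVE_PACKS", "")
    packs += [p for p in env.split(",") if p]

    # Packs requested by the experiment's gpt_lab.yaml (include_packs key)
    config_path = os.path.join(experiment_dir, "gpt_lab.yaml")
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = yaml.safe_load(f) or {}
        packs += config.get("include_packs", [])

    # Deduplicate while preserving order
    return list(dict.fromkeys(packs))
```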
Debug activation:

python CLIs/print_active_paths.py -v

GPT-Lab organizes code into catalogs under a unified gpt_lab.* namespace with configurable precedence:
- Current experiment (highest precedence)
- Active experiments
- Active packs
- Core (lowest precedence, always active)
Each level can override or extend components from lower levels, as sketched below.
See docs/architecture.md for details.
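One way to picture the precedence rules is a first-match search over candidate namespaces, from the current experiment down to core. The sketch below is purely illustrative: the module paths, the search-order literals, and the resolve helper are assumptions, not GPT-Lab's actual bootstrapping code.

```python
# Hypothetical precedence-based lookup: return the first catalog that provides
# a component, searching from highest to lowest precedence.
import importlib

SEARCH_ORDER = [
    "experiments.my_experiment",  # current experiment (highest precedence, illustrative path)
    "catalogs.packs.nlp",         # active packs
    "catalogs.core",              # core (always active, lowest precedence)
]


def resolve(component: str):
    """Return the first module that provides `component`, by precedence."""
    for namespace in SEARCH_ORDER:
        try:
            return importlib.import_module(f"{namespace}.{component}")
        except ModuleNotFoundError:
            continue  # not provided at this level; fall through to the next one
    raise LookupError(f"No catalog provides '{component}'")
```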
├── src/gpt_lab/ # Main package source
├── tests/ # Test discovery and execution
├── experiments/ # Experiment catalog
├── catalogs/
│ ├── core/ # Core components (always active)
│ └── packs/ # Domain-specific packs (nlp, cv)
├── CLIs/ # Command-line tools
├── notebooks/ # Marimo notebooks for analysis
├── docs/ # Documentation
└── pyproject.toml # Package configuration
# All tests
pytest
# Specific experiment
python CLIs/pytest_all_experiments.py --include nano_gpt
# With coverage
pytest --cov=src/gpt_lab --cov-report=html

See docs/testing.md for details.
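The discovery-based testing idea can be illustrated with a generic pytest pattern: enumerate every submodule of the installed package and check that each one at least imports. This is only a sketch of the concept (it assumes the gpt_lab package is importable after pip install -e '.'); it is not the repo's actual test harness.

```python
# Sketch of discovery-style smoke tests: import every submodule of gpt_lab.
import importlib
import pkgutil

import pytest

import gpt_lab  # assumes the package is installed (pip install -e '.')


def _iter_module_names(package):
    # Recursively yield fully qualified names of all submodules in the package.
    prefix = package.__name__ + "."
    for module_info in pkgutil.walk_packages(package.__path__, prefix):
        yield module_info.name


@pytest.mark.parametrize("name", sorted(_iter_module_names(gpt_lab)))
def test_module_imports(name):
    # Importing is the cheapest check a discovered catalog item can pass.
    importlib.import_module(name)
```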
# Run benchmarks
python -m gpt_lab.nn_modules.catalog_benchmark
python -m gpt_lab.optimizers.catalog_benchmark
# View results
marimo edit notebooks/nn_modules_bench.py
marimo edit notebooks/optimizers_bench.py

- Create feature branch
- Add tests for new components
- Update documentation
- Run full test suite
- Submit pull request
See the individual documentation files for component-specific contributing guidelines.
See LICENSE for details.
- Documentation: Run mkdocs serve and visit http://127.0.0.1:8000
- Issues: Report bugs and request features via GitHub issues
- Examples: See the experiments/ directory for working examples