
GPT-Lab

A framework for modular, testable, and reproducible ML research. GPT-Lab helps researchers build experiments with strong reproducibility guarantees while enabling rapid iteration.

WARNING

This repo is in early alpha and still undergoing major restructuring. I believe the overall structure is close to final, but the implementation is not yet as clean as it could be. At the very least, experiments inside experiments/<experiment_name>/ should stay consistent in structure; the same cannot be said for experiments kept in separate repos or git submodules.

Key Features

  • Modular Catalogs: Composable components for models, optimizers, train loops, and data sources
  • Namespace Bootstrapping: Flexible catalog activation across experiments, packs, and core
  • Reproducibility: Git tracking, RNG state management, and experiment restoration
  • Testing: Automated discovery-based tests for all catalog items
  • Interactive Tools: Marimo notebooks for analysis and benchmarking

Quick Start

Installation

# Install with all development dependencies
pip install -e '.[dev]'

# Or install specific extras
pip install -e '.[nlp]'  # NLP pack
pip install -e '.[cv]'   # CV pack (planned)

Run Tests

pytest

See docs/testing.md for details.

Create an Experiment

python CLIs/scaffold_experiment.py my_experiment
cd experiments/my_experiment
python main.py

Documentation

Full documentation is available in the docs/ directory.

View locally with MkDocs:

pip install mkdocs
mkdocs serve

Then open http://127.0.0.1:8000

Example Usage

Basic Experiment

import argparse
from gpt_lab.configuration import compose_config
from gpt_lab.reproducibility import ReproducibilityManager
from gpt_lab.distributed import DistributedManager
from gpt_lab.logger import setup_experiment_logging
from gpt_lab.train_loops import smart_train

def main():
    parser = argparse.ArgumentParser()
    config = compose_config(parser)
    
    with DistributedManager() as dist:
        dist.set_seed(config['seed'])
        
        with ReproducibilityManager(
            output_dir=config['output_dir'],
            is_main_process=dist.is_main_process
        ) as repro:
            setup_experiment_logging(
                log_dir=f"{repro.output_dir}/logs",
                rank=dist.rank,
                is_main_process=dist.is_main_process
            )
            
            # Your training code
            smart_train(
                model=model,
                train_loader=train_loader,
                optimizer=optimizer,
                num_epochs=config['num_epochs']
            )

if __name__ == "__main__":
    main()
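
The snippet above leaves model, train_loader, and optimizer undefined. As a rough, hypothetical sketch (plain PyTorch rather than the gpt_lab catalogs; every name below is illustrative only), the missing pieces could be as simple as:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the objects main() expects; in a real experiment
# these would come from the model, data, and optimizer catalogs.
model = torch.nn.Linear(128, 2)                          # toy model
dataset = TensorDataset(torch.randn(1024, 128),          # random features
                        torch.randint(0, 2, (1024,)))    # random labels
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)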

Activating Catalogs

Via environment variables:

export GPT_LAB_CURRENT_EXPERIMENT=nano_gpt
export GPT_LAB_ACTIVE_PACKS=nlp

Via YAML files:

# experiments/my_exp/gpt_lab.yaml
include_experiments: []
include_packs: ['nlp']

Debug activation:

python CLIs/print_active_paths.py -v

Architecture

GPT-Lab organizes code into catalogs under a unified gpt_lab.* namespace with configurable precedence:

  1. Current experiment (highest precedence)
  2. Active experiments
  3. Active packs
  4. Core (lowest precedence, always active)

Each level can override or extend components from lower levels.

See docs/architecture.md for details.
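
As an illustration of the precedence order (a hypothetical sketch of the lookup behavior, not the actual bootstrapping code), resolving a component name can be thought of as walking the levels from highest to lowest precedence and returning the first match:

from typing import Optional

# Hypothetical catalogs keyed by component name; in GPT-Lab these are populated
# by the namespace bootstrapping machinery rather than hand-written dicts.
CURRENT_EXPERIMENT = {"attention": "my_exp.FlashAttention"}
ACTIVE_EXPERIMENTS = {}
ACTIVE_PACKS = {"attention": "nlp.MultiHeadAttention", "tokenizer": "nlp.BPETokenizer"}
CORE = {"attention": "core.Attention", "mlp": "core.MLP"}

def resolve(name: str) -> Optional[str]:
    """Return the implementation from the highest-precedence level that defines `name`."""
    for level in (CURRENT_EXPERIMENT, ACTIVE_EXPERIMENTS, ACTIVE_PACKS, CORE):
        if name in level:
            return level[name]
    return None

print(resolve("attention"))  # -> "my_exp.FlashAttention" (current experiment wins)
print(resolve("mlp"))        # -> "core.MLP" (falls back to core)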

Repository Structure

├── src/gpt_lab/          # Main package source
├── tests/                # Test discovery and execution
├── experiments/          # Experiment catalog
├── catalogs/
│   ├── core/            # Core components (always active)
│   └── packs/           # Domain-specific packs (nlp, cv)
├── CLIs/                # Command-line tools
├── notebooks/           # Marimo notebooks for analysis
├── docs/                # Documentation
└── pyproject.toml       # Package configuration

Development

Running Tests

# All tests
pytest

# Specific experiment
python CLIs/pytest_all_experiments.py --include nano_gpt

# With coverage
pytest --cov=src/gpt_lab --cov-report=html

See docs/testing.md for details.

Benchmarking

# Run benchmarks
python -m gpt_lab.nn_modules.catalog_benchmark
python -m gpt_lab.optimizers.catalog_benchmark

# View results
marimo edit notebooks/nn_modules_bench.py
marimo edit notebooks/optimizers_bench.py

Contributing

  1. Create feature branch
  2. Add tests for new components
  3. Update documentation
  4. Run full test suite
  5. Submit pull request

See the individual documentation files for component-specific contributing guidelines.

License

See LICENSE for details.

Links

  • Documentation: Run mkdocs serve and visit http://127.0.0.1:8000
  • Issues: Report bugs and request features via GitHub issues
  • Examples: See experiments/ directory for working examples
