
hyrax


Lightweight distributed training for multi-dataset workflows on local hardware.

 _                            
| |__  _   _ _ __ __ ___  __ 
| '_ \| | | | '__/ _` \ \/ / 
| | | | |_| | | | (_| |>  <  
|_| |_|\__, |_|  \__,_/_/\_\ 
       |___/

Overview

Hyrax enables concurrent training of models across multiple datasets on local multi-GPU setups without the complexity of Kubernetes or distributed computing frameworks. It automatically detects available hardware, intelligently schedules jobs, and monitors training progress.

Key features:

  • Automatic GPU detection and allocation (CUDA, MPS, CPU)
  • Intelligent job scheduling with bin-packing optimization
  • Real-time training monitoring via TensorBoard
  • Dataset-agnostic interface (Minari, HDF5, pickle, custom loaders)
  • Zero-configuration deployment for local machines
  • Offline-first design

Installation

pip install hyrax-lib

Requirements:

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA-capable GPU (optional, CPU and Apple Silicon supported)

Quick Start

Train a behavioral cloning model on three MuJoCo datasets concurrently:

from hyrax import DistributedTrainer
import minari

# BehavioralCloningModel is your own model class (not shown here)
trainer = DistributedTrainer(
    model=BehavioralCloningModel,
    datasets=[
        "mujoco/humanoid/expert-v0",
        "mujoco/halfcheetah/expert-v0",
        "mujoco/hopper/expert-v0"
    ],
    dataset_loader=minari.load_dataset,
)

results = trainer.train(epochs=200)

Hyrax automatically:

  1. Detects available GPUs and memory
  2. Schedules jobs to minimize resource contention
  3. Distributes datasets across workers
  4. Monitors training progress
  5. Returns aggregated results

Usage

Basic Usage

from hyrax import DistributedTrainer

trainer = DistributedTrainer(
    model=YourModel,
    datasets=["dataset1", "dataset2", "dataset3"]
)

results = trainer.train(epochs=100)

Custom Dataset Loaders

def load_custom_data(path):
    # your loading logic
    return dataset

trainer = DistributedTrainer(
    model=YourModel,
    datasets=["path/to/data1", "path/to/data2"],
    dataset_loader=load_custom_data
)

Pre-loaded Datasets

datasets = [load_data(x) for x in paths]

trainer = DistributedTrainer(
    model=YourModel,
    datasets=datasets  # already loaded
)

Memory Estimation

Provide memory estimates for better scheduling:

trainer = DistributedTrainer(
    model=YourModel,
    datasets=datasets,
    job_size_estimates=[2*1024**3, 3*1024**3, 2*1024**3]  # bytes
)

Monitoring

Hyrax automatically logs training metrics to TensorBoard:

tensorboard --logdir=runs

Navigate to http://localhost:6006 to view real-time training progress across all workers.

Architecture

Hyrax consists of four main components:

  • ResourceManager: Detects GPUs, CPUs, and available memory
  • LoadBalancer: Schedules jobs using bin-packing optimization
  • TrainingWorker: Executes training on assigned hardware
  • TrainingMonitor: Logs metrics and progress via TensorBoard
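The LoadBalancer's bin-packing step can be illustrated with a first-fit-decreasing sketch. This is a plain-Python illustration of the idea, not Hyrax's actual scheduler, and the function and variable names are invented for the example:

```python
def schedule(job_sizes, gpu_capacities):
    """First-fit-decreasing bin packing: place each job (largest first)
    on the first GPU with enough free memory.

    Returns {gpu_index: [job_index, ...]}; raises if a job fits nowhere.
    """
    free = list(gpu_capacities)
    placement = {g: [] for g in range(len(gpu_capacities))}
    for j in sorted(range(len(job_sizes)), key=lambda j: -job_sizes[j]):
        for g, cap in enumerate(free):
            if job_sizes[j] <= cap:
                free[g] -= job_sizes[j]
                placement[g].append(j)
                break
        else:
            raise ValueError(f"job {j} ({job_sizes[j]} B) fits on no GPU")
    return placement

# Three jobs (2, 3, 2 GB) on two 4 GB GPUs
print(schedule([2, 3, 2], [4, 4]))
```

Sorting jobs largest-first before placing them is what keeps a big job from being stranded after the small ones have fragmented the free memory.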

Supported Backends

  • CUDA: NVIDIA GPUs
  • MPS: Apple Silicon (M1/M2/M3)
  • CPU: Fallback for systems without GPUs
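The fallback order above corresponds roughly to a device check like the following. This is a hedged sketch of the selection logic; Hyrax's real detection lives in ResourceManager and may differ:

```python
def pick_backend():
    """Return the best available backend: CUDA, then MPS, then CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        pass  # no PyTorch at all: fall back to CPU
    return "cpu"

print(pick_backend())
```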

When to Use Hyrax

Good for:

  • Training the same model on multiple datasets simultaneously
  • Local multi-GPU workstations
  • Offline training environments
  • Rapid prototyping and experimentation

Not suitable for:

  • Multi-node distributed training (use Ray, DeepSpeed, or Kubernetes)
  • Model parallelism across GPUs
  • Production inference serving

Examples

See the examples/ directory for complete working examples.

Documentation

Contributing

Contributions are welcome! Feel free to submit a pull request.

License

MIT License - see LICENSE for details.

Citation

If you use Hyrax in your research, please cite:

@software{hyrax2026,
  author = {Hothi, Baljinder},
  title = {Hyrax: Lightweight Distributed Training for Local Hardware},
  year = {2026},
  url = {https://github.com/BaljinderHothi/hyrax-lib}
}

Acknowledgments

Named after the rock hyrax, a small mammal that's really cute.
