Bug Description
During a run, the trials execute in parallel until just before the first epoch. They then switch to sequential execution: the first trial runs to completion, then the second, and so on.
To Reproduce
Steps to reproduce the behavior:
- Connect a T4 instance on Google Colab (or Colab Enterprise) to the attached notebook.
- Click on the "end experiment" cell, open the command palette (Cmd/Ctrl + Shift + P), and select "Run cells before the current".
- Observe the training after step 12.
Expected Behavior
Trials should run in parallel throughout the experiment.
Environment
- OS: Run on Colab Enterprise with these resources:
  richard-single-gpu-3
  Machine type: n1-standard-8
  GPU type: NVIDIA_TESLA_T4 x 1
  Environment: Python 3.12
  Region: us-central1
- Python version: 3.12
- RapidFire AI version: 0.12.8
- Browser (if applicable): Firefox
Notebook
rf_colab_tensorboard_tutorial(2).ipynb
Error Logs
rapidfire.log
training.log
@pradyumna-rfai @arun-rfai