We provide a convenient way to benchmark the performance, measured mainly in throughput and MFU, of the inference engine and trainer using the `--bench` flag. It runs each module in isolation for a few steps and logs the benchmark results in a rich table to the console.
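For context, MFU (model FLOPs utilization) is conventionally defined as the fraction of the hardware's peak FLOPs that the model actually achieves. The exact FLOPs accounting used by the benchmark may differ, but the standard training-time estimate is:

$$
\mathrm{MFU} = \frac{\text{tokens/s} \times \text{FLOPs per token}}{\text{peak FLOPs/s}}, \qquad \text{FLOPs per token} \approx 6N
$$

where $N$ is the number of model parameters and $6N$ is the usual rule of thumb for one forward plus one backward pass.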
Benchmark on the default fake data configuration:

```bash
uv run sft ... --data.type fake --bench
```

Benchmark with variable-length, instead of fixed-length, fake data to more closely simulate real data:

```bash
uv run sft ... --data.type fake --data.length variable --bench
```

Benchmark different batch configurations, i.e. the (micro) batch size and sequence length:

```bash
uv run sft ... --data.seq-len 4096 --data.batch-size 64 --data.micro-batch-size 2 --bench
```

Benchmark against a real dataset:

```bash
uv run sft ... --data.name PrimeIntellect/Reverse-Text-SFT --bench
```

Benchmark against a training configuration:

```bash
uv run sft @ path/to/config.toml --bench
```
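The config file referenced above can carry the same options as the CLI flags. As a minimal sketch, assuming the common convention that a dotted flag such as `--data.seq-len` maps to a `seq_len` key in a `[data]` table (the key names here are illustrative, not authoritative), a config reproducing the batch-configuration example might look like:

```toml
# Hypothetical config.toml mirroring the CLI flags above.
# Key names assume dotted flags map to TOML tables; check the
# repository's example configs for the authoritative schema.
[data]
type = "fake"          # use the fake data loader
seq_len = 4096         # sequence length
batch_size = 64        # global batch size
micro_batch_size = 2   # per-step micro batch size
```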
Benchmark the RL trainer on a fake data loader:

```bash
uv run trainer ... --data.fake --bench
```

Benchmark different batch configurations, i.e. the (micro) batch size and sequence length:

```bash
uv run trainer ... --model.seq-len 4096 --data.fake.batch-size 64 --data.fake.micro-batch-size 2 --bench
```

Note that it is not yet possible to benchmark the RL trainer against real data when it runs in isolation.
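The same flag-to-config mapping would apply to the RL trainer. A hedged fragment, again with illustrative key names, would nest the fake-data options under a `[data.fake]` table:

```toml
# Hypothetical trainer config fragment mirroring the flags above.
[model]
seq_len = 4096         # --model.seq-len

[data.fake]
batch_size = 64        # --data.fake.batch-size
micro_batch_size = 2   # --data.fake.micro-batch-size
```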
To benchmark the inference engine in isolation, start the inference server with the desired configuration file and run the orchestrator with the `--bench` flag.
In one terminal, start the inference server:

```bash
uv run inference @ path/to/config.toml
```

Then, in another terminal, run the orchestrator:

```bash
uv run orchestrator @ path/to/config.toml --bench
```

Note that it is not yet possible to benchmark the inference engine against fake data.
To benchmark the full RL training, you can add the `--bench` flag to your RL entrypoint. This will benchmark the RL trainer against fake data and the inference engine against real data from the orchestrator.
```bash
uv run rl \
  --trainer @ path/to/train.toml \
  --orchestrator @ path/to/orch.toml \
  --inference @ path/to/infer.toml \
  --bench
```