Request
Enable BitNet 1.58-bit (ternary weight) inference on ROCm builds. The architecture is listed as supported in the README but the pre-built gfx1151 binary fails to load BitNet models.
What We Tested
Binary: b1004-tech-preview (gfx1151)
Model: mlx-community/bitnet-b1.58-2B-4T
Result: warmup failed — the model loads, but inference crashes
Also tested mlx-community/Falcon-E-3B-Instruct-1.58bit — same result.
Why This Matters
BitNet 1.58-bit uses ternary weights {-1, 0, 1}. On unified memory hardware like Strix Halo (128GB), 1-bit models would:
- Dramatically reduce memory bandwidth pressure — the main bottleneck on unified memory
- Enable much larger effective model sizes — a 70B 1.58-bit model fits in memory where a 70B FP16 model cannot
- Pair with MLX's existing speed advantage — MLX is already 29-85% faster than vLLM on this hardware; 1-bit inference would compound that
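To put the memory argument in rough numbers, a quick sketch (assuming ternary weights packed at 2 bits each as an upper bound; actual BitNet packing approaches 1.58 bits/weight):

```python
def model_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage for a model with `params` parameters."""
    return params * bits_per_weight / 8

GiB = 1024**3
params_70b = 70e9

fp16 = model_bytes(params_70b, 16) / GiB    # ~130 GiB: exceeds 128GB unified memory
ternary = model_bytes(params_70b, 2) / GiB  # ~16 GiB: 2 bits/weight packed upper bound

print(f"70B FP16:    {fp16:6.1f} GiB")
print(f"70B ternary: {ternary:6.1f} GiB")
```

Weights-only, ignoring KV cache and activations, but the gap is the point: the FP16 model cannot even load, while the ternary one leaves over 100GB of headroom.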
Microsoft open-sourced BitNet. PrismML has Bonsai 1-bit MLX models (Apple Silicon). The architecture support is in the codebase — the ternary matmul kernels just need the ROCm/HIP path.
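For reference, the operation the missing ROCm/HIP path needs is simple in principle: with weights restricted to {-1, 0, 1}, each output element of a matmul reduces to additions and subtractions of activations, no multiplies. A NumPy sketch of the semantics (not the actual MLX kernel code):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weights in {-1, 0, 1}
x = rng.standard_normal(8)            # activations

# Dense matmul for reference
y_ref = W @ x

# Equivalent multiply-free form: add activations where w == +1,
# subtract where w == -1, skip where w == 0
y = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

assert np.allclose(y, y_ref)
```

A real kernel would operate on the packed 2-bit representation directly, but the arithmetic it has to reproduce is exactly this.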
Available Models
mlx-community/bitnet-b1.58-2B-4T — Microsoft official
mlx-community/Falcon-E-3B-Instruct-1.58bit — Falcon extreme quant
prism-ml/Bonsai-8B-mlx-1bit — PrismML (Apple Silicon)
prism-ml/Bonsai-4B-mlx-1bit
prism-ml/Bonsai-1.7B-mlx-1bit
Test Environment
Hardware: AMD Strix Halo, 128GB unified, gfx1151
Binary: b1004-tech-preview
OS: CachyOS (Arch), kernel 7.0.0-1-mainline
Context
We have comprehensive MLX benchmarks on this hardware — 6 models passing at 21-151 tok/s (4-bit). Full results: https://github.com/stampby/bleeding-edge
Happy to test 1-bit builds when available.