Feature request: BitNet 1.58-bit ternary inference on ROCm (gfx1151) #2

@stampby

Description

@stampby

Request

Enable BitNet 1.58-bit (ternary weight) inference on ROCm builds. The architecture is listed as supported in the README, but the pre-built gfx1151 binary fails to run BitNet models: they load, then crash at inference.

What We Tested

Binary: b1004-tech-preview (gfx1151)
Model: mlx-community/bitnet-b1.58-2B-4T
Result: warmup failed — model loads but inference crashes

Also tested mlx-community/Falcon-E-3B-Instruct-1.58bit — same result.

Why This Matters

BitNet 1.58-bit uses ternary weights {-1, 0, +1}. On unified-memory hardware like Strix Halo (128GB), ternary models would:

  • Dramatically reduce memory bandwidth pressure — the main bottleneck on unified memory
  • Enable much larger effective model sizes — a 70B 1-bit model fits where a 70B FP16 cannot
  • Pair with MLX's speed advantage — MLX is already 29-85% faster than vLLM on this hardware, and 1.58-bit weights would compound that
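To make the memory argument concrete, here is a back-of-the-envelope sketch (illustrative only; it counts weight storage at the information-theoretic ~1.58 bits per ternary value and ignores activations, KV cache, and packing overhead):

```python
import math

BITS_PER_TERNARY = math.log2(3)  # ~1.585 bits of information per {-1, 0, +1} weight

def model_gib(params_billions, bits_per_weight):
    """Approximate weight-storage size in GiB (weights only)."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 2**30

print(f"70B FP16:    {model_gib(70, 16):.1f} GiB")              # ~130 GiB: exceeds 128GB unified memory
print(f"70B ternary: {model_gib(70, BITS_PER_TERNARY):.1f} GiB")  # ~13 GiB: fits with room to spare
```

Real deployments pack ternary values into byte-aligned formats (e.g. 2 bits each, ~17.5 GiB for 70B), but the order-of-magnitude gap over FP16 holds either way.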

Microsoft open-sourced BitNet. PrismML has Bonsai 1-bit MLX models (Apple Silicon). The architecture support is in the codebase — the ternary matmul kernels just need the ROCm/HIP path.
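For anyone picking this up: the reason ternary matmul kernels are worth a dedicated HIP path is that with weights restricted to {-1, 0, +1}, the inner product needs no multiplies at all — it reduces to adding and subtracting activations. A minimal NumPy reference (not the codebase's actual kernel; function and variable names here are mine) that a HIP kernel could be validated against:

```python
import numpy as np

def ternary_matmul(x, w_ternary, scale=1.0):
    """x @ w for ternary w, expressed without multiplies:
    sum the activations where w == +1, subtract those where w == -1."""
    pos = x @ (w_ternary == 1).astype(x.dtype)    # additions only
    neg = x @ (w_ternary == -1).astype(x.dtype)   # subtractions only
    return (pos - neg) * scale

# Sanity check against a dense matmul
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.integers(-1, 2, size=(8, 16)).astype(np.float32)
assert np.allclose(ternary_matmul(x, w), x @ w, atol=1e-5)
```

A production kernel would additionally unpack 2-bit-packed weights on the fly, which is where the bandwidth savings actually materialize.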

Available Models

mlx-community/bitnet-b1.58-2B-4T           — Microsoft official
mlx-community/Falcon-E-3B-Instruct-1.58bit — Falcon extreme quant
prism-ml/Bonsai-8B-mlx-1bit                — PrismML (Apple Silicon)
prism-ml/Bonsai-4B-mlx-1bit
prism-ml/Bonsai-1.7B-mlx-1bit

Test Environment

Hardware: AMD Strix Halo, 128GB unified, gfx1151
Binary:   b1004-tech-preview
OS:       CachyOS (Arch), kernel 7.0.0-1-mainline

Context

We have comprehensive MLX benchmarks on this hardware — 6 models passing at 21-151 tok/s (4-bit). Full results: https://github.com/stampby/bleeding-edge

Happy to test 1-bit builds when available.
