Inference engine for Apple Silicon, currently focused on GGUF Qwen3.5 models.
Hardware: Mac Studio M2 Max — 38 GPU cores, 96 GB unified memory, NVMe SSD.
- Supported autoresearch models: Qwen3.5-9B-Q8_0.gguf, Qwen3.5-27B-Q4_K_M.gguf, Qwen3.5-35B-A3B-Q4_K_S.gguf
- Experiment logs: inference/experiments/

The benchmarks below use the same GGUF files on the same hardware for both engines; full methodology and quality results are in the experiment logs.
| Model | Orome tok/s | llama.cpp tok/s | Quality |
|---|---|---|---|
| Qwen3.5-9B-Q8_0 | 35.32 | 31.22 | 3/3 on both |
| Qwen3.5-27B-Q4_K_M | 17.59 | 14.77 | 3/3 on both |
| Qwen3.5-35B-A3B-Q4_K_S | 65.15 | 51.34 | 3/3 on both |
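From the table, Orome's throughput advantage over llama.cpp works out to roughly 13%, 19%, and 27% across the three models. A quick check of that arithmetic (numbers copied from the table above):

```python
# Throughput pairs (Orome tok/s, llama.cpp tok/s) from the benchmark table.
results = {
    "Qwen3.5-9B-Q8_0": (35.32, 31.22),
    "Qwen3.5-27B-Q4_K_M": (17.59, 14.77),
    "Qwen3.5-35B-A3B-Q4_K_S": (65.15, 51.34),
}

for model, (orome, llama) in results.items():
    speedup = (orome / llama - 1.0) * 100.0  # percent faster than llama.cpp
    print(f"{model}: +{speedup:.1f}%")
```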
```sh
# Build
make

# Run inference
./orome --model /path/to/model.gguf --prompt "Hello" --tokens 100

# Serve (OpenAI-compatible API)
make serve MODEL=/path/to/model.gguf

# Chat (terminal client)
make chat
```

The current engine is GGUF-only. The model fits in unified memory, so the hot path is a Metal forward pass that keeps hidden state on the GPU, resolves tensors through the GGUF cache, and spends its time on dequant matvec, attention, expert routing, expert compute, and dispatch/barrier overhead.
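The expert-routing step above applies to the MoE 35B-A3B model: for each token, a router scores the experts and only the top few are run. The logic can be sketched in plain Python; this is an illustrative top-k softmax router under common MoE conventions, not the engine's actual Metal implementation, and all names here are hypothetical:

```python
import math

def route_experts(router_logits, top_k=2):
    """Illustrative MoE routing: pick the top_k experts by router logit and
    return (expert_index, normalized_weight) pairs. The real engine does
    this per token on the GPU; this is a CPU-side sketch."""
    # Rank experts by router logit and keep the top_k.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over only the selected experts' logits (a common MoE choice).
    top = max(router_logits[i] for i in ranked)
    exps = [math.exp(router_logits[i] - top) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]

# Example: 4 experts; the router prefers experts 2 and 0.
print(route_experts([1.0, -0.5, 2.0, 0.1], top_k=2))
```

The selected experts' outputs are then combined with these weights, which is the "expert compute" portion of the hot path.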
```
inference/
  include/orome.h  — Types, ModelConfig, TensorRef, Engine interfaces
  src/             — Objective-C engine, Metal shaders, GGUF loader, HTTP server
  vendor/          — Third-party (linenoise, tokenizer)
  tools/           — Benchmarking, comparison, chat client, stress test
  experiments/     — Per-model optimization logs and configs
  scripts/         — Experiment runner
  docs/            — Detailed comparison data, model notes, research
```
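Since `make serve` exposes an OpenAI-compatible API, any OpenAI-style client should work against it. A minimal sketch of building a chat-completion request body follows; the host, port, model name, and endpoint path here are assumptions (check the server's startup output for the real values):

```python
import json

# Hypothetical endpoint; the actual host/port depends on server configuration.
URL = "http://localhost:8080/v1/chat/completions"

def chat_request_body(prompt, model="qwen3.5", max_tokens=100):
    """Build an OpenAI-style chat-completion payload as a JSON string."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })

body = chat_request_body("Hello")
print(body)
# Send with any HTTP client, e.g.:
#   curl -s $URL -H 'Content-Type: application/json' -d "$body"
```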