
feat: GigaAM v3 CTC inference on Apple Silicon via MLX#61

Closed
misteral wants to merge 20 commits into salute-developers:main from misteral:feat/mlx-apple-silicon

Conversation

@misteral

Summary

Native MLX inference for GigaAM v3 CTC on Apple Silicon — 139x realtime on M4.

What's included

| File | Description |
| --- | --- |
| mlx_convert/gigaam_mlx.py | Full MLX model: Conformer encoder (16 layers, 768d, RoPE), CTC head, mel spectrogram, streaming |
| mlx_convert/convert_gigaam_to_mlx.py | PyTorch → MLX conversion (safetensors + config.json) |
| mlx_convert/gigaam-cli | Single-file transcription CLI |
| mlx_convert/gigaam-stream | Real-time streaming (live mic + file) |
| mlx_convert/gigaam-transcribe | Shell wrapper |
| mlx_convert/README.md | Documentation: Python API, CLI, benchmarks, mlx-audio integration |

Architecture

Audio (16kHz) → Log-Mel (64 bins) → Conv1d Subsampling (4x)
  → 16× Conformer (RoPE MHSA + GLU Conv + SiLU FFN)
  → CTC Head → Greedy Decode

Key details:

  • RoPE applied before Q/K/V projections (non-standard, matching original)
  • Mel filterbank saved from PyTorch (exact match, no recomputation drift)
  • All Conv1d weights transposed: [out, in, K] → [out, K, in] for MLX
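
The Conv1d weight transpose above is a single axis permutation (PyTorch stores [out_channels, in_channels, kernel]; MLX expects [out_channels, kernel, in_channels]). A minimal sketch with illustrative shapes, not the exact GigaAM layer sizes:

```python
import numpy as np

# PyTorch Conv1d weight: [out_channels, in_channels, kernel]
w_torch = np.zeros((768, 64, 3))

# MLX Conv1d weight: [out_channels, kernel, in_channels]
w_mlx = np.transpose(w_torch, (0, 2, 1))
print(w_mlx.shape)  # (768, 3, 64)
```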

Performance (Apple M4)

| Metric | Value |
| --- | --- |
| Batch (11s audio) | 81ms (139x realtime) |
| Streaming (1s step) | 57ms/step |
| Model size (fp16) | 421 MB |

Python API

from gigaam_mlx import load_model, load_audio

model = load_model("./gigaam-v3-ctc-mlx")
text = model.transcribe(load_audio("audio.wav"))

# Streaming
for r in model.stream_generate(load_audio("audio.wav")):
    print(r.cumulative_text)

mlx-audio compatibility

StreamingResult follows the same contract as mlx-audio's Parakeet/Whisper streaming. The README includes an integration guide for adding GigaAM as an mlx-audio STT model.
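
A minimal sketch of what such a streaming-result contract could look like; only cumulative_text is confirmed by the API example above, and the other field names are assumptions about an mlx-audio-style interface:

```python
from dataclasses import dataclass

@dataclass
class StreamingResult:
    text: str               # text newly decoded in this step (assumed field)
    cumulative_text: str    # full transcript so far (used in the API example)
    finished: bool = False  # True once the stream is exhausted (assumed field)

r = StreamingResult(text="мир", cumulative_text="привет мир")
print(r.cumulative_text)
```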

Testing

Tested on Apple M4 with various Russian speech samples. Output matches PyTorch reference (character-level exact match on short utterances, minor CTC boundary differences on longer audio).

georgygospodinov and others added 20 commits May 30, 2024 18:31
Fix pyannote model loading (conflicts with new torch) and workflow disk memory
…al_cache

No need to repass HF_TOKEN, explicit check for a local copy
Add native MLX (Apple Silicon) inference for GigaAM v3 CTC model:

- Full Conformer encoder (16 layers, 768d) with RoPE attention
- Conv1d subsampling, GLU convolution module, SiLU FFN
- CTC greedy decoding with proper blank/repeat collapsing
- Log-mel spectrogram computed in MLX (exact match to PyTorch)
- PyTorch → MLX weight conversion script (safetensors + config.json)
- Streaming transcription (growing buffer, live mic + file)
- CLI tools: gigaam-cli, gigaam-stream, gigaam-transcribe
- Python API: load_model, transcribe, stream_generate, stream_live
- mlx-audio ecosystem compatible (StreamingResult contract)

Performance on Apple M4:
- 139x realtime (11s audio in 81ms)
- 57ms/step streaming latency
- fp16 weights: 421 MB

New files in mlx_convert/:
  gigaam_mlx.py              — MLX model + inference + streaming
  convert_gigaam_to_mlx.py   — PyTorch → MLX conversion
  gigaam-cli                 — single-file transcription CLI
  gigaam-stream              — real-time streaming CLI
  gigaam-transcribe          — shell wrapper
  README.md                  — documentation, API, benchmarks
- Implement the RNNT decoder (Embedding + LSTM) matching the PyTorch layout
- Implement the joint network (Linear layers + ReLU)
- Abstract model loading and decoding to automatically use RNNT or CTC based on config
- Update conversion script to map PyTorch's separate LSTM gate weights/biases to MLX's grouped format
- Update GigaAMConfig to parse RNNT settings
- Update README with RNNT benchmarks (48x realtime on M4 vs 139x for CTC)
…rking

Add complete MLX conversion pipeline and inference tools for GigaAM v3 RNNT model targeting Apple Silicon:

- Add pre-converted GigaAM v3 RNNT MLX model artifacts (config, weights in safetensors format) with comprehensive documentation of architecture and performance metrics (48× realtime on M4)
- Add weight inspection utility to analyze PyTorch checkpoint structure and enable accurate parameter mapping during conversion
- Add comprehensive test suite covering MLX model inference, fp32 variant testing, and PyTorch baseline comparison
- Add comparative benchmarking tool (compare_all.py) for side-by-side evaluation of Whisper CPP, GigaAM PyTorch (CPU), and GigaAM MLX implementations
- Add RNNT architecture patch script to support joint network and LSTM decoder components
- Add dependency lock file (uv.lock) for reproducible environment management

This enables efficient speech recognition on Apple Silicon with ~9% lower WER than the CTC variant, thanks to autoregressive joint language modeling.
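
The LSTM gate-weight regrouping described in the commits above can be sketched as follows. The separate-gate storage and the grouped target layout are assumptions for illustration; only PyTorch's input/forget/cell/output (i, f, g, o) gate ordering is standard:

```python
import numpy as np

# Hypothetical separately stored gate matrices, each [hidden, input].
# Each gate is filled with a distinct constant so the packing order is visible.
hidden, inp = 320, 768
gates = {name: np.full((hidden, inp), k, dtype=np.float32)
         for k, name in enumerate("ifgo")}

# Pack into one grouped [4*hidden, input] matrix in i, f, g, o order,
# matching PyTorch's nn.LSTM gate layout.
w_grouped = np.concatenate([gates[g] for g in "ifgo"], axis=0)
print(w_grouped.shape)  # (1280, 768)
```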
@misteral
Author

Superseded by #62 — clean rebased version with only MLX-related changes.

@misteral misteral closed this Mar 15, 2026
