feat: add MLX inference for Apple Silicon (CTC + RNNT) #62
Open
misteral wants to merge 1 commit into salute-developers:main from
Add native MLX inference for GigaAM v3 on Apple Silicon:
- Full Conformer encoder (16 layers, 768d) with RoPE attention
- CTC and RNNT heads with greedy decoding
- PyTorch → MLX weight conversion (safetensors + config.json)
- Streaming transcription (growing buffer, live mic + file)
- CLI tools: gigaam-cli, gigaam-stream, gigaam-transcribe
- Python API: load_model, transcribe, stream_generate, stream_live

Performance on Apple M4:
- CTC: 139x realtime (11s audio in 81ms), fp16 421 MB
- RNNT: 48x realtime (11s in 230ms), ~9% lower WER than CTC

New files in mlx_convert/:
- gigaam_mlx.py: MLX model + inference + streaming
- convert_gigaam_to_mlx.py: PyTorch → MLX conversion
- gigaam-cli: single-file transcription CLI
- gigaam-stream: real-time streaming CLI
- gigaam-transcribe: shell wrapper
- README.md: documentation, API, benchmarks
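The greedy CTC decoding mentioned in the commit message can be illustrated with a minimal, dependency-free sketch. It is not the code from gigaam_mlx.py: the function name, the toy vocabulary, and the choice of the last index as the blank token are all assumptions made for the example. The rule itself is the standard one: take the argmax per frame, collapse consecutive repeats, then drop blanks.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id):
    """Greedy CTC decoding over per-frame scores.

    frame_scores: list of T rows, each a list of V scores (logits or log-probs).
    vocab: list of output symbols, indexed by token id (hypothetical here).
    blank_id: index of the CTC blank token (assumed, not taken from the PR).
    """
    decoded = []
    prev = None
    for row in frame_scores:
        # Argmax over the vocabulary axis for this frame.
        best = max(range(len(row)), key=row.__getitem__)
        # Emit only when the label changes and is not the blank.
        if best != prev and best != blank_id:
            decoded.append(vocab[best])
        prev = best
    return "".join(decoded)


def one_hot(index, size):
    # Helper for the toy example: a score row that peaks at `index`.
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy vocabulary with the blank placed last (index 3).
toy_vocab = ["a", "b", "c"]
BLANK = 3
# Frame labels: a a <blank> a b b  ->  "aab"
toy_frames = [one_hot(i, 4) for i in (0, 0, BLANK, 0, 1, 1)]
toy_text = ctc_greedy_decode(toy_frames, toy_vocab, BLANK)
print(toy_text)
```

The blank between the second and third frames is what lets the repeated "a" survive the collapse step. Without it, the three "a" frames would merge into one.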
Summary
Native MLX inference for GigaAM v3 on Apple Silicon — supports both CTC (139× realtime) and RNNT (48× realtime, ~9% lower WER) models.
Supersedes #61 (cleaned up: rebased on main, removed dev artifacts).
What's included
- mlx_convert/gigaam_mlx.py
- mlx_convert/convert_gigaam_to_mlx.py
- mlx_convert/gigaam-cli
- mlx_convert/gigaam-stream
- mlx_convert/gigaam-transcribe
- mlx_convert/README.md
Architecture
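Porting the convolution layers involves a weight-layout change: PyTorch stores 1-D convolution kernels as [out, in, K], while MLX expects [out, K, in]. The sketch below shows that reordering on plain nested lists so it runs without any dependencies. The function name is hypothetical and this is not the conversion script from the PR. On NumPy or MLX arrays the same move is a single transpose with axes (0, 2, 1).

```python
def conv1d_weight_to_mlx(weight):
    """Reorder a nested-list Conv1d kernel from [out, in, K] to [out, K, in]."""
    out_ch = len(weight)
    in_ch = len(weight[0])
    kernel = len(weight[0][0])
    # Swap the last two axes; the output-channel axis stays first.
    return [
        [[weight[o][i][t] for i in range(in_ch)] for t in range(kernel)]
        for o in range(out_ch)
    ]


# Toy kernel: 2 output channels, 3 input channels, kernel size 4.
# Each value encodes its own (o, i, t) position so the move is easy to check.
torch_layout = [
    [[100 * o + 10 * i + t for t in range(4)] for i in range(3)]
    for o in range(2)
]
mlx_layout = conv1d_weight_to_mlx(torch_layout)

# Shape goes from (2, 3, 4) to (2, 4, 3).
print(len(mlx_layout), len(mlx_layout[0]), len(mlx_layout[0][0]))
```

For depthwise or grouped convolutions the middle PyTorch axis is in/groups rather than in, but the axis swap is the same.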
Key details:
- Convolution weights transposed: [out, in, K] → [out, K, in] for MLX
Performance (Apple M4)
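The realtime factors quoted in this PR are audio duration divided by wall-clock inference time. As a quick sanity check, the sketch below recomputes them from the rounded figures in the commit message (11 s of audio, 81 ms for CTC, 230 ms for RNNT). The function name is illustrative only.

```python
def realtime_factor(audio_seconds, inference_seconds):
    """How many seconds of audio are processed per second of compute."""
    return audio_seconds / inference_seconds


# Rounded inputs taken from the commit message.
ctc_rtf = realtime_factor(11.0, 0.081)
rnnt_rtf = realtime_factor(11.0, 0.230)

print(f"CTC:  {ctc_rtf:.0f}x realtime")
print(f"RNNT: {rnnt_rtf:.0f}x realtime")
```

With these rounded inputs the CTC figure comes out near 136x rather than the reported 139x. The gap is within what rounding of the clip length and timings could produce, since the exact audio duration is not stated here.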
Python API
Testing
Tested on Apple M4 with various Russian speech samples. Output matches PyTorch reference (character-level exact match on short utterances).
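A character-level exact-match check of this kind can be sketched as below. The helper name and the sample strings are illustrative and are not taken from the PR's test data. The function returns the position of the first differing character, which makes a failed comparison easier to debug than a plain equality test.

```python
def first_mismatch(reference, hypothesis):
    """Return the index of the first differing character, or None if equal."""
    for i, (ref_ch, hyp_ch) in enumerate(zip(reference, hypothesis)):
        if ref_ch != hyp_ch:
            return i
    # One string is a strict prefix of the other.
    if len(reference) != len(hypothesis):
        return min(len(reference), len(hypothesis))
    return None


# Hypothetical transcripts standing in for PyTorch and MLX outputs.
pytorch_text = "привет мир"
mlx_text = "привет мир"
print(first_mismatch(pytorch_text, mlx_text))
```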