feat: GigaAM v3 CTC inference on Apple Silicon via MLX #61
Closed
misteral wants to merge 20 commits into salute-developers:main
…rnnt GigaAM-RNNT & long-form inference
* add arxiv paper
Fix pyannote model loading (conflicts with new torch) and workflow disk memory
…al_cache No need to repass HF_TOKEN, explicit check for a local copy
Add native MLX (Apple Silicon) inference for GigaAM v3 CTC model:
- Full Conformer encoder (16 layers, 768d) with RoPE attention
- Conv1d subsampling, GLU convolution module, SiLU FFN
- CTC greedy decoding with proper blank/repeat collapsing
- Log-mel spectrogram computed in MLX (exact match to PyTorch)
- PyTorch → MLX weight conversion script (safetensors + config.json)
- Streaming transcription (growing buffer, live mic + file)
- CLI tools: gigaam-cli, gigaam-stream, gigaam-transcribe
- Python API: load_model, transcribe, stream_generate, stream_live
- mlx-audio ecosystem compatible (StreamingResult contract)

Performance on Apple M4:
- 139x realtime (11s audio in 81ms)
- 57ms/step streaming latency
- fp16 weights: 421 MB

New files in mlx_convert/:
- gigaam_mlx.py: MLX model + inference + streaming
- convert_gigaam_to_mlx.py: PyTorch → MLX conversion
- gigaam-cli: single-file transcription CLI
- gigaam-stream: real-time streaming CLI
- gigaam-transcribe: shell wrapper
- README.md: documentation, API, benchmarks
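The "CTC greedy decoding with proper blank/repeat collapsing" item can be illustrated with a minimal sketch. It is plain Python rather than the PR's MLX code, and the function names, toy vocabulary, and blank index are assumptions made only for this example.

```python
def argmax_per_frame(logits):
    """Return the index of the highest-scoring class for every frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids, blank_id):
    """Collapse consecutive repeats, then drop blanks (standard CTC greedy rule)."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit only when the label changes and the new label is not blank.
        if idx != prev and idx != blank_id:
            tokens.append(idx)
        prev = idx
    return tokens


# Toy vocabulary (hypothetical): 0 -> "a", 1 -> "b", 2 -> blank.
vocab = ["a", "b"]
blank = 2

# Per-frame scores for 7 frames; the argmax path is a a _ a b b _.
logits = [
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
]

path = argmax_per_frame(logits)
decoded = ctc_greedy_decode(path, blank)
text = "".join(vocab[i] for i in decoded)
```

The blank between the two runs of "a" is what keeps them from merging, so the sketch yields "aab" rather than "ab".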
- Implement (Embedding + LSTM) matching PyTorch layout
- Implement (Linear layers + ReLU)
- Abstract and to automatically use RNNT or CTC based on config
- Update conversion script to map PyTorch's separate gate weights/biases to MLX's grouped format
- Update GigaAMConfig to parse RNNT settings
- Update README with RNNT benchmarks (48x realtime on M4 vs 139x for CTC)
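One plausible reading of the weight/bias mapping mentioned above is that PyTorch's two LSTM bias vectors are folded into a single bias. The sketch below checks that idea numerically with NumPy. The PyTorch-style keys (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`) follow `torch.nn.LSTM` naming. The target keys (`Wx`, `Wh`, `bias`) and the shared i, f, g, o gate order are assumptions about the MLX side, not taken from this PR's conversion script.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gates_to_state(gates, c):
    """Apply the LSTM cell update, assuming gate order i, f, g, o."""
    i, f, g, o = np.split(gates, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


def lstm_step_separate(x, h, c, p):
    """One step with PyTorch-style parameters (two bias vectors)."""
    gates = (p["weight_ih_l0"] @ x + p["bias_ih_l0"]
             + p["weight_hh_l0"] @ h + p["bias_hh_l0"])
    return gates_to_state(gates, c)


def convert_lstm_params(p):
    """Hypothetical converter: keep both matrices, sum the two biases."""
    return {
        "Wx": p["weight_ih_l0"],
        "Wh": p["weight_hh_l0"],
        "bias": p["bias_ih_l0"] + p["bias_hh_l0"],
    }


def lstm_step_merged(x, h, c, p):
    """One step with a single merged bias vector."""
    gates = p["Wx"] @ x + p["Wh"] @ h + p["bias"]
    return gates_to_state(gates, c)


rng = np.random.default_rng(0)
in_dim, hidden = 6, 4
torch_style = {
    "weight_ih_l0": rng.standard_normal((4 * hidden, in_dim)),
    "weight_hh_l0": rng.standard_normal((4 * hidden, hidden)),
    "bias_ih_l0": rng.standard_normal(4 * hidden),
    "bias_hh_l0": rng.standard_normal(4 * hidden),
}
x = rng.standard_normal(in_dim)
h0 = rng.standard_normal(hidden)
c0 = rng.standard_normal(hidden)

h_ref, c_ref = lstm_step_separate(x, h0, c0, torch_style)
h_new, c_new = lstm_step_merged(x, h0, c0, convert_lstm_params(torch_style))
states_match = np.allclose(h_ref, h_new) and np.allclose(c_ref, c_new)
```

Because the two biases are only ever added together before the nonlinearities, summing them at conversion time leaves the cell output unchanged.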
…rking

Add complete MLX conversion pipeline and inference tools for GigaAM v3 RNNT model targeting Apple Silicon:
- Add pre-converted GigaAM v3 RNNT MLX model artifacts (config, weights in safetensors format) with comprehensive documentation of architecture and performance metrics (48× realtime on M4)
- Add weight inspection utility to analyze PyTorch checkpoint structure and enable accurate parameter mapping during conversion
- Add comprehensive test suite covering MLX model inference, fp32 variant testing, and PyTorch baseline comparison
- Add comparative benchmarking tool (compare_all.py) for side-by-side evaluation of Whisper CPP, GigaAM PyTorch (CPU), and GigaAM MLX implementations
- Add RNNT architecture patch script to support joint network and LSTM decoder components
- Add dependency lock file (uv.lock) for reproducible environment management

This enables efficient speech recognition on Apple Silicon with ~9% lower WER compared to CTC variant through autoregressive joint language modeling.
6128bf3 to 63e4ad0
Author: Superseded by #62, a clean rebased version with only MLX-related changes.
Summary
Native MLX inference for GigaAM v3 CTC on Apple Silicon — 139x realtime on M4.
What's included
- mlx_convert/gigaam_mlx.py
- mlx_convert/convert_gigaam_to_mlx.py
- mlx_convert/gigaam-cli
- mlx_convert/gigaam-stream
- mlx_convert/gigaam-transcribe
- mlx_convert/README.md

Architecture
Key details:
- [out, in, K] → [out, K, in] for MLX

Performance (Apple M4)
Python API
mlx-audio compatibility
StreamingResult follows the same contract as mlx-audio Parakeet/Whisper streaming. The README includes an integration guide for adding GigaAM as an mlx-audio STT model.

Testing
Tested on Apple M4 with various Russian speech samples. Output matches PyTorch reference (character-level exact match on short utterances, minor CTC boundary differences on longer audio).
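One way to sanity-check the layout change listed under Key details ([out, in, K] → [out, K, in]) independently of either framework is to run a naive convolution in both layouts and compare the results. The sketch below uses NumPy only, and the helper names are hypothetical. It assumes the PyTorch convention of channels-first input with [out, in, K] weights and the MLX convention of channels-last input with [out, K, in] weights.

```python
import numpy as np


def conv1d_channels_first(x, w):
    """Naive valid convolution. x: [in, T], w: [out, in, K] -> [out, T-K+1]."""
    out_ch, _, k = w.shape
    steps = x.shape[1] - k + 1
    y = np.zeros((out_ch, steps), dtype=x.dtype)
    for t in range(steps):
        # Multiply each [in, K] filter with the current [in, K] window.
        y[:, t] = np.einsum("oik,ik->o", w, x[:, t:t + k])
    return y


def conv1d_channels_last(x, w):
    """Naive valid convolution. x: [T, in], w: [out, K, in] -> [T-K+1, out]."""
    out_ch, k, _ = w.shape
    steps = x.shape[0] - k + 1
    y = np.zeros((steps, out_ch), dtype=x.dtype)
    for t in range(steps):
        # Multiply each [K, in] filter with the current [K, in] window.
        y[t] = np.einsum("oki,ki->o", w, x[t:t + k])
    return y


def to_channels_last_weight(w):
    """Swap the last two axes: [out, in, K] -> [out, K, in]."""
    return np.transpose(w, (0, 2, 1))


rng = np.random.default_rng(0)
w_first = rng.standard_normal((4, 3, 5))   # [out, in, K]
x_first = rng.standard_normal((3, 20))     # [in, T]

w_last = to_channels_last_weight(w_first)  # [out, K, in]
y_ref = conv1d_channels_first(x_first, w_first)
y_new = conv1d_channels_last(x_first.T, w_last)

layouts_agree = np.allclose(y_ref, y_new.T)
```

If a converted checkpoint skips this transpose, the shapes may still load when in equals K, which is why a numerical comparison is a more reliable check than a shape check alone.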