Add Moonshine to KerasHub #2093
Conversation
Thank you for the PR! I left some initial comments.
I would suggest following the format, structure, and naming conventions of the Whisper model here - https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/whisper
- add docstrings
- convert the backbone to a functional model (see the sketch below for the general pattern)
- add a moonshine_audio_converter.py
- add a numerics verification Colab to verify the implementation
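For readers unfamiliar with the functional-backbone pattern suggested above, here is a minimal toy sketch of what "functional model" means, in the spirit of other KerasHub backbones. The layer choices, names, and dimensions are illustrative assumptions, not Moonshine's actual architecture.

```python
import keras

# Toy functional encoder-decoder backbone: symbolic Inputs wired through
# layers into a keras.Model, rather than a subclassed model with an
# opaque call(). Dimensions are illustrative only.
encoder_features = keras.Input(shape=(None, 64), name="encoder_features")
decoder_token_ids = keras.Input(shape=(None,), dtype="int32", name="decoder_token_ids")

encoded = keras.layers.Dense(64, activation="gelu")(encoder_features)
embedded = keras.layers.Embedding(100, 64)(decoder_token_ids)
# Cross-attention from decoder positions to encoder features.
decoded = keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)(embedded, encoded)

backbone = keras.Model(
    inputs={
        "encoder_features": encoder_features,
        "decoder_token_ids": decoder_token_ids,
    },
    outputs=decoded,
)
```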
Will make the changes at the earliest, thanks for the review!
You will need to run shell/api_gen.sh and also shell/format.sh at the root to resolve the code formatting error.
Thanks for the review, made the changes! The issue regarding the build still persists.
Summary of Changes:
TODO:
Status of the PR: Outputs of the MD5 Checksum Comparison
(Resolved review thread on keras_hub/src/models/moonshine/moonshine_audio_converter_test.py)
… the PyTorch backend, integrated into the KerasHub infra!
Updated the Colab notebook with results from the latest commit. The PR is now open for review.
I don't see the demo notebook with the KerasHub model implemented here; I am seeing a demo from the Hugging Face model in the Colab.
@divyashreepathihalli The outputs you see across the first three cells are the KH model outputs for four test samples for each preset. The cell links are:
You may also review the checkpoint conversion file to verify the same. The HF model is only used in the last cell, where I point out a bug in the HF implementation and show how, for the same sample, the KH model presets give good transcripts across all three backends. (The sample used in this test is the "Female Clear Voice (Maximum Length - 64 Sec)" one.)
@mattdangerw / @abheesht17 / @divyashreepathihalli Whenever you have a chance, could you please take a look at this PR and the notebook, thanks!
The rope_scaling parameter was largely a direct port from HF, where it took a dict and pulled the type key from it. The Moonshine presets never explicitly use the dynamic mode, and it isn't crucial to the model. If it becomes necessary in the future, sure, but for a first port I think it's best to keep it out. It's better to inherit from the KH RotaryEmbedding class and leave the scaling_factor arg up to it instead (see the sketch below); that works perfectly fine as a replacement and is much better integrated into the existing infra.
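A minimal sketch of the approach described above, assuming KerasHub's RotaryEmbedding layer: its plain `scaling_factor` argument stands in for the HF-style rope_scaling dict. Shapes below are illustrative.

```python
import keras
from keras_hub.layers import RotaryEmbedding

# The scaling factor is a single float, not a dict with a "type" key.
rope = RotaryEmbedding(max_wavelength=10000, scaling_factor=1.0)

# Rotate query/key tensors of shape (batch, seq_len, num_heads, head_dim).
queries = keras.random.normal((1, 16, 8, 64))
rotated = rope(queries, start_index=0)
```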
… with pre-commit hooks
Dropping a few comments. I think we still need to get the generation here working similarly to other models, and make the preprocessing actual preprocessing (no weights!). I still think a clearer high-level colab with intended usage might help clarify things.
How much of this is working today? Have we tried running fine-tuning? That will run preprocessing via a …

```python
!pip install git+https://github.com/harshaljanjani/keras-hub@moonshine

import os

os.environ["KERAS_BACKEND"] = "jax"  # Or "tensorflow" or "torch" with zero other changes.

import keras
import keras_hub

audio_to_text = keras_hub.models.AudioToText.from_preset(
    "hf://harshaljanjani/keras-moonshine",
)

# Generate on a single tensor or a batch.
audio_to_text.generate(audio_tensor)
audio_to_text.generate(audio_batch)

# Swap the sampler and generate again.
audio_to_text.compile(sampler="top_k")
audio_to_text.generate(audio_tensor)

# Fine-tune, optionally with LoRA.
audio_to_text.compile(...)
audio_to_text.enable_lora(4)  # Optional.
audio_to_text.fit(audio_dataset)
audio_to_text.generate(audio_batch)
```
(Resolved review thread on keras_hub/src/tests/test_data/audio_transcription_tests/female_long_voice_clip_64sec.wav)
(Resolved review thread on keras_hub/src/models/moonshine/moonshine_multi_head_attention.py)
Will check the comments out, thanks for the review @mattdangerw. I left a few replies; I'd love to hear your opinion on a few non-trivial things mentioned there, and I'll proceed to make changes on the others.
I haven't tested fine-tuning yet, but I'll see what I can do. Since you mentioned that the change in the generate() strategy was key, I focused on it for this round.
- MoonshineAudioConverter now has no trainable weights; all feature extraction is moved to the MoonshineBackbone.
- Removed the logits() function and used self.token_embedding(reverse=True) instead (see the sketch below).
- Resolved test_causal_lm_basics() for all backends, thus resolving tf.data.Dataset.map compatibility issues on the JAX and Torch backends.
- Removed the 64-second test file.
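For context, a minimal sketch of the reverse-embedding pattern referenced above, assuming KerasHub's ReversibleEmbedding layer; the vocabulary and embedding sizes here are illustrative, not Moonshine's actual configuration.

```python
import keras
from keras_hub.layers import ReversibleEmbedding

# One layer ties the input embedding and the output projection.
token_embedding = ReversibleEmbedding(vocabulary_size=1000, embedding_dim=64)

token_ids = keras.ops.ones((1, 10), dtype="int32")
hidden = token_embedding(token_ids)             # (1, 10, 64): ids -> vectors
logits = token_embedding(hidden, reverse=True)  # (1, 10, 1000): vectors -> vocab logits
```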
Addressed reviews (JIT compile + dynamic shapes issue). Looking forward to guidance regarding the same; I'll try to see if I can solve it in the meantime.
Fixed JIT compile issues on TensorFlow and JAX without unnecessary shenanigans. Reverted to the KerasNLP style of caching without stateful cache modes.
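As an aside, here is a toy illustration of the stateless, KerasNLP-style cache pattern mentioned in that commit (a simplified sketch, not the PR's actual decoder code): the key/value cache is threaded through each call and updated functionally rather than stored as layer state, which keeps generation jit-compilable.

```python
from keras import ops

def update_cache(cache, new_kv, index):
    # Write this step's key/value slice into a preallocated cache tensor
    # and return the updated tensor; no layer state is mutated.
    # cache: (batch, max_len, dim); new_kv: (batch, 1, dim).
    return ops.slice_update(cache, (0, index, 0), new_kv)

cache = ops.zeros((1, 8, 4))
step_kv = ops.ones((1, 1, 4))
cache = update_cache(cache, step_kv, 0)  # decode step 0
cache = update_cache(cache, step_kv, 1)  # decode step 1
```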
The PR should be ready for the next round of reviews @mattdangerw. Here's the new Colab you mentioned. I've tested the functionality with dummy inputs for now; hope you don't mind! I'll check the weights upload and the presets once the design is approved.
Please add demo Colabs, verifications, etc. to the PR description so that they are easier to find.
Apologies, the end-to-end demo notebook has been linked in the PR description from the beginning. I've just linked the functionality tests I added today in the PR description!
Moonshine ASR Model Implementation in Keras
This PR introduces the Moonshine Automatic Speech Recognition (ASR) model into the Keras ecosystem. The Moonshine model, originally developed by UsefulSensors and available via Hugging Face, is a transformer-based architecture designed to transcribe audio inputs into text. This implementation ports the model into Keras, complete with support for pre-trained weights from Hugging Face.
Overview
The Moonshine ASR model employs an encoder-decoder architecture. The encoder processes audio features, while the decoder generates text transcriptions. This implementation includes custom layers and components to mirror the original model's behavior, validated against the Hugging Face version for accuracy.
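For orientation, here is a compact usage sketch based on the intended API shown earlier in this thread; the preset handle is the contributor's test checkpoint rather than an official release, and the silent waveform is a stand-in input.

```python
import numpy as np
import keras_hub

# One second of 16 kHz audio as a placeholder waveform (batch of 1).
audio_tensor = np.zeros((1, 16000), dtype="float32")

audio_to_text = keras_hub.models.AudioToText.from_preset(
    "hf://harshaljanjani/keras-moonshine"
)
transcripts = audio_to_text.generate(audio_tensor)
```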
Files Added
The following files have been added to implement the Moonshine ASR model:
moonshine_backbone.py
defines theMoonshineBackbone
class, the core of the model. It integrates the encoder and decoder blocks, embeddings, and layer normalization, forming the complete encoder-decoder pipeline.moonshine_decoder.py
contains theMoonshineDecoderBlock
class, a decoder block with self-attention (causal), cross-attention, and feedforward layers. It supports caching for efficient generation and uses SwiGLU activation by default.moonshine_encoder.py
implements theMoonshineEncoderBlock
class, the encoder component with self-attention and feedforward layers. It optionally uses SwiGLU activation, matching the original model's configuration.moonshine_multi_head_attention.py
provides a custom multi-head attention layer, theMoonshineMultiHeadAttention
class.moonshine_layers.py
includes utility layers, which are:MoonshineRotaryEmbedding
: Rotary positional embeddings with dynamic scaling support.MoonshineMLP
: Can be configured to use SwiGLU activation for feedforward networks or as a linear layer with GeLU activation.moonshine_audio_converter.py
implements theMoonshineAudioConverter
class, a specialized audio preprocessing layer that converts raw audio waveforms into feature representations suitable for the Moonshine ASR model. It includes downsampling and feature extraction, normalization, and handling of attention masks.moonshine_tokenizer.py
provides theMoonshineTokenizer
class, which extends theLlamaTokenizer
to handle text tokenization for the Moonshine model. It incorporates Moonshine-specific special tokens, including position embedding tokens, hex tokens, and empty tokens, and manages the conversion between raw text and token IDs.moonshine_audio_to_text.py
implements theMoonshineAudioToText
class, a task model that extends theSeq2SeqLM
base class. This class integrates the audio converter, backbone, and tokenizer components to create a complete end-to-end ASR pipeline. It includes methods for text generation from audio inputs, with support for customizable generation parameters and built-in trimming of output sequences.moonshine_seq_2_seq_lm_preprocessor.py
implements theMoonshineSeq2SeqLMPreprocessor
class, which extends theSeq2SeqLMPreprocessor
base class. It handles the conversion of raw audio inputs and text into a format suitable forMoonshineAudioToText
. The preprocessor supports both training mode (with paired audio-text inputs) and generation mode (with audio inputs only), including methods for preprocessing and postprocessing during text generation.Weights Conversion Script
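A hypothetical usage sketch for the tokenizer: the class path and preset handle follow KerasHub conventions and the contributor's test checkpoint, but are assumptions rather than confirmed API.

```python
import keras_hub

tokenizer = keras_hub.models.MoonshineTokenizer.from_preset(
    "hf://harshaljanjani/keras-moonshine"
)
token_ids = tokenizer("the quick brown fox")  # raw text -> token IDs
text = tokenizer.detokenize(token_ids)        # token IDs -> raw text
```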
Weights Conversion Script
… `MoonshineBackbone` model.
Dependencies
Notes for Reviewers
Custom layers implement `get_config()` and are registered with `@keras.saving.register_keras_serializable`, ensuring compatibility with Keras model saving/loading; a minimal sketch of the pattern is below.
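The sketch uses the standard Keras 3 serialization API; the toy layer itself is hypothetical.

```python
import keras

@keras.saving.register_keras_serializable(package="moonshine_demo")
class ToyScale(keras.layers.Layer):
    """A toy layer showing the get_config() + registration contract."""

    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Returning all constructor arguments makes the layer reloadable.
        config = super().get_config()
        config.update({"factor": self.factor})
        return config
```

Closes issue #2083.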