LIRA: Local Inference tool for Realtime Audio


Local, efficient speech recognition.
Run ASR models on your machine—fast, simple, and developer-friendly.

LIRA is a CLI-first tool: run and serve ASR models locally with lira run and lira serve, and integrate them with your apps and tools.


🧩 Supported Model Architectures & Runtimes

LIRA supports multiple speech-model architectures. Runtime support depends on the exported model and chosen runtime.

Model                  | Typical use case                      | Runs on       | Supported datatypes
-----------------------|---------------------------------------|---------------|--------------------
whisper-base           | Low-latency, resource-constrained     | CPU, GPU, NPU | FP32, BFP16
whisper-small          | Balanced accuracy and performance     | CPU, GPU, NPU | FP32, BFP16
whisper-medium         | Higher accuracy for challenging audio | CPU, GPU, NPU | FP32, BFP16
whisper-large-v3-turbo | Highest accuracy (more compute)       | CPU, GPU, NPU | FP32, BFP16
Zipformer              | Streaming / low-latency ASR encoder   | CPU, GPU, NPU | FP32, BFP16

*NPU support depends on available Vitis AI export artifacts and target hardware.


🚀 Getting Started

Prerequisites:

  • Python 3.10 is required.
  • We recommend using conda for environment management.
  • To use the Ryzen AI NPU, you need a Ryzen AI 300 series laptop.
  • For the Ryzen AI NPU flow, follow the Ryzen AI installation instructions and verify the drivers/runtime for your device.
  • Ryzen AI 1.6.0 or above is required.

Minimal install steps:

  1. Activate your conda environment:
    If you followed the Ryzen AI setup instructions and installed the latest Ryzen AI release, the corresponding environment will be listed by conda env list.

    conda activate <latest-ryzen-ai-environment>

    For example, with Ryzen AI 1.6.0:

    conda activate ryzen-ai-1.6.0
  2. Clone the repo and change directory:

    git clone https://github.com/aigdat/LIRA.git
    cd LIRA
  3. Install LIRA in editable mode:

    pip install -e .

Now you can run lira --help to see available commands.


⚡ CLI-first Design

LIRA is a CLI-first toolkit focused on simple developer workflows for exporting, running, and serving speech models.

Primary commands:

  • lira run
    Run, export, or benchmark models directly from the command line.
    Use for local inference, ONNX export, or rapid prototyping.

  • lira serve
    Launch a FastAPI server with OpenAI-compatible endpoints.
    Expose models as HTTP APIs for real-time transcription and seamless integration.
    Add speech recognition to your apps, automate workflows, or build custom endpoints using standard REST calls.

Quick examples:

# Run a model locally (inference)
lira run whisper --model-type whisper-base --export --device cpu --audio audio_files/test.wav

# Serve the model for local apps (OpenAI-compatible endpoints)
lira serve --backend openai --model whisper-base --device cpu --host 0.0.0.0 --port 5000

🖥️ LIRA Server

LIRA includes a FastAPI-based HTTP server for rapid integration with your applications. The server offers OpenAI API compatibility for real-time speech recognition.

Start the server:

  • CPU acceleration:

    lira serve --backend openai --model whisper-base --device cpu --host 0.0.0.0 --port 5000
  • NPU acceleration:

    lira serve --backend openai --model whisper-base --device npu --host 0.0.0.0 --port 5000
Test & Debug Your LIRA Server with a Sample Audio File

Open a new command prompt and run the following curl command to verify your server is working. Replace audio_files\test.wav with your actual audio file path if needed:

curl -X POST "http://localhost:5000/v1/audio/transcriptions" ^
        -H "accept: application/json" ^
        -H "Content-Type: multipart/form-data" ^
        -F "file=@audio_files\test.wav" ^
        -F "model=whisper-onnx"

If everything is set up correctly, you should receive a JSON response containing the transcribed text.
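
The example above uses Windows command-prompt syntax (^ line continuations and backslash paths). The same request from a bash shell on Linux or macOS, assuming the server is listening on localhost:5000 and the same whisper-onnx model name, looks like this:

# Same transcription request using bash line continuations and forward-slash paths
curl -X POST "http://localhost:5000/v1/audio/transcriptions" \
     -H "accept: application/json" \
     -H "Content-Type: multipart/form-data" \
     -F "file=@audio_files/test.wav" \
     -F "model=whisper-onnx"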

Notes:

  • Models are configured in config/model_config.json.
  • For protected backends, set API keys as environment variables.
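
For example, you might set a key before starting the server; the variable name below is a placeholder, not one defined by LIRA, so substitute whichever key your backend expects:

# Hypothetical variable name; use the key your protected backend requires
set MY_BACKEND_API_KEY=<your key>         # Windows command prompt
export MY_BACKEND_API_KEY=<your key>      # bash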

🌐 LIRA Server Demo: Try Open WebUI with LIRA

Interested in more? Try the LIRA server demo with Open WebUI for interactive transcription, monitoring, and management.

👉 Get Started: Step-by-step Setup Guide


🗣️ Running Models with lira run

To run a model using the CLI:

lira run <model> [options]

Replace <model> with the model name or path.

Core flags:

  • --device — target device (cpu, gpu, npu)
  • -m / --model — path to local exported model directory
  • --audio — path to input audio file (wav)
  • --profile — enable timing/profiling output

Tip: run lira run <model> --help for model-specific flags.
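
For example, the core flags can be combined in a single invocation; a sketch, where the model path is a placeholder for a directory you have already exported:

# Run a locally exported Whisper model on the CPU with profiling output enabled
lira run whisper -m <path to exported model directory> --device cpu --audio audio_files/test.wav --profile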


Running Whisper Locally using lira run whisper

This section shows how to export, optimize, compile, and run Whisper models locally.

Check out lira run whisper --help for more details on running Whisper and Whisper-specific CLI flags.

Key Whisper flags:

  • --model-type — Hugging Face model ID

  • --export — export/prepare Whisper model to ONNX

  • --export-dir — output path for export

  • --force — overwrite existing export

  • --use-kv-cache — enable KV-cache decoding

  • --static — request static shapes during export

  • --opset — ONNX opset version

Examples:

  • Run whisper-base on CPU (auto-download, export, and run):

    # Export Whisper base model to ONNX, optimize, and run on CPU with KV caching enabled
    lira run whisper --model-type whisper-base --export --device cpu --audio audio_files/test.wav --use-kv-cache
  • Run Whisper on NPU (encoder and decoder on NPU):

    # Export Whisper base model to ONNX, optimize, and run on NPU
    lira run whisper --model-type whisper-base --export --device npu --audio <input.wav file>

    Note:
    KV caching is not supported on NPU. If you use --use-kv-cache with --device npu, only the encoder runs on NPU; the decoder runs on CPU.

  • Run a locally exported Whisper ONNX model:

    lira run whisper -m <path to local Whisper ONNX> --device cpu --audio "audio_files/test.wav"
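  • Combine export flags (a sketch; the output directory and opset value below are illustrative, not defaults):

    # Export whisper-small with static shapes and a chosen ONNX opset, overwriting any previous export
    lira run whisper --model-type whisper-small --export --export-dir exports/whisper-small --static --opset 17 --force --device cpu --audio audio_files/test.wav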

🔄 Running Zipformer using lira run zipformer

Zipformer enables streaming, low-latency transcription.

Run lira run zipformer -h for more details on running Zipformer.

Example: using the Zipformer English model published on AMD's Hugging Face

lira run zipformer -m aigdat/AMD-zipformer-en  --device cpu --audio "audio_files/test.wav"
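
The --device flag selects the target; for example, to try the same model on the NPU (assuming the Ryzen AI setup from the prerequisites and available Vitis AI artifacts):

# Run the Zipformer English model on the NPU instead of the CPU
lira run zipformer -m aigdat/AMD-zipformer-en --device npu --audio "audio_files/test.wav"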

⚙️ Configuration

Model and runtime configs live in config/:

  • config/model_config.json — model routing and defaults
  • vitisai_config_*.json — Vitis AI configs for NPU exports

You can point to custom config files or modify those in the repo.


🧪 Early Access & Open Source Intentions

LIRA is released as an open, community-driven project.

  • Some features and model exports are early access and may be experimental or optimized for internal hardware.
  • APIs may evolve; breaking changes can happen while the project stabilizes.
  • Community feedback, issues, and PRs are encouraged!

If you rely on specific exported models or NPU backends for production, treat early-access pieces with caution and verify performance.


🤝 Contributing

We welcome contributors!

  • Open issues for bugs, UX problems, or feature requests
  • Submit PRs with tests and clear descriptions
  • Add new model integrations or tooling chains
  • Improve documentation and examples

Guidelines:

  • Follow code style and add unit tests where practical
  • Make small, focused PRs with clear motivation
  • Sign the Contributor License Agreement (CLA) if requested

See CONTRIBUTING.md for more details.


🗺️ Roadmap

Planned improvements:

  • Stabilize CLI flags and server API
  • Add more model backends and runtime targets
  • Improve documentation and quickstart guides
  • CI for model export verification and unit tests
  • UX improvements: error messages, progress reporting, model comparison tools

Contributions aligned with these goals are highly welcome.


💡 Getting Help

  • Open an issue on GitHub with reproduction steps and logs
  • For design/roadmap, open a feature request or contact maintainers via GitHub discussions

Thank you for trying LIRA — we look forward to your contributions and feedback!


📄 License

This project is licensed under the terms of the MIT license.
See the LICENSE file for details.

Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.
