Python streaming (microphone) recognition example #17

Open
anatol-grabowski opened this issue Jul 1, 2023 · 4 comments

@anatol-grabowski

Hey, nice job on both this project and Live Captions!

It's not clear, though, how to go about streaming (microphone) recognition in Python, or whether it's even possible. An example would be helpful.

@anatol-grabowski (Author)

Here is what I've tried:

from typing import List
import sys
import april_asr as april
import queue
import sounddevice as sd

def asr_cb(result_type: april.Result, tokens: List[april.Token]):
    """Simple handler that concatenates all tokens and prints it"""
    prefix = "."
    if result_type == april.Result.FINAL_RECOGNITION:
        prefix = "@"
    elif result_type == april.Result.PARTIAL_RECOGNITION:
        prefix = "-"

    string = ""
    for token in tokens:
        string += token.token

    print(f"{prefix}{string}")

model = april.Model(sys.argv[1])
print("Name: " + model.get_name())
print("Description: " + model.get_description())
print("Language: " + model.get_language())
session = april.Session(model, asr_cb, asynchronous=True)

def audio_cb(indata, frames, time, status):
    session.feed_pcm16(bytes(indata))

def run(device: int) -> None:
    with sd.RawInputStream(samplerate=16000, blocksize = 8000, device=device, dtype='int16', channels=1, callback=audio_cb):
        while True:
            pass

    session.flush()

def main():
    args = sys.argv
    if len(args) != 3:
        print("Usage: " + args[0] + " /path/to/model.april 5 # 5 - sound device number")
    else:
        run(int(args[2]))

if __name__ == "__main__":
    main()

But my understanding of what I'm doing is quite basic. Unsurprisingly, no luck so far:

> pipenv run python main.py /home/anatoly/Downloads/april-english-dev-01110_en.april 5
Name: April English Dev-01110
Description: Punctuation + Numbers 23a3
Language: en
libapril: (/home/runner/work/april-asr/april-asr/src/proc_thread.c:54) [WARNING] Failed to initialize cnd_t
libapril: (/home/runner/work/april-asr/april-asr/src/proc_thread.c:76) [ERROR] Failed to lock mutex in pt_raise!
libapril: (/home/runner/work/april-asr/april-asr/src/proc_thread.c:82) [ERROR] Failed to unlock mutex in pt_raise!

@abb128 (Owner) commented Jul 1, 2023

Can you try the latest python build from https://github.com/abb128/april-asr/actions/runs/5105158229 and let me know if it still happens?

@anatol-grabowski (Author)

I have found out that the asynchronous flag was not necessary for my purposes.
Checked: it works with the asynchronous=True flag using "april_asr-0.0.3-py3-none-manylinux_2_31_x86_64.whl" from the link above.
Uninstalled april-asr and reinstalled from PyPI - it doesn't work with the asynchronous=True flag (errors above).

So the issue doesn't happen with the latest build from the link above.
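
For reference, here is a minimal sketch of the two session setups I compared (same asr_cb style as above; the model path is a placeholder and the comments reflect my understanding rather than the library docs):

from typing import List
import april_asr as april

def asr_cb(result_type: april.Result, tokens: List[april.Token]):
    print("".join(t.token for t in tokens))

model = april.Model('/path/to/model.april')  # placeholder path

# Synchronous session: as far as I understand, recognition runs inside
# feed_pcm16() on the calling thread. This is the variant I ended up using,
# and it works with the PyPI release.
session_sync = april.Session(model, asr_cb)

# Asynchronous session: recognition runs on a background thread. With the
# PyPI wheel this hit the cnd_t/mutex errors above; with the CI wheel it works.
session_async = april.Session(model, asr_cb, asynchronous=True)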

@anatol-grabowski (Author)

Here is the updated example code that works with the microphone:

from typing import List
import sys
import april_asr as april
import sounddevice as sd
import numpy as np
import time

def asr_cb(result_type: april.Result, tokens: List[april.Token]):
    prefix = "."
    if result_type == april.Result.FINAL_RECOGNITION:
        prefix = "@"
    elif result_type == april.Result.PARTIAL_RECOGNITION:
        prefix = "-"

    string = ""
    for token in tokens:
        string += token.token

    print(f"{prefix}{string}")

model = april.Model('/home/anatoly/Downloads/april-english-dev-01110_en.april')
print("Name: " + model.get_name())
print("Description: " + model.get_description())
print("Language: " + model.get_language())
session = april.Session(model, asr_cb)


duration = 10  # seconds
samplerate = 16000 # samples/second
channels = 1
shape = (int(samplerate * duration), channels)
dtype = np.int16

chunks = []  # keep the raw audio so it can be played back at the end

def audio_callback(indata, frames, times, status):
    chunks.append(indata.copy())
    session.feed_pcm16(bytes(indata))

stream = sd.InputStream(samplerate=samplerate, channels=channels, dtype=dtype, callback=audio_callback)
stream.start()
sd.sleep(duration * 1000)
stream.stop()
stream.close()
session.flush()

# play the recorded audio to make sure that it is being recorded correctly
buff = np.concatenate(chunks)
sd.play(buff, samplerate=samplerate)
sd.wait()
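
The example above stops after a fixed 10-second window. For open-ended streaming from the microphone, something along the lines of my first attempt should work once the argument handling is fixed; this is just a sketch (model path and device index are passed on the command line, and I stop it with Ctrl+C):

from typing import List
import sys
import april_asr as april
import sounddevice as sd

def asr_cb(result_type: april.Result, tokens: List[april.Token]):
    prefix = {april.Result.FINAL_RECOGNITION: "@",
              april.Result.PARTIAL_RECOGNITION: "-"}.get(result_type, ".")
    print(prefix + "".join(t.token for t in tokens))

def run(model_path: str, device: int) -> None:
    model = april.Model(model_path)
    session = april.Session(model, asr_cb)

    def audio_cb(indata, frames, time, status):
        # RawInputStream delivers raw int16 bytes, which is what feed_pcm16 expects
        session.feed_pcm16(bytes(indata))

    # Stream from the microphone until interrupted with Ctrl+C
    with sd.RawInputStream(samplerate=16000, blocksize=8000, device=device,
                           dtype='int16', channels=1, callback=audio_cb):
        try:
            while True:
                sd.sleep(1000)
        except KeyboardInterrupt:
            pass
    session.flush()

if __name__ == "__main__":
    run(sys.argv[1], int(sys.argv[2]))  # /path/to/model.april <device index>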
