# Inference Serving Best Practice for Zipformer Transducer ASR

I provide a template Triton config for the Zipformer model and a client API to evaluate the performance of the serving pipeline.
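For orientation, Triton loads models from a model repository in which each model has its own `config.pbtxt` and a numbered version directory holding the exported model file. The layout below is only an illustration; the actual model names and files used by this recipe are determined by the export scripts and the provided template config.

```
model_repo/
├── encoder/                 # illustrative model name
│   ├── config.pbtxt         # Triton model configuration (template provided here)
│   └── 1/                   # model version directory
│       └── encoder.onnx     # exported model file
├── decoder/
│   └── ...
└── transducer/              # e.g. an ensemble tying the pieces together
    └── config.pbtxt
```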

## Prepare Environment

Build the server docker image:

```bash
cd triton
docker build . -f Dockerfile/Dockerfile.server -t sherpa_triton_server:latest --network host
```

Start the docker container:

```bash
docker run --gpus all --rm -v $PWD:/workspace/sherpa --name sherpa_server --net host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it sherpa_triton_server
```

You should now be inside the container. Run the server:

```bash
bash run_server.sh
```
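Before testing, you can check that the server is up. The sketch below assumes `run_server.sh` launches `tritonserver` with the default HTTP port 8000 (the container uses host networking); adjust the port if the script configures it differently.

```bash
# Returns HTTP 200 once the server and all loaded models are ready.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```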

## Test Client

```bash
cd triton/client/Triton-ASR-client
bash run_client.sh
```
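To measure throughput and latency beyond the bundled client, Triton's `perf_analyzer` can be used if it is available in your environment (it ships with the Triton client SDK image). The model name `transducer` and the `input.json` file below are placeholders; check the model repository and the client code for the actual model name and input tensor format.

```bash
# Hypothetical benchmark invocation; input.json must contain real audio
# tensors in perf_analyzer's JSON input-data format.
perf_analyzer -m transducer -u localhost:8001 -i grpc \
    --input-data=input.json --concurrency-range=1:4
```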