# Inference Serving Best Practice for Zipformer Transducer ASR

I provide a template Triton config for the Zipformer model and a client API to evaluate the performance of the serving pipeline.
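For orientation, Triton loads models from a model repository in which each model has its own `config.pbtxt` and a numbered version directory holding the exported model file. The layout below is only an illustration; the actual model names and files used by this recipe are determined by the export scripts and the provided template config.

```
model_repo/
├── encoder/                 # illustrative model name
│   ├── config.pbtxt         # Triton model configuration (template provided here)
│   └── 1/                   # model version directory
│       └── encoder.onnx     # exported model file
├── decoder/
│   └── ...
└── transducer/              # e.g. an ensemble tying the pieces together
    └── config.pbtxt
```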

## Prepare Environment

Build the server docker image:

```bash
cd triton
docker build . -f Dockerfile/Dockerfile.server -t sherpa_triton_server:latest --network host
```

Start the docker container:

```bash
docker run --gpus all --rm -v $PWD:/workspace/sherpa --name sherpa_server --net host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it sherpa_triton_server
```

You should now be inside the container. Run the server:

```bash
bash run_server.sh
```
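Before testing, you can check that the server is up. The sketch below assumes `run_server.sh` launches `tritonserver` with the default HTTP port 8000 (the container uses host networking); adjust the port if the script configures it differently.

```bash
# Returns HTTP 200 once the server and all loaded models are ready.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```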

## Test Client

```bash
cd triton/client/Triton-ASR-client
bash run_client.sh
```
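To measure throughput and latency beyond the bundled client, Triton's `perf_analyzer` can be used if it is available in your environment (it ships with the Triton client SDK image). The model name `transducer` and the `input.json` file below are placeholders; check the model repository and the client code for the actual model name and input tensor format.

```bash
# Hypothetical benchmark invocation; input.json must contain real audio
# tensors in perf_analyzer's JSON input-data format.
perf_analyzer -m transducer -u localhost:8001 -i grpc \
    --input-data=input.json --concurrency-range=1:4
```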