Dynamic Batching Engine for Deep Learning Serving. A tool that implements dynamic batching, flushing each batch when either the maximum batch size or the maximum latency is reached.
This tool is currently a proof of concept. Refer to MOSEC for production usage.
- Dynamic batching with control over batch size and latency
- Prevents invalid requests from affecting others in the same batch
- Communicates with workers through a Unix domain socket or TCP
- Supports load balancing
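The core policy behind the first bullet — flush a batch when either the size cap or the latency budget is hit — can be sketched as follows. This is a minimal Python illustration, not the service's actual Go implementation; `collect_batch` and its parameter names are hypothetical, mirroring the `-batch` and `-latency` flags.

```python
import queue
import time

def collect_batch(q, max_batch=32, max_latency=0.01):
    """Collect up to max_batch jobs from q, waiting at most
    max_latency seconds after the first job arrives.

    Hypothetical sketch of the batching policy, not the tool's code.
    """
    batch = [q.get()]                    # block until at least one job arrives
    deadline = time.monotonic() + max_latency
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:               # latency budget exhausted
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:              # no more jobs within the budget
            break
    return batch
```

A batch is therefore returned as soon as it is full, and a partially filled batch is returned once the latency budget after the first job expires, so a lone request never waits longer than `max_latency` for companions.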
```console
$ go run service/app.go --help
Usage of app:
  -address string
        socket file or host:port (default "batch.socket")
  -batch int
        max batch size (default 32)
  -capacity int
        max jobs in the queue (default 1024)
  -host string
        host address (default "0.0.0.0")
  -latency int
        max latency (millisecond) (default 10)
  -port int
        service port (default 8080)
  -protocol string
        unix or tcp (default "unix")
  -timeout int
        timeout for a job (millisecond) (default 5000)
```
```shell
# start the batching service
go run service/app.go
# start the Python inference worker
python examples/app.py
# send test requests
python examples/client.py
```
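For a quick smoke test without `examples/client.py`, a raw HTTP client along these lines can exercise the service on its default port. The snippet is a hedged sketch: it assumes the service accepts a plain HTTP POST body, and to stay runnable on its own it talks to a local echo stand-in server rather than the real service (`infer` and the `Echo` handler are hypothetical names, not part of this repo).

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib import request

def infer(payload: bytes, url: str) -> bytes:
    """POST a raw payload and return the raw response body."""
    req = request.Request(url, data=payload, method="POST")
    with request.urlopen(req, timeout=5) as resp:
        return resp.read()

class Echo(BaseHTTPRequestHandler):
    # Stand-in for the batching service: echoes the request body back.
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Echo)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}"

result = infer(b"hello", url)
print(result)  # echoed payload from the stand-in server
server.shutdown()
```

Against the real service, the same `infer` call would target `http://localhost:8080` (the `-host`/`-port` defaults above).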