Commit

Merge branch 'dev-swarm' into master
antoinecharbel-inmind committed Sep 25, 2020
2 parents 6f2fecd + 9fbd0c3 commit 20789df
Showing 7 changed files with 197 additions and 1 deletion.
160 changes: 160 additions & 0 deletions README-docker_swarm.md
@@ -0,0 +1,160 @@
# YOLOv3 Darknet GPU Inference API with docker swarm

Please use **docker swarm** only if you need to:

* Provide redundancy in terms of API containers: if a container goes down, incoming requests are redirected to another running instance.

* Coordinate between the containers: Swarm orchestrates the APIs and chooses one of them to handle each incoming request.

* Scale up the inference service to get faster predictions, especially when there is heavy traffic on the service.

## Run The Docker Container

Docker swarm can scale the API up to multiple replicas and can be used on one or several hosts. In both cases, the docker swarm setup below is required on all hosts.

#### Docker swarm setup

1- Enable docker swarm GPU resource:

```sh
sudo nano /etc/nvidia-container-runtime/config.toml
```

Uncomment the line `swarm-resource = "DOCKER_RESOURCE_GPU"` (remove the leading `#`) to enable it, then save and exit.
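
To confirm the edit without reopening the file, you can grep for the setting (a quick sanity check, assuming the default config path shown above):

```sh
# Print the swarm-resource line; it should no longer start with '#'
grep swarm-resource /etc/nvidia-container-runtime/config.toml
# Expected output after uncommenting:
# swarm-resource = "DOCKER_RESOURCE_GPU"
```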

2- The `deploy` key is only supported in compose files of version 3.0 and above, while the `runtime` key is only supported in compose file version 2.3. Since we therefore cannot set `runtime` in our stack file, we set the default runtime in Docker's daemon configuration file instead:

```sh
sudo nano /etc/docker/daemon.json
```

```json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
```

3- Finally restart docker:

```sh
sudo systemctl daemon-reload
sudo systemctl restart docker
```
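
As an optional check (the exact output format may vary with your Docker version), you can verify that the NVIDIA runtime is now the default:

```sh
# Check the registered runtimes and the default runtime
docker info | grep -i runtime
# Expected to include lines similar to:
#  Runtimes: nvidia runc
#  Default Runtime: nvidia
```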

4- Initialize Swarm:

```sh
docker swarm init
```
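
If the host has several network interfaces, `docker swarm init` may ask you to pick one; below is a sketch with an example address (replace `192.168.1.10` with your manager host's IP):

```sh
# Initialize the swarm and advertise the manager on a specific address
docker swarm init --advertise-addr 192.168.1.10

# Confirm the node is now a swarm manager
docker node ls
```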

5- On the manager host, open the gpu-inference.yaml file and specify the number of replicas needed. If you are using multiple hosts (see the "With multiple hosts" section below), the replicas will be distributed across all hosts.

```yaml
version: "3"

services:
  api:
    environment:
      - "NVIDIA_VISIBLE_DEVICES=0"
    ports:
      - "1234:1234"
    image: yolov3_inference_api_gpu
    volumes:
      - "/mnt/models:/models"
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
```

**Notes about gpu-inference.yaml:**

* The part of the `volumes` entry to the left of ":" must be an absolute path. It can be changed by the user and represents the models directory on your operating system.
* The ":/models" part of the `volumes` entry should never be changed.
* `NVIDIA_VISIBLE_DEVICES` defines which GPU the API runs on (see the sketch below).
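
If you are unsure which GPU index to put in `NVIDIA_VISIBLE_DEVICES`, a quick way to list the GPUs on the host (assuming the NVIDIA driver is installed) is:

```sh
# List the GPUs visible on this host, with their indices
nvidia-smi -L
# Example output (your model and UUID will differ):
# GPU 0: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-...)
```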

#### With one host

Deploy the API:

```sh
docker stack deploy -c gpu-inference.yaml yolov3-gpu
```
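
Once the stack is deployed, a couple of standard commands can be used to confirm that the replicas are up (the stack name `yolov3-gpu` matches the deploy command above):

```sh
# Show the services in the stack and how many replicas are running
docker stack services yolov3-gpu

# Follow the logs of the API service
docker service logs -f yolov3-gpu_api
```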

![onehost](./docs/yologpu.png)

#### With multiple hosts

1- **Make sure hosts are reachable on the same network**.

2- Choose a host to be the manager and run the following command on it to generate a token that the other hosts can use to join:

```sh
docker swarm join-token worker
```

A `docker swarm join` command will be printed to your terminal; copy it and run it on each of the other hosts, as shown in the image below.
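
The printed command looks roughly like the sketch below; the token and the manager address are placeholders and must be taken from your own terminal output:

```sh
# Run this on each worker host (values shown here are placeholders)
docker swarm join --token SWMTKN-1-<your-token> <manager-ip>:2377
```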

3- Deploy your application using:

```sh
docker stack deploy -c gpu-inference.yaml yolov3-gpu
```

![multhost](./docs/yologpu2.png)

#### Useful Commands

1- To scale the service up to 4 replicas, for example, use this command:

```sh
docker service scale yolov3-gpu_api=4
```

2- To check the available workers:

```sh
docker node ls
```

3- To check on which node the container is running:

```sh
docker service ps yolov3-gpu_api
```

4- To check the number of replicas:

```sh
docker service ls
```
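
When you are done, the stack and the swarm can be torn down again; a minimal sketch (the stack name matches the one used above):

```sh
# Remove the deployed stack
docker stack rm yolov3-gpu

# On a worker host: leave the swarm
docker swarm leave

# On the manager host: force-leave (this dissolves a single-manager swarm)
docker swarm leave --force
```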

## Benchmarking

Here are two graphs showing the prediction time for different numbers of simultaneous requests.


![GPU 20 req](./docs/GPU20req.png)


![GPU 40 req](./docs/GPU40req.png)


Both graphs show the same trend regardless of how many requests are received at the same time: increasing the number of workers (hosts) speeds up inference. For example, in the last column, 40 simultaneous requests were processed in:

- 2.3 seconds with 5 replicas in 1 machine
- 1.77 seconds with 5 replicas in each of the 2 machines
- 1.57 seconds with 5 replicas in each of the 3 machines

Moreover, if one of the machines goes down, the others remain ready to receive requests.

Finally, since we are predicting on GPU, scaling to more replicas means faster predictions.
20 changes: 19 additions & 1 deletion README.md
@@ -14,6 +14,18 @@ Models trained using our training automation Yolov4 and Yolov3 repository can b

To choose Yolov4 instead of Yolov3 training, just change the inference engine name in the config.json inside your model folder.

This repo can be deployed using either **docker** or **docker swarm**.

Please use **docker swarm** only if you need to:

* Provide redundancy in terms of API containers: if a container goes down, incoming requests are redirected to another running instance.

* Coordinate between the containers: Swarm orchestrates the APIs and chooses one of them to handle each incoming request.

* Scale up the inference service to get faster predictions, especially when there is heavy traffic on the service.

If none of the aforementioned requirements are needed, simply use **docker**.

![predict image](./docs/4.gif)

## Prerequisites
@@ -66,7 +78,13 @@ sudo docker build --build-arg http_proxy='' --build-arg https_proxy='' -t yolov4

## Run The Docker Container

To run the API go to the API's directory and run the following:
As mentioned before, this container can be deployed using either **docker** or **docker swarm**.

If you wish to deploy this API using **docker**, please issue the following run command.

If you wish to deploy this API using **docker swarm**, please refer to the [docker swarm documentation](./README-docker_swarm.md). After deploying the API with docker swarm, please return to this documentation for further information about the API endpoints as well as the model structure sections.

To run the API, go to the API's directory and run the following:

#### Using Linux based docker:

Binary file added docs/GPU20req.png
Binary file added docs/GPU40req.png
Binary file added docs/yologpu.png
Binary file added docs/yologpu2.png
18 changes: 18 additions & 0 deletions gpu-inference.yaml
@@ -0,0 +1,18 @@
version: "3"

services:
  api:
    environment:
      - "NVIDIA_VISIBLE_DEVICES=0"
    ports:
      - "1234:1234"
    image: yolov3_inference_api_gpu
    volumes:
      - "/mnt/models:/models"
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
