This example deploys Iris, ResNet50 and Inception models in one API. Query parameters are used for selecting the model.
Since model caching is enabled, only one model can be loaded into memory at a time - loading a second one will evict the least recently used model.
The example can be run on both CPU and GPU hardware.
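The same requests can also be made from Python. The sketch below is a convenience illustration rather than part of the example; it assumes the API is reachable at http://localhost:8080 (as in the curl commands that follow) and that the sample payload files are in the working directory:

```python
import json

import requests  # third-party HTTP client, used here instead of curl

ENDPOINT = "http://localhost:8080/"

def predict(model_name, payload_file):
    """Send a prediction request, selecting the model via the `model` query parameter."""
    with open(payload_file) as f:
        payload = json.load(f)
    response = requests.post(ENDPOINT, params={"model": model_name}, json=payload)
    response.raise_for_status()
    return response.json()["label"]

print(predict("resnet50", "sample-image.json"))  # expected: sports_car
print(predict("iris", "sample-iris.json"))       # expected: setosa
```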
When making a prediction with sample-image.json, the following image will be used:
Make a request to the ResNet50 model:
curl "http://localhost:8080/?model=resnet50" -X POST -H "Content-Type: application/json" -d @sample-image.json
The expected response is:
{"label": "sports_car"}
Loading the next model will evict the previously loaded model from memory because cache_size is set to 1. No model is removed from disk.
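The eviction behavior can be pictured as two stacked LRU caches: an in-memory tier bounded by cache_size and an on-disk tier bounded by disk_cache_size. The following Python sketch only illustrates that eviction logic - the class and its names are hypothetical, not the API's actual implementation:

```python
from collections import OrderedDict

class TwoLevelModelCache:
    """Illustrative two-level LRU cache: a small in-memory tier backed by a
    larger on-disk tier. Class and attribute names are hypothetical."""

    def __init__(self, cache_size=1, disk_cache_size=2):
        self.cache_size = cache_size            # max models held in memory
        self.disk_cache_size = disk_cache_size  # max models held on disk
        self.memory = OrderedDict()             # model name -> loaded model (LRU order)
        self.disk = OrderedDict()               # model name -> path on disk (LRU order)

    def get(self, name):
        if name in self.memory:
            self.memory.move_to_end(name)        # mark as most recently used
            return self.memory[name]
        if name not in self.disk:
            self.disk[name] = f"/models/{name}"  # pretend the model was downloaded
            if len(self.disk) > self.disk_cache_size:
                self.disk.popitem(last=False)    # evict the LRU model from disk
        self.disk.move_to_end(name)
        self.memory[name] = f"<{name} loaded>"   # pretend the model was loaded
        if len(self.memory) > self.cache_size:
            self.memory.popitem(last=False)      # evict the LRU model from memory
        return self.memory[name]
```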
Make a request to the Inception model:
curl "http://localhost:8080/?model=inception" -X POST -H "Content-Type: application/json" -d @sample-image.json
The expected response is:
{"label": "sports_car"}
At this point, there are 2 models stored on disk (the limit set by disk_cache_size). Loading the iris classifier will lead to the removal of the least recently used model from disk - in this case, it is the ResNet50 model that gets evicted.
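Replaying the request order from this walkthrough (resnet50, then inception, then iris) against the sketch above reproduces the eviction sequence described here, assuming cache_size is 1 and disk_cache_size is 2:

```python
cache = TwoLevelModelCache(cache_size=1, disk_cache_size=2)

cache.get("resnet50")      # memory: [resnet50]    disk: [resnet50]
cache.get("inception")     # memory: [inception]   disk: [resnet50, inception]
cache.get("iris")          # memory: [iris]        disk: [inception, iris]

print(list(cache.memory))  # ['iris']
print(list(cache.disk))    # ['inception', 'iris'] - resnet50 was evicted from disk
```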
Make a request to the Iris model:
curl "${ENDPOINT}?model=iris" -X POST -H "Content-Type: application/json" -d @sample-iris.json
The expected response is:
{"label": "setosa"}