Performance #12

@mdegans

Description

Right now the model used is an fp32 model provided by Nvidia. It's the only trained model I can find for detecting faces, other than MTCNN, which is actually three models and harder to plug into the pipeline. In any case, any trained models available would require a clusterf**k of conversions.

Around the same time I created this app, Nvidia seemed to be thinking the same thing, so they produced this tutorial which is fantastic, but they don't provide a trained model... which means I must train it myself.

The dataset required to train is over 1TB -- more than is available on my GPU box, meaning I would have to use my NAS and NFS or something for the actual storage. All this is doable, but it means tying up my GPUs for what would likely be days. According to Nvidia it takes 8 hours on a DGX-1, but two 1080s are probably quite a bit slower.

If anybody is willing to follow that tutorial, pay for the cloud time, and send me the trained model, it would be greatly appreciated (and you'd get your name in the credits). Otherwise performance enhancements will have to wait until my GPU box has a week free, which might be a while since I'm using nvidia-docker on it all the time.

Alternatively, if somebody knows how to quantize the model I have, that works too. Most of the tools I've found expect an .onnx, but it looks like the code could be modified to use a different parser. Something worth exploring.
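For anyone curious what quantization actually does here: the idea is just mapping fp32 values onto int8 with a per-tensor scale, then running inference in the smaller type. The tooling is the hard part, not the math. A minimal numpy sketch of symmetric post-training quantization (function names are mine, not from any toolkit):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization of fp32 weights."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp32 values from the int8 tensor."""
    return q.astype(np.float32) * scale

# fake "weights" standing in for a real layer
w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# worst-case rounding error is about half a quantization step
err = np.abs(w - w_hat).max()
assert err <= scale * 0.5 + 1e-6
```

Real toolchains (TensorRT etc.) additionally calibrate activation ranges with sample data, which is why they want the model in a format their parser understands in the first place.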

Metadata

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
