This repository contains the code and articles from the Neural Bits Newsletter, showcasing:
- how to optimize and quantize models for optimal performance
- how to serve models efficiently in production environments at scale
ID | 📝 Article | 💻 Code | Details | Complexity | Tech Stack |
---|---|---|---|---|---|
001 | Inference Engines Profiling | Here | Profile a CNN model across PyTorch, ONNX, TensorRT, and TorchCompile | 🟩🟩⬜ | Python, Jupyter |
002 | Deploying DL models with NVIDIA Triton Inference Server | Here | Full tutorial on how to set up and deploy ML models with Triton Inference Server | 🟩🟩🟩 | Python, Docker, Bash |
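Entry 001 compares inference latency across several runtimes; the core measurement loop is the same regardless of backend. Below is a minimal, framework-agnostic sketch of that idea. The `benchmark` helper and its parameters are illustrative assumptions, not code taken from the repository:

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Time a zero-argument callable: run warmup passes, then measure `iters` runs.

    Returns (mean_ms, p95_ms) over the measured iterations.
    """
    for _ in range(warmup):  # warmup runs let caches/JIT compilation settle
        fn()
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    times_ms.sort()
    mean_ms = statistics.mean(times_ms)
    p95_ms = times_ms[int(0.95 * (len(times_ms) - 1))]  # 95th-percentile latency
    return mean_ms, p95_ms


# Stand-in workload; in practice fn would wrap a model's forward pass
# (e.g. a PyTorch eager call vs. its TensorRT or torch.compile counterpart).
mean_ms, p95_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"mean={mean_ms:.3f} ms  p95={p95_ms:.3f} ms")
```

Reporting a percentile alongside the mean matters when comparing engines, since tail latency often differs between runtimes even when averages look similar.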