
ChAI aims to bridge the gap between powerful AI models and real-world production use cases. Our repository offers ready-to-use, optimized, and fully deployable solutions built on open-source technologies.
- Production Readiness: Each example is production-tested and optimized.
- Simplicity & Clarity: Step-by-step deployment guides with clear documentation.
- Performance Optimized: Leverages TensorRT, ONNX, Triton Inference Server, and GPU acceleration.
- Open Source Commitment: All examples rely on fully open-source technologies and tools.
- Speech-to-Text (STT): NVIDIA NeMo, Whisper
- Text-to-Speech (TTS): VITS family models
- Inference & Deployment: NVIDIA Triton Inference Server, ONNX Runtime, TensorRT, Docker, Kubernetes
- Infrastructure as Code: Docker Compose and Terraform examples
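To make the Infrastructure-as-Code idea concrete, here is a minimal Docker Compose sketch that serves a local model repository with Triton Inference Server. The image tag, ports, and repository path are assumptions for illustration, not taken from this repo:

```yaml
services:
  triton:
    image: nvcr.io/nvidia/tritonserver:24.05-py3   # pick a tag matching your driver/CUDA stack
    command: tritonserver --model-repository=/models
    volumes:
      - ./model_repository:/models                 # local model repository (placeholder path)
    ports:
      - "8000:8000"   # HTTP inference
      - "8001:8001"   # gRPC inference
      - "8002:8002"   # Prometheus metrics
```

Each example in the repo ships its own deployment files; treat this only as the general shape.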
- NeMo English ASR with ONNX & TensorRT optimization (initial example)
- Real-time streaming inference examples
- Whisper multilingual models on Triton Inference Server
- VITS TTS high-quality synthesis setup
- Deploy VITS family TTS with ONNX & TensorRT
- Emotion and voice modulation support in TTS models
- Kubernetes deployment examples with auto-scaling
- Cloud-native setups on AWS, GCP, Azure
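For the Kubernetes auto-scaling item above, the usual building block is a HorizontalPodAutoscaler; a minimal sketch, where the Deployment name and thresholds are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton            # placeholder: your Triton Deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

In practice GPU inference workloads often scale on custom metrics (e.g. queue latency exported by Triton) rather than CPU; CPU utilization is used here only to keep the sketch self-contained.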
Clone the repo and follow the detailed guides provided for each example.
git clone https://github.com/2Bye/chai.git
cd chai
Detailed example with step-by-step setup:
- Convert NeMo ASR model to ONNX.
- Optimize with TensorRT.
- Deploy on Triton Inference Server.
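The optimization step above can be scripted. Below is a minimal Python sketch that assembles the corresponding `trtexec` command line (flag names follow TensorRT's `trtexec` tool; all paths are placeholders, and the NeMo-to-ONNX export itself is omitted since it depends on your model class):

```python
def build_trtexec_cmd(onnx_path: str, engine_path: str, fp16: bool = True) -> list[str]:
    """Assemble a trtexec invocation that converts an ONNX model to a TensorRT engine."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",         # input ONNX graph
        f"--saveEngine={engine_path}", # where to write the serialized TensorRT plan
    ]
    if fp16:
        cmd.append("--fp16")           # enable half-precision kernels
    return cmd

# Example: the command you would run after exporting the NeMo model to ONNX
print(" ".join(build_trtexec_cmd("model.onnx", "model.plan")))
# → trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```

Building the command as a list keeps it easy to pass to `subprocess.run` without shell quoting issues.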
See `examples/asr/nemo_onnx_trt` for detailed instructions.
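To serve the converted engine, Triton expects a `config.pbtxt` next to the model in the repository. A minimal sketch for a TensorRT plan follows; the model name, tensor names, and dimensions here are placeholders — take the real ones from your exported graph:

```protobuf
name: "nemo_asr"                # placeholder model name (must match the directory name)
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "audio_signal"        # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ 80, -1 ]            # mel features x variable time steps
  }
]
output [
  {
    name: "logprobs"            # placeholder tensor name
    data_type: TYPE_FP32
    dims: [ -1, 129 ]           # time steps x vocabulary size (placeholder)
  }
]
```

The example directory referenced above contains the working configuration for the shipped model.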
We welcome contributions from everyone. You can:
- Suggest new examples or improvements
- Fix bugs, optimize code, or enhance documentation
- Share benchmarks and deployment experiences
Open an issue or pull request to get involved!
Let's make deploying AI easier, faster, and open to everyone.
ChAI – Your shortcut to production-ready AI solutions.