Blazingly fast ML inference server powered by Rust and Burn framework
A high-performance, lightweight HTTP inference server that serves machine learning models with zero Python dependencies. Built with Rust for maximum performance and supports ONNX models including ResNet-18 for image classification.
- 🦀 Pure Rust: Maximum performance, minimal memory footprint (2.3MB binary)
- 🔥 ONNX Support: Direct ONNX model loading with automatic shape detection
- ⚡ Fast Inference: ~4ms inference times for ResNet-18
- 🛡️ Production Ready: Graceful shutdown, comprehensive error handling
- 🌐 HTTP API: RESTful endpoints with CORS support
- 📦 Single Binary: Zero external dependencies
- 🖼️ Image Classification: Optimized for computer vision models
git clone https://github.com/yourusername/furnace.git
cd furnace
cargo build --release
# Download ResNet-18 ONNX model (45MB)
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet18-v1-7.onnx" -o resnet18.onnx
./target/release/furnace --model-path resnet18.onnx --host 127.0.0.1 --port 3000
# Generate ResNet-18 test samples (creates JSON files locally)
cargo run --example resnet18_sample_data
This creates the following test files:
- `resnet18_single_sample.json` - Single image test data
- `resnet18_batch_sample.json` - Batch of 3 images test data
- `resnet18_full_test.json` - Full-size single image (150,528 values)
# Health check
curl http://localhost:3000/healthz
# Model info
curl http://localhost:3000/model/info
# Single image prediction
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_full_test.json
# Batch prediction
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_batch_sample.json
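The same requests can be made from Python with only the standard library. This is a minimal sketch: it assumes the server is running on `localhost:3000` and that the payload comes from one of the generated sample files (the exact JSON field names are whatever those files contain).

```python
import json
import urllib.request

def make_predict_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send one of the generated sample files:
#   payload = json.load(open("resnet18_single_sample.json"))
#   req = make_predict_request("http://localhost:3000/predict", payload)
#   result = json.loads(urllib.request.urlopen(req).read())
```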
Furnace supports ONNX models with automatic shape detection. Currently optimized for image classification models.
Model | Input Shape | Output Shape | Size | Status |
---|---|---|---|---|
ResNet-18 | `[3, 224, 224]` | `[1000]` | 45MB | ✅ Supported |
MobileNet v2 | `[3, 224, 224]` | `[1000]` | 14MB | 🧪 Testing |
SqueezeNet | `[3, 224, 224]` | `[1000]` | 5MB | 🧪 Testing |
# ResNet-18 (ImageNet classification) - Recommended
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet18-v1-7.onnx" -o resnet18.onnx
# MobileNet v2 (lightweight, mobile-friendly)
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-12.onnx" -o mobilenetv2.onnx
# SqueezeNet (very lightweight)
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/squeezenet/model/squeezenet1.0-12.onnx" -o squeezenet.onnx
To use your own ONNX models:
- Export your model to ONNX format
- Ensure input shape compatibility (currently optimized for image classification)
- Test with Furnace using the same API endpoints
# Example: Export PyTorch model to ONNX
import torch
import torchvision.models as models

# `pretrained=True` is deprecated in newer torchvision; use the weights enum.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# The dummy input fixes the exported graph's input shape: [batch, 3, 224, 224]
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "my_model.onnx")
Metric | Value |
---|---|
Binary Size | 2.3MB |
Model Size | 45MB |
Inference Time | ~4ms |
Memory Usage | <200MB |
Startup Time | <2s |
Input Size | 150,528 values |
Output Size | 1,000 classes |
Prerequisites:
# 1. Download ResNet-18 model (if not already done)
curl -L "https://github.com/onnx/models/raw/main/validated/vision/classification/resnet/model/resnet18-v1-7.onnx" -o resnet18.onnx
# 2. Generate test data (benchmarks use dynamic model detection)
cargo run --example resnet18_sample_data
Run benchmarks:
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench single_inference
cargo bench batch_inference
cargo bench latency_measurement
- Single Inference: ~4ms per image (ResNet-18)
- Batch Processing: Optimized for batches of 1-8 images
- Concurrent Requests: Handles multiple simultaneous requests
- Memory Efficiency: Minimal memory allocation per request
- Throughput: Scales with available CPU cores
Health check endpoint
{
"status": "healthy",
"model_loaded": true,
"uptime_seconds": 3600,
"timestamp": "2024-01-01T12:00:00Z"
}
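A readiness probe only needs to inspect two of these fields. As a small sketch (field names taken from the response shown above):

```python
import json

def is_ready(health_body: str) -> bool:
    """Return True when the server reports healthy with a loaded model."""
    body = json.loads(health_body)
    return body.get("status") == "healthy" and body.get("model_loaded", False)

sample = '{"status": "healthy", "model_loaded": true, "uptime_seconds": 3600}'
```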
Model metadata and statistics
{
"model_info": {
"name": "resnet18",
"input_spec": {"shape": [3, 224, 224], "dtype": "float32"},
"output_spec": {"shape": [1000], "dtype": "float32"},
"model_type": "burn",
"backend": "onnx"
},
"stats": {
"inference_count": 42,
"total_inference_time_ms": 168.0,
"average_inference_time_ms": 4.0
}
}
Run inference on input data
Single Image:
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_full_test.json
Batch Images:
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
--data-binary @resnet18_batch_sample.json
Response:
{
"output": [0.1, 0.05, 0.02, ...], // 1000 ImageNet class probabilities
"status": "success",
"inference_time_ms": 4.0,
"timestamp": "2024-01-01T12:00:00Z",
"batch_size": 1
}
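The `output` array is a flat list of 1,000 scores, so picking the most likely classes is a plain sort over (index, score) pairs. A minimal helper, assuming only the response shape shown above:

```python
def top_k(output, k=5):
    """Return (class_index, score) pairs for the k highest scores."""
    indexed = sorted(enumerate(output), key=lambda p: p[1], reverse=True)
    return indexed[:k]

# For a parsed response `r`: top_k(r["output"]) gives the 5 most likely
# ImageNet class indices; map them to labels with any ImageNet class list.
```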
ResNet-18 expects normalized RGB image data:
- Shape: `[3, 224, 224]` (150,528 values)
- Format: Flattened array of float32 values
- Range: Typically 0.0 to 1.0 (normalized pixel values)
- Order: Channel-first (RGB channels, then height, then width)
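The channel-first flattening can be sketched in a few lines. The `"input"` field name below is an assumption for illustration; match whatever the generated sample files use.

```python
import json

C, H, W = 3, 224, 224  # channel-first, as ResNet-18 expects

def flatten_chw(pixels):
    """Flatten a [C][H][W] nested list of normalized floats into the
    single 150,528-value array described above."""
    return [v for channel in pixels for row in channel for v in row]

# A uniform gray test image (every pixel 0.5); real inputs would come
# from an image decoded and normalized to [0, 1].
image = [[[0.5] * W for _ in range(H)] for _ in range(C)]
flat = flatten_chw(image)
payload = json.dumps({"input": flat})  # field name assumed
```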
- Rust 1.70+
- Cargo
cargo build --release
cargo test
Implement the `BurnModel` trait in `src/burn_model.rs` to add support for your own model architectures.
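The real trait signature lives in `src/burn_model.rs`; as an illustration only, a custom model tends to follow this general shape (the trait methods here are hypothetical stand-ins, not the actual trait):

```rust
// Hypothetical stand-in for the real trait in src/burn_model.rs.
trait BurnModel {
    fn name(&self) -> &str;
    fn predict(&self, input: &[f32]) -> Vec<f32>;
}

// A toy model that averages its input into a single score.
struct MeanModel;

impl BurnModel for MeanModel {
    fn name(&self) -> &str {
        "mean"
    }

    fn predict(&self, input: &[f32]) -> Vec<f32> {
        let sum: f32 = input.iter().sum();
        vec![sum / input.len() as f32]
    }
}

fn main() {
    let model = MeanModel;
    println!("{} -> {:?}", model.name(), model.predict(&[1.0, 2.0, 3.0]));
}
```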
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   CLI Layer     │───▶│   Model Layer   │───▶│   API Layer     │
│                 │    │                 │    │                 │
│ - Argument      │    │ - Model Loading │    │ - HTTP Routes   │
│   Parsing       │    │ - Inference     │    │ - Request       │
│ - Validation    │    │ - Metadata      │    │   Handling      │
│ - Logging Setup │    │ - Error Handling│    │ - CORS          │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.