A Caddy middleware for AI/ML API gateways that provides API key authentication, model-specific token usage metering, the x402 prepaid payment protocol, and metrics collection. The middleware validates API keys against multiple sources, tracks detailed usage statistics, and reports usage to Secret Network smart contracts.
Accurate Token Counting - We've completely reimplemented token counting to fix the 2-2.5x inflation issue:
- Model-Specific Tokenizers - Uses HuggingFace and SentencePiece tokenizers for accurate counting
- 90-95% Accuracy - Matches actual model token usage for billing-grade precision
- Automatic Model Detection - Detects model from request JSON and applies correct tokenizer
- Multi-Model Support - Llama 2/3/3.3, Mistral, Mixtral, Falcon, BERT, and more
- Pure Go Implementation - No CGO or external dependencies required
- Smart Fallback - Gracefully handles unknown models with conservative estimation
Before: (chars/4 + words×1.33)/2 inflated counts by 2-2.5x
After: Real tokenization matching actual AI model usage
See detailed changes in METERING.md
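For reference, here is a minimal Go sketch of the two estimators mentioned above: the pre-v2.0 heuristic and the chars/4 estimate that is still used as a fallback for unknown models. The sketch makes two assumptions that the module may not share: it counts characters as Unicode code points rather than bytes, and it splits words on whitespace.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// legacyEstimate reproduces the pre-v2.0 heuristic: the average of a
// character-based estimate (chars/4) and a word-based estimate (words × 1.33).
func legacyEstimate(text string) float64 {
	chars := float64(utf8.RuneCountInString(text)) // assumption: chars = code points
	words := float64(len(strings.Fields(text)))    // assumption: whitespace-separated words
	return (chars/4 + words*1.33) / 2
}

// charsOver4 is the simple estimate used as a fallback for unknown models.
func charsOver4(text string) float64 {
	return float64(utf8.RuneCountInString(text)) / 4
}

func main() {
	prompt := "Write a haiku about programming"
	// prints: legacy=7.20 chars/4=7.75
	fmt.Printf("legacy=%.2f chars/4=%.2f\n", legacyEstimate(prompt), charsOver4(prompt))
}
```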
This middleware implements a comprehensive API gateway solution designed for high-security AI/ML environments requiring:
- Multi-tiered Authentication - Master keys, file-based keys, and Secret Network smart contracts
- x402 Payment Protocol - Portal-based prepaid billing for AI agents with DevPortal balance checks, 402 payment challenges, and async usage reporting
- Accurate Token Metering - Model-specific token counting for precise billing and usage tracking
- Comprehensive Metrics - Performance monitoring, usage analytics, and operational insights
- Blockchain Integration - Decentralized usage reporting via Secret Network smart contracts
- Production-Ready Security - Encrypted communication, secure caching, and audit logging
- Architecture - Complete system architecture and component design
- x402 Payment Protocol - Portal-based balance checking, 402 challenges, and usage reporting
- Metering & Metrics - Token counting, usage tracking, and metrics collection
graph TB
subgraph "Client Layer"
C[AI/ML Clients<br/>with API Keys]
A[AI Agents<br/>with Bearer Tokens]
end
subgraph "Caddy Gateway"
subgraph "Middleware Pipeline"
ROUTE{Master Key?}
AUTH[API Key Authentication]
X402[x402 Portal Path]
METER[Token Metering]
METRICS[Metrics Collection]
PROXY[Reverse Proxy]
end
end
subgraph "Authentication Sources"
MK[Master Keys]
MKF[Master Keys File]
CACHE[Cached Results]
SC[Secret Network<br/>Smart Contract]
end
subgraph "x402 Components"
PORTAL[Portal Client]
CHALLENGE[402 Challenge Builder]
REPORT[Async Usage Reporter]
end
subgraph "DevPortal"
BAL[Balance API]
USAGE[Usage Reporting API]
PAY[Payment Processing]
end
subgraph "AI/ML Services"
AI1[OpenAI API]
AI2[Ollama]
AI3[TorchServe]
AI4[Custom ML APIs]
end
subgraph "Reporting & Analytics"
BLOCKCHAIN[Secret Network<br/>Usage Reporting]
METRICS_API[Metrics Endpoint<br/>/metrics]
end
C -->|HTTP + API Key| ROUTE
A -->|HTTP + Bearer Token| ROUTE
ROUTE -->|Yes| AUTH
ROUTE -->|No, x402 enabled| X402
AUTH --> MK
AUTH --> MKF
AUTH --> CACHE
AUTH -->|Cache Miss| SC
AUTH -->|Authorized| METER
X402 --> PORTAL
PORTAL -->|Check Balance| BAL
PORTAL -->|Insufficient| CHALLENGE
PORTAL -->|Sufficient| METER
METER -->|Count Tokens| METRICS
METER --> PROXY
PROXY --> AI1
PROXY --> AI2
PROXY --> AI3
PROXY --> AI4
REPORT -->|Token Counts| USAGE
A -->|Top Up| PAY
METRICS -->|Usage Data| BLOCKCHAIN
METRICS --> METRICS_API
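The routing decision in the diagram can be sketched in Go as two small functions. This is an illustration under stated assumptions, not the module's code. The names chooseRoute and portalDecision are hypothetical. The sketch also assumes that when x402 is disabled, non-master requests go through API key authentication, which the diagram does not spell out. The 402 and 503 outcomes follow the x402 behavior described later in this README.

```go
package main

import (
	"fmt"
	"net/http"
)

// Route names the two paths out of the "Master Key?" node in the diagram.
type Route string

const (
	RouteAPIKeyAuth Route = "api-key-auth" // AUTH: master keys, keys file, cache, contract
	RouteX402       Route = "x402-portal"  // X402: DevPortal balance check
)

// chooseRoute is hypothetical. Assumption: with x402 disabled, every request
// goes through API key authentication.
func chooseRoute(isMasterKey, x402Enabled bool) Route {
	if isMasterKey || !x402Enabled {
		return RouteAPIKeyAuth
	}
	return RouteX402
}

// portalDecision maps a DevPortal balance check to the next step.
// proceed=true means "continue to metering and the reverse proxy";
// otherwise status is the error response to send.
func portalDecision(portalReachable, balanceSufficient bool) (proceed bool, status int) {
	switch {
	case !portalReachable:
		return false, http.StatusServiceUnavailable // fail closed: 503
	case !balanceSufficient:
		return false, http.StatusPaymentRequired // 402 challenge
	default:
		return true, http.StatusOK
	}
}

func main() {
	fmt.Println(chooseRoute(false, true)) // x402-portal
	proceed, status := portalDecision(true, false)
	fmt.Println(proceed, status) // false 402
}
```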
The gateway provides accurate token counting for these models using industry-standard tokenizers:
| Model Family | Variants | Tokenizer |
|---|---|---|
| Llama | Llama 2 (7B, 13B, 70B), Llama 3 (8B, 70B), Llama 3.3 (70B) | HuggingFace |
| Mistral | Mistral 7B v0.1/v0.2, Mixtral 8x7B | HuggingFace |
| Falcon | Falcon 7B, 40B, 180B | HuggingFace |
| BERT | BERT base, large | HuggingFace |
The system automatically detects the model from the model field in your JSON request:
{
"model": "llama3.3:70b",
"prompt": "Your prompt here"
}
In this example, llama3.3:70b is detected and counted with the Llama 3.3 tokenizer.
Model name variations handled:
- llama3.3:70b → llama3.3
- mistral-7b-v0.1 → mistral-7b
- Case-insensitive matching
- Automatic normalization
For models not in the supported list, the system uses a conservative chars/4 fallback estimation (60-70% accuracy) - no configuration needed.
To add support for custom HuggingFace models, list them in the preload_models setting:
preload_models llama-2,mistral,your-custom-model
Any model available on HuggingFace with a tokenizer.json file can be used.
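The normalization rules listed above can be illustrated with a small Go sketch. The function names and the family list are hypothetical. The regular expression is an assumption that reproduces the two documented examples, and it may not match the module's actual rules.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionSuffix matches a trailing "-v0.1"-style version tag.
var versionSuffix = regexp.MustCompile(`-v\d+(\.\d+)*$`)

// normalizeModel (hypothetical) applies the rules listed above: lowercase the
// name, drop an Ollama-style ":tag" such as ":70b", and drop a "-vX.Y" suffix.
func normalizeModel(name string) string {
	n := strings.ToLower(strings.TrimSpace(name))
	if i := strings.Index(n, ":"); i >= 0 {
		n = n[:i]
	}
	return versionSuffix.ReplaceAllString(n, "")
}

// knownFamilies is an illustrative list of supported family prefixes.
var knownFamilies = []string{"llama", "mistral", "mixtral", "falcon", "bert"}

// isSupported reports whether a normalized name belongs to a known family;
// anything else would fall back to the chars/4 estimate.
func isSupported(normalized string) bool {
	for _, f := range knownFamilies {
		if strings.HasPrefix(normalized, f) {
			return true
		}
	}
	return false
}

func main() {
	for _, m := range []string{"llama3.3:70b", "Mistral-7B-v0.1", "custom-gpt-x"} {
		n := normalizeModel(m)
		fmt.Printf("%s -> %s (supported: %v)\n", m, n, isSupported(n))
	}
}
```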
- Multi-tier validation with configurable precedence and fallback
- Secure caching with SHA256 hashing and configurable TTL
- Secret Network integration with encrypted blockchain communication
- Dynamic key rotation via file-based keys without service restart
- Thread-safe operations with optimized read-write mutex usage
- Model-specific tokenization using HuggingFace and SentencePiece libraries
- 90-95% accuracy matching actual AI model token usage for billing-grade precision
- Automatic model detection from JSON request body
- Supported models: Llama 2/3/3.3, Mistral, Mixtral, Falcon, BERT, and custom HuggingFace models
- Lazy-loading with caching - tokenizers load once and are reused for performance
- Request/response tracking with comprehensive body analysis
- Usage accumulation per API key and per model with thread-safe operations
- Resilient reporting with retry logic and failed report persistence
- Smart fallback to conservative estimation for unknown models
- Real-time monitoring of requests, tokens, performance, and errors
- HTTP metrics endpoint at /metrics with detailed JSON output
- Cache performance tracking, including hit rates and operation times
- Token usage analytics with input/output token breakdowns
- System health indicators for operational monitoring
- Stateless proxy: Caddy delegates all balance management to DevPortal
- Per-request balance check via the DevPortal API with service-key authentication
- 402 Payment Required responses with USDC-denominated challenge payloads and top-up URLs
- Async usage reporting: token counts are sent to DevPortal without blocking the response
- Fail-closed: returns 503 when DevPortal is unreachable, so no usage goes unpaid
- Master key bypass: admin/operator keys skip portal balance checks
- Configurable threshold: the minimum balance required to serve requests (x402_min_balance_usdc)
- Portal owns pricing: Caddy reports raw token counts and DevPortal computes cost
See x402_Caddy_Implementation.md for full documentation and x402-portal.md for the end-to-end payment flow design.
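The minimum balance is configured as a string such as "0.01", so USDC amounts are easiest to compare exactly in integer base units. USDC uses 6 decimal places. The Go sketch below converts decimal strings to micro-USDC and computes the deficit that a 402 challenge would report. The function names are hypothetical, and so is the assumption that DevPortal returns the balance as a decimal string.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

const microPerUSDC = 1_000_000 // USDC has 6 decimal places

// usdcPattern accepts non-negative decimals with at most 6 fractional digits.
// The 12-digit cap on the whole part keeps the result well inside int64.
var usdcPattern = regexp.MustCompile(`^\d{1,12}(\.\d{1,6})?$`)

// parseUSDC converts a decimal string such as "0.01" into integer
// micro-USDC, avoiding floating-point rounding in balance comparisons.
func parseUSDC(s string) (int64, error) {
	s = strings.TrimSpace(s)
	if !usdcPattern.MatchString(s) {
		return 0, errors.New("invalid USDC amount: " + s)
	}
	whole, frac, _ := strings.Cut(s, ".")
	frac += strings.Repeat("0", 6-len(frac)) // right-pad to 6 digits
	w, err := strconv.ParseInt(whole, 10, 64)
	if err != nil {
		return 0, err
	}
	f, err := strconv.ParseInt(frac, 10, 64)
	if err != nil {
		return 0, err
	}
	return w*microPerUSDC + f, nil
}

// deficit is how far the balance falls short of the minimum (0 if sufficient).
func deficit(balance, minimum int64) int64 {
	if balance >= minimum {
		return 0
	}
	return minimum - balance
}

// formatUSDC renders non-negative micro-USDC as a 6-decimal string.
func formatUSDC(units int64) string {
	return fmt.Sprintf("%d.%06d", units/microPerUSDC, units%microPerUSDC)
}

func main() {
	minimum, _ := parseUSDC("0.01")  // e.g. x402_min_balance_usdc
	balance, _ := parseUSDC("0.004") // hypothetical balance reported by DevPortal
	if d := deficit(balance, minimum); d > 0 {
		fmt.Println("402: short by", formatUSDC(d), "USDC") // 402: short by 0.006000 USDC
	}
}
```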
- Pattern-based blocking with configurable URL patterns via environment variables
- Early request filtering for performance optimization (before API key validation)
- Comprehensive logging of blocked requests with pattern matching details
- HTTP 403 Forbidden responses for blocked requests with clear error messages
- Metrics integration for tracking blocked request statistics
- Environment variable support for secure configuration management
- URL filtering with configurable blocked URL patterns via BLOCK_URLS environment variable
- Graceful error handling with detailed logging and audit trails
- Resource management with configurable limits and cleanup procedures
- Docker-ready deployment with multi-stage builds and health checks
- Go 1.26+
- Docker & Docker Compose
- Git
The project uses a multi-stage Dockerfile to build Caddy with the custom module:
# Build the custom Caddy image
docker build -t secret-reverse-proxy:latest .
The Dockerfile:
- Builder Stage: Uses Go 1.26+ to install xcaddy and build Caddy with the secret-reverse-proxy module
- Runtime Stage: Creates lightweight Alpine-based runtime with security hardening
- Security Features: Non-root user, minimal dependencies, health checks
# Start all services including echo server for testing
docker-compose up --build
The docker-compose setup includes:
- caddy: Custom Caddy with secret-reverse-proxy module
- echo-server: Simple HTTP echo service for testing backend responses
- networking: Isolated testnet for secure communication
Valid API Key Test:
curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
-H "Content-Type: application/json" \
-d '{"model": "llama3.3:70b", "prompt": "Hello, world!", "max_tokens": 100}' \
http://localhost:8085/
# Expected: 200 OK with accurate token count in logs
Invalid API Key Test:
curl -H "Authorization: Bearer invalid-key-123" \
http://localhost:8085/
# Expected: 401 Unauthorized
Missing Authorization Test:
curl http://localhost:8085/
# Expected: 401 Unauthorized
Accurate Token Counting Test (Llama):
curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
-H "Content-Type: application/json" \
-d '{"model": "llama3.3:70b", "prompt": "Write a haiku about programming", "max_tokens": 100}' \
http://localhost:8085/
# Uses Llama-3.3 tokenizer for accurate counting
Accurate Token Counting Test (Mistral):
curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
-H "Content-Type: application/json" \
-d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Explain quantum computing"}]}' \
http://localhost:8085/chat/completions
# Uses Mistral tokenizer for accurate counting
Unknown Model Test (Fallback):
curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
-H "Content-Type: application/json" \
-d '{"model": "custom-gpt-x", "prompt": "Test prompt"}' \
http://localhost:8085/
# Uses fallback chars/4 estimation for unknown models
Metrics Check:
curl http://localhost:8085/metrics
URL Filtering Test (Blocked):
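# Set BLOCK_URLS for the Caddy container: exporting it in your shell only takes
# effect if it is passed to the container (e.g. docker run -e or docker-compose)
# and the container is recreated
export BLOCK_URLS="/admin,/config,/internal"
# This request will be blocked with HTTP 403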
curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
http://localhost:8085/admin/users
The Caddyfile-test file demonstrates a full configuration:
:80 {
secret_reverse_proxy {
# Authentication configuration
API_MASTER_KEY {env.SECRET_API_MASTER_KEY}
master_keys_file /etc/caddy/master_keys.txt
secret_node {env.SECRET_NODE}
contract_address {env.SECRET_CONTRACT}
secret_chain_id {env.SECRET_CHAIN_ID}
# permit_file is optional if SECRETAI_PERMIT_TYPE, SECRETAI_PERMIT_PUBKEY,
# and SECRETAI_PERMIT_SIG env vars are set instead
permit_file /etc/caddy/permit.json
# Metering configuration
metering {env.METERING}
metering_interval {env.METERING_INTERVAL}
metering_url {env.METERING_URL}
# Token counting settings (v2.0)
max_body_size 2097152 # 2MB max body size
token_counting_mode accurate # Uses model-specific tokenizers
tokenizer_cache_dir /tmp/tokenizers # Cache directory for tokenizers
preload_models llama-2,mistral # Pre-cache common models for fast startup
# Reporting settings
max_retries 5 # retry attempts for failed reports
retry_backoff 300s # backoff between retries
# Metrics configuration
enable_metrics true # enable /metrics endpoint
metrics_path /metrics # metrics endpoint path
}
reverse_proxy echo-server:80 {
health_uri /health
health_interval 30s
health_timeout 10s
}
}
The x402 test suite includes unit tests and integration tests with a mock DevPortal:
cd secret-reverse-proxy
# Run x402 unit tests (portal client, challenge builder, USDC conversion)
go test -v ./x402/
# Run x402 integration tests (full middleware flow with mock portal)
go test -v -run TestX402
The integration tests verify: insufficient balance (402), sufficient balance (200 + usage report), master key bypass, portal unreachable (503), and partial balance deficit calculation. See x402_Caddy_Implementation.md for the full test plan including Docker-based end-to-end testing.
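cd secret-reverse-proxy
# Run all tests
go test -v ./...
# Run tests including those behind the integration build tag
go test -v -tags=integration ./...
# Generate and view a coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
# Test API key validation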
go test -v ./validators/
# Test x402 payment protocol
go test -v ./x402/...
# Test token counting
go test -v -run TestTokenCounter
# Test metering functionality
go test -v -run TestMetering
| Variable | Description | Example |
|---|---|---|
| SECRET_API_MASTER_KEY | Primary API key for authentication | your-secure-master-key |
| SECRET_NODE | Secret Network LCD endpoint | lcd.secret.tactus.starshell.net |
| SECRET_CHAIN_ID | Secret Network chain identifier | secret-4 |
| SECRET_CONTRACT | Smart contract address for validation | secret18xpp2kmkk7g8xzx24wm5zjstw9tjv6g3xle2vjm |
| METERING | Enable/disable usage metering | 1 or true |
| METERING_INTERVAL | Reporting interval | 5m, 1h |
| METERING_URL | Endpoint for usage reports | https://api.example.com |
| BLOCK_URLS | Comma-separated list of URL patterns to block | /admin,/config,/internal |
| SECRETAI_MASTER_KEYS | Comma-separated list of master API keys. Can be used instead of, or in addition to, master_keys_file. | key1,key2,key3 |
| SECRETAI_PERMIT_TYPE | Permit public key type. Required when no permit_file is configured; used to construct a permit on the fly for retrieving API keys from KMS. | tendermint/PubKeySecp256k1 |
| SECRETAI_PERMIT_PUBKEY | Permit public key value. Required when no permit_file is configured. | Aur9D8RLq... |
| SECRETAI_PERMIT_SIG | Permit signature. Required when no permit_file is configured. | TeNtblPmo... |
| DEVPORTAL_URL | DevPortal base URL for balance checks and usage reporting | https://devportal.example.com |
| DEVPORTAL_SERVICE_KEY | Shared secret for Caddy-to-DevPortal service authentication | caddy-secret-key-123 |
| Directive | Type | Description | Default |
|---|---|---|---|
| API_MASTER_KEY | string | Primary master key | None |
| master_keys_file | path | Path to a file containing additional master keys (one per line). Optional; if not configured, the SECRETAI_MASTER_KEYS env var can be used instead. | "" |
| permit_file | path | Path to a JSON file containing the Secret Network permit configuration used to retrieve Secret AI API keys from KMS. Optional; if not configured, the permit is constructed on the fly from the SECRETAI_PERMIT_TYPE, SECRETAI_PERMIT_PUBKEY, and SECRETAI_PERMIT_SIG env vars. | None |
| contract_address | string | Smart contract address | Required |
| secret_node | string | Secret Network node | Required |
| secret_chain_id | string | Chain ID | Required |
| metering | boolean | Enable usage metering | false |
| metering_interval | duration | Reporting frequency | 10m |
| metering_url | string | Usage reporting endpoint | "" |
| max_body_size | bytes | Max request body size | 10MB |
| token_counting_mode | string | Token counting mode (always uses accurate v2.0) | accurate |
| tokenizer_cache_dir | path | Directory for caching tokenizer files | /tmp/tokenizers |
| preload_models | string | Comma-separated models to pre-cache | llama-2,mistral |
| max_retries | int | Failed report retry attempts | 3 |
| retry_backoff | duration | Retry delay | 5m |
| enable_metrics | boolean | Enable metrics collection | false |
| metrics_path | string | Metrics HTTP endpoint | /metrics |
| x402_enabled | boolean | Enable portal-based x402 payment protocol | false |
| devportal_url | string | DevPortal base URL | Required if x402 enabled |
| devportal_service_key | string | Shared secret for service-to-service auth | Required if x402 enabled |
| x402_min_balance_usdc | string | Minimum agent balance in USDC (e.g., "0.01") | Required if x402 enabled |
| x402_topup_url | string | Override the top-up URL in 402 responses | {devportal_url}/api/agent/add-funds |
docker run -d \
--name secret-ai-caddy \
-p 80:80 -p 443:443 \
-e SECRET_API_MASTER_KEY="your-production-key" \
-e SECRET_NODE="lcd.secret.tactus.starshell.net" \
-e SECRET_CONTRACT="secret18xpp2kmkk7g8xzx24wm5zjstw9tjv6g3xle2vjm" \
-e SECRET_CHAIN_ID="secret-4" \
-e METERING=true \
-e METERING_INTERVAL="5m" \
-e METERING_URL="https://your-metrics-api.com" \
-e BLOCK_URLS="/admin,/config,/internal" \
secret-reverse-proxy:latest

docker run -d \
--name secret-ai-caddy \
--restart unless-stopped \
-p 80:80 -p 443:443 \
-v ./Caddyfile:/etc/caddy/Caddyfile \
-v ./master_keys.txt:/etc/caddy/master_keys.txt \
-v ./permit.json:/etc/caddy/permit.json \
-v caddy_data:/data \
-v caddy_config:/config \
-e SECRET_API_MASTER_KEY="your-production-key" \
secret-reverse-proxy:latest

version: '3.8'
services:
  secret-ai-caddy:
    image: secret-reverse-proxy:latest
    ports:
      - "80:80"
      - "443:443"
    environment:
      - SECRET_API_MASTER_KEY=${SECRET_API_MASTER_KEY}
      - SECRET_NODE=${SECRET_NODE}
      - SECRET_CONTRACT=${SECRET_CONTRACT}
      - SECRET_CHAIN_ID=${SECRET_CHAIN_ID}
      - METERING=true
      - METERING_INTERVAL=5m
      - METERING_URL=${METERING_URL}
      - BLOCK_URLS=${BLOCK_URLS}
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - ./master_keys.txt:/etc/caddy/master_keys.txt
      - ./permit.json:/etc/caddy/permit.json
      - caddy_data:/data
      - caddy_config:/config
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
volumes:
  caddy_data:
  caddy_config:

- API Key Security
  - Use environment variables for sensitive keys
  - Rotate master keys regularly
  - Implement key versioning
  - Monitor key usage patterns
- File Security
  - Secure the master keys file with 600 permissions
  - Use separate permit files per environment
  - Back up configuration files regularly
- Network Security
  - Always use HTTPS in production
  - Implement proper firewall rules
  - Use private networks for backend communication
  - Enable rate limiting per API key
- Monitoring & Alerting
  - Monitor authentication failure rates
  - Set up alerts for contract query failures
  - Track unusual usage patterns
  - Monitor system resource usage
- Operational Security
  - Regular security updates
  - Log analysis and monitoring
  - Incident response procedures
  - Backup and recovery plans
System Health:
curl http://localhost:8085/health
Metrics Overview:
curl http://localhost:8085/metrics | jq
- Authentication Failures
# Check logs for details
docker logs caddy-reverse-proxy
# Verify environment variables
docker exec caddy-reverse-proxy env | grep SECRET
- Contract Query Issues
# Test network connectivity
curl https://lcd.secret.tactus.starshell.net/status
# Verify contract address
curl "https://lcd.secret.tactus.starshell.net/compute/v1beta1/code_hash/by_contract_address/YOUR_CONTRACT"
- Token Counting Problems
# Check metering logs for tokenizer loading
docker logs caddy-reverse-proxy 2>&1 | grep -i tokenizer
# Check for accurate token counting
docker logs caddy-reverse-proxy 2>&1 | grep -i "Used accurate tokenizer"
# Test with model-specific request
curl -H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "llama3.3:70b", "prompt": "test"}' \
http://localhost:8085/
# Verify tokenizer cache
docker exec caddy-reverse-proxy ls -la /tmp/tokenizers
Common Token Counting Issues:
- If seeing "Failed to load tokenizer" warnings, check network connectivity for HuggingFace downloads
- Tokenizers are cached after first use - subsequent requests should be fast
- Unknown models automatically fall back to conservative chars/4 estimation
- Check the preload_models configuration to pre-cache commonly used models
- URL Filtering Issues
# Check if BLOCK_URLS is set
docker exec caddy-reverse-proxy env | grep BLOCK_URLS
# View filtering logs
docker logs caddy-reverse-proxy 2>&1 | grep -i "blocked"
# Test blocked URL (should return HTTP 403 Forbidden)
curl -H "Authorization: Bearer YOUR_KEY" \
http://localhost:8085/admin/test
# Test allowed URL (should proceed to API key validation)
curl -H "Authorization: Bearer YOUR_KEY" \
http://localhost:8085/api/test
To enable debug logging, add these global options at the top of the Caddyfile:
{
debug
log {
output stdout
format console
level DEBUG
}
}
- Authentication Latency: <1ms for cache hits, <500ms for contract queries
- Token Counting (Accurate Mode):
- First request with model: 50-200ms (downloads and caches tokenizer from HuggingFace)
- Cached tokenizer: 1-5ms per request (accurate tokenization)
- Unknown model fallback: <1ms (simple chars/4 estimation)
- Preloaded models: 1-5ms from first request
- Memory Usage:
- ~1KB per 1000 cached API keys
- ~5-15MB per cached tokenizer (depends on model)
- Typical deployment: 20-50MB for 2-3 common models
- Throughput: Supports 10k+ RPS with proper caching
- Cache Efficiency:
- API keys: 95%+ hit rate for stable key sets
- Tokenizers: 100% hit rate after initial load (cached permanently)
| Model Type | Accuracy vs Actual | Method |
|---|---|---|
| Llama 2/3/3.3 | 90-95% | HuggingFace tokenizer |
| Mistral/Mixtral | 90-95% | HuggingFace tokenizer |
| Falcon | 90-95% | HuggingFace tokenizer |
| BERT | 90-95% | HuggingFace tokenizer |
| Unknown models | 60-70% | Chars/4 fallback |
Before v2.0: Heuristic method inflated counts by 2-2.5x
After v2.0: Within 5-10% of actual usage for supported models
Token Counting System:
- Old heuristic (chars/4 + words×1.33)/2 replaced with model-specific tokenizers
- Token counts will be 40-60% lower for most requests (more accurate)
- Per-model usage tracking now available
- Update Docker Image
docker pull secret-reverse-proxy:latest
# or rebuild:
docker build -t secret-reverse-proxy:latest .
- Update Configuration (Optional)
secret_reverse_proxy {
    # ... existing config ...
    # New optional settings (v2.0)
    tokenizer_cache_dir /tmp/tokenizers # Default location
    preload_models llama-2,mistral # Common models
}
- Monitor First Deployment
# Watch for tokenizer downloads (first time only)
docker logs -f caddy-reverse-proxy 2>&1 | grep tokenizer
# Verify accurate counting
docker logs -f caddy-reverse-proxy 2>&1 | grep "Used accurate tokenizer"
- Expect Lower Token Counts
  - Old counts were inflated 2-2.5x
  - New counts are 90-95% accurate for supported models
  - Update billing expectations accordingly
If you need to roll back:
# Use the previous image version
docker pull secret-reverse-proxy:v1.x
# Point the image in docker-compose.yml at the previous tag, then:
docker-compose up -d
- Fork the repository
- Create a feature branch
- Implement changes with tests
- Update documentation
- Submit a pull request
[Add appropriate license information]
For issues and questions:
- Documentation: See Architecture and Metering guides
- Issues: GitHub Issues tracker
- Security: Contact alexh@scrtlabs.com for security-related issues