scrtlabs/secret-ai-caddy


Secret AI Caddy - Advanced API Gateway

A sophisticated Caddy middleware that provides secure API key authentication, intelligent token usage metering, x402 prepaid payment protocol, and comprehensive metrics collection for AI/ML API gateways. The middleware validates API keys against multiple sources while tracking detailed usage statistics and reporting to blockchain-based smart contracts.

⭐ What's New in v2.0

Accurate Token Counting - We've completely reimplemented token counting to fix the 2-2.5x inflation issue:

  • ✅ Model-Specific Tokenizers - Uses HuggingFace and SentencePiece tokenizers for accurate counting
  • ✅ 90-95% Accuracy - Matches actual model token usage for billing-grade precision
  • ✅ Automatic Model Detection - Detects model from request JSON and applies correct tokenizer
  • ✅ Multi-Model Support - Llama 2/3/3.3, Mistral, Mixtral, Falcon, BERT, and more
  • ✅ Pure Go Implementation - No CGO or external dependencies required
  • ✅ Smart Fallback - Gracefully handles unknown models with conservative estimation

Before: (chars/4 + words×1.33)/2 inflated counts by 2-2.5x
After: Real tokenization matching actual AI model usage

See detailed changes in METERING.md

🎯 Project Purpose

This middleware implements a comprehensive API gateway solution designed for high-security AI/ML environments requiring:

  1. Multi-tiered Authentication - Master keys, file-based keys, and Secret Network smart contracts
  2. x402 Payment Protocol - Portal-based prepaid billing for AI agents with DevPortal balance checks, 402 payment challenges, and async usage reporting
  3. Accurate Token Metering - Model-specific token counting for precise billing and usage tracking
  4. Comprehensive Metrics - Performance monitoring, usage analytics, and operational insights
  5. Blockchain Integration - Decentralized usage reporting via Secret Network smart contracts
  6. Production-Ready Security - Encrypted communication, secure caching, and audit logging

📚 Documentation

🏗️ Architecture Overview

graph TB
    subgraph "Client Layer"
        C[AI/ML Clients<br/>with API Keys]
        A[AI Agents<br/>with Bearer Tokens]
    end

    subgraph "Caddy Gateway"
        subgraph "Middleware Pipeline"
            ROUTE{Master Key?}
            AUTH[API Key Authentication]
            X402[x402 Portal Path]
            METER[Token Metering]
            METRICS[Metrics Collection]
            PROXY[Reverse Proxy]
        end
    end

    subgraph "Authentication Sources"
        MK[Master Keys]
        MKF[Master Keys File]
        CACHE[Cached Results]
        SC[Secret Network<br/>Smart Contract]
    end

    subgraph "x402 Components"
        PORTAL[Portal Client]
        CHALLENGE[402 Challenge Builder]
        REPORT[Async Usage Reporter]
    end

    subgraph "DevPortal"
        BAL[Balance API]
        USAGE[Usage Reporting API]
        PAY[Payment Processing]
    end

    subgraph "AI/ML Services"
        AI1[OpenAI API]
        AI2[Ollama]
        AI3[TorchServe]
        AI4[Custom ML APIs]
    end

    subgraph "Reporting & Analytics"
        BLOCKCHAIN[Secret Network<br/>Usage Reporting]
        METRICS_API[Metrics Endpoint<br/>/metrics]
    end

    C -->|HTTP + API Key| ROUTE
    A -->|HTTP + Bearer Token| ROUTE
    ROUTE -->|Yes| AUTH
    ROUTE -->|No, x402 enabled| X402
    AUTH --> MK
    AUTH --> MKF
    AUTH --> CACHE
    AUTH -->|Cache Miss| SC
    AUTH -->|Authorized| METER
    X402 --> PORTAL
    PORTAL -->|Check Balance| BAL
    PORTAL -->|Insufficient| CHALLENGE
    PORTAL -->|Sufficient| METER
    METER -->|Count Tokens| METRICS
    METER --> PROXY
    PROXY --> AI1
    PROXY --> AI2
    PROXY --> AI3
    PROXY --> AI4
    REPORT -->|Token Counts| USAGE
    A -->|Top Up| PAY

    METRICS -->|Usage Data| BLOCKCHAIN
    METRICS --> METRICS_API

🤖 Supported AI Models (v2.0)

The gateway provides accurate token counting for these models using industry-standard tokenizers:

Fully Supported Models (90-95% Accuracy)

| Model Family | Variants | Tokenizer |
|---|---|---|
| Llama | Llama 2 (7B, 13B, 70B); Llama 3 (8B, 70B); Llama 3.3 (70B) | HuggingFace |
| Mistral | Mistral 7B v0.1/v0.2; Mixtral 8x7B | HuggingFace |
| Falcon | Falcon 7B, 40B, 180B | HuggingFace |
| BERT | BERT base, large | HuggingFace |

Model Detection

The system automatically detects the model from the model field in your JSON request:

{
  "model": "llama3.3:70b",
  "prompt": "Your prompt here"
}

Here the model value llama3.3:70b is detected and the Llama 3.3 tokenizer is applied.

Model name variations handled:

  • llama3.3:70b → llama3.3
  • mistral-7b-v0.1 → mistral-7b
  • Case-insensitive matching
  • Automatic normalization
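The normalization rules are not spelled out beyond the two examples above. As a rough illustration only, the Python sketch below reproduces those two mappings. The function names `detect_model` and `normalize_model_name` and the exact rules (lowercase, drop everything after a colon, drop a trailing `-vN.N` suffix) are assumptions inferred from the examples, not the middleware's Go implementation.

```python
import json
import re
from typing import Optional

# Assumed rule: a trailing "-v<digits>[.<digits>...]" is a version suffix.
_VERSION_SUFFIX = re.compile(r"-v\d+(\.\d+)*$")


def normalize_model_name(raw: str) -> str:
    """Lowercase, drop a ":tag" part, then drop a trailing version suffix."""
    name = raw.strip().lower()
    name = name.split(":", 1)[0]
    return _VERSION_SUFFIX.sub("", name)


def detect_model(body: bytes) -> Optional[str]:
    """Return the normalized "model" field of a JSON body, or None if absent."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(payload, dict):
        return None
    model = payload.get("model")
    if not isinstance(model, str) or not model.strip():
        return None
    return normalize_model_name(model)
```

A `None` result here corresponds to the case where no model can be identified, which the gateway treats as an unknown model.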

Unknown Models

For models not in the supported list, the system uses a conservative chars/4 fallback estimation (60-70% accuracy) - no configuration needed.
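A minimal Python sketch of that fallback path, for illustration only. The `Tokenizer` type, the `tokenizers` registry, integer division, and the floor of one token for non-empty text are all assumptions; the source only states that unknown models use a chars/4 estimate.

```python
from typing import Callable, Dict, Tuple

# Hypothetical tokenizer interface: takes text, returns its token count.
Tokenizer = Callable[[str], int]


def fallback_estimate(text: str) -> int:
    """chars/4 estimate; rounding down with a floor of 1 is an assumption."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def count_tokens(
    model: str, text: str, tokenizers: Dict[str, Tokenizer]
) -> Tuple[int, bool]:
    """Return (token_count, used_accurate_tokenizer)."""
    tokenizer = tokenizers.get(model)
    if tokenizer is None:
        # Unknown model: no configuration needed, just estimate.
        return fallback_estimate(text), False
    return tokenizer(text), True
```

Returning the boolean alongside the count makes it easy to log or meter how often the estimate was used instead of a real tokenizer.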

Adding Custom Models

To add support for custom HuggingFace models, list them in the preload_models directive:

preload_models llama-2,mistral,your-custom-model

Any model available on HuggingFace with a tokenizer.json file can be used.

✨ Key Features

🔐 Advanced Authentication

  • Multi-tier validation with configurable precedence and fallback
  • Secure caching with SHA256 hashing and configurable TTL
  • Secret Network integration with encrypted blockchain communication
  • Dynamic key rotation via file-based keys without service restart
  • Thread-safe operations with optimized read-write mutex usage
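To make the caching bullet concrete, here is a small Python sketch of a TTL cache keyed by the SHA256 digest of the API key, so the raw key is never stored. It is an illustration under assumptions, not the middleware's Go code: the class name `KeyCache`, the use of the digest as the cache key, and the single `threading.Lock` (standing in for a read-write mutex, which the Python standard library does not provide) are all hypothetical.

```python
import hashlib
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class KeyCache:
    """Caches validation verdicts under the SHA256 digest of the API key."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[bool, float]] = {}  # digest -> (valid, expires_at)

    @staticmethod
    def _digest(api_key: str) -> str:
        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()

    def get(self, api_key: str) -> Optional[bool]:
        """Return the cached verdict, or None on a miss or an expired entry."""
        digest = self._digest(api_key)
        with self._lock:
            entry = self._entries.get(digest)
            if entry is None:
                return None
            is_valid, expires_at = entry
            if self._clock() >= expires_at:
                del self._entries[digest]  # drop stale entries lazily
                return None
            return is_valid

    def put(self, api_key: str, is_valid: bool) -> None:
        """Store a verdict that expires ttl_seconds from now."""
        digest = self._digest(api_key)
        with self._lock:
            self._entries[digest] = (is_valid, self._clock() + self._ttl)
```

On a `None` result the caller would fall through to the next authentication source (for example the smart contract query) and store the outcome with `put`.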

⚖️ Accurate Token Metering (v2.0)

  • Model-specific tokenization using HuggingFace and SentencePiece libraries
  • 90-95% accuracy matching actual AI model token usage for billing-grade precision
  • Automatic model detection from JSON request body
  • Supported models: Llama 2/3/3.3, Mistral, Mixtral, Falcon, BERT, and custom HuggingFace models
  • Lazy-loading with caching - tokenizers load once and are reused for performance
  • Request/response tracking with comprehensive body analysis
  • Usage accumulation per API key and per model with thread-safe operations
  • Resilient reporting with retry logic and failed report persistence
  • Smart fallback to conservative estimation for unknown models
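The retry behaviour behind "resilient reporting" can be sketched as follows. This Python example is illustrative only: the function name `send_with_retries`, the fixed (non-exponential) backoff, and the reading of `max_retries` as retries after the first attempt are assumptions. The default values mirror the `max_retries` (3) and `retry_backoff` (5m) defaults listed in the configuration reference.

```python
import time
from typing import Any, Callable


def send_with_retries(
    send: Callable[[Any], None],
    report: Any,
    max_retries: int = 3,
    backoff_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to deliver a usage report; return False if every attempt failed.

    A False result signals the caller to persist the report for later delivery.
    """
    for attempt in range(max_retries + 1):  # first attempt plus max_retries retries
        try:
            send(report)
            return True
        except Exception:
            if attempt < max_retries:
                sleep(backoff_seconds)  # wait before the next attempt
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without waiting for real backoff intervals.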

📊 Comprehensive Metrics

  • Real-time monitoring of requests, tokens, performance, and errors
  • HTTP metrics endpoint at /metrics with detailed JSON output
  • Cache performance tracking including hit rates and operation times
  • Token usage analytics with input/output token breakdowns
  • System health indicators for operational monitoring

💳 x402 Payment Protocol (Portal-Based)

  • Stateless proxy: Caddy delegates all balance management to DevPortal
  • Per-request balance check via DevPortal API with service-key authentication
  • 402 Payment Required responses with USDC-denominated challenge payloads and topup URLs
  • Async usage reporting: token counts sent to DevPortal without blocking the response
  • Fail-closed: returns 503 when DevPortal is unreachable (no unpaid usage)
  • Master key bypass: admin/operator keys skip portal balance checks
  • Configurable threshold: minimum balance required to serve requests (x402_min_balance_usdc)
  • Portal owns pricing: Caddy reports raw token counts, DevPortal computes cost

See x402_Caddy_Implementation.md for full documentation and x402-portal.md for the end-to-end payment flow design.
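The per-request decision described in the bullets above can be summarized in a few lines. The Python sketch below is an illustration, not the middleware's Go code: the function name `x402_decision`, the dictionary fields, and the choice to treat a balance equal to the threshold as sufficient are assumptions. The actual 402 challenge payload is defined in the linked documents.

```python
from decimal import Decimal
from typing import Optional


def x402_decision(
    is_master_key: bool,
    balance_usdc: Optional[Decimal],
    min_balance_usdc: Decimal,
    topup_url: str,
) -> dict:
    """Map one request to an HTTP status.

    balance_usdc is None when the DevPortal balance check failed or timed out.
    """
    if is_master_key:
        # Master keys skip the portal balance check entirely.
        return {"status": 200, "reason": "master_key_bypass"}
    if balance_usdc is None:
        # Fail closed: never serve traffic that cannot be billed.
        return {"status": 503, "reason": "portal_unreachable"}
    if balance_usdc < min_balance_usdc:
        # Hypothetical challenge fields, shown only to illustrate the deficit.
        return {
            "status": 402,
            "reason": "insufficient_balance",
            "currency": "USDC",
            "deficit": str(min_balance_usdc - balance_usdc),
            "topup_url": topup_url,
        }
    return {"status": 200, "reason": "balance_ok"}
```

`Decimal` is used instead of `float` so that USDC amounts such as "0.01" compare and subtract exactly.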

🚫 URL Filtering & Security

  • Pattern-based blocking with configurable URL patterns via environment variables
  • Early request filtering for performance optimization (before API key validation)
  • Comprehensive logging of blocked requests with pattern matching details
  • HTTP 403 Forbidden responses for blocked requests with clear error messages
  • Metrics integration for tracking blocked request statistics
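As a sketch of how pattern-based blocking might work, the Python example below parses a comma-separated pattern list and checks a request path against it. Prefix matching is an assumption chosen because a pattern such as /admin is expected to block /admin/users; the real matching semantics and the helper names `parse_block_urls` and `is_blocked` are not taken from the middleware.

```python
from typing import List


def parse_block_urls(raw: str) -> List[str]:
    """Split a comma-separated BLOCK_URLS value, ignoring blanks and spaces."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_blocked(path: str, patterns: List[str]) -> bool:
    """True if the request path starts with any blocked pattern (assumed rule)."""
    return any(path.startswith(p) for p in patterns)
```

With plain prefix matching, /administrator would also be blocked by /admin; a stricter variant could require the match to end at a path separator.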

🚀 Production Features

  • Environment variable support for secure configuration management
  • URL filtering with configurable blocked URL patterns via BLOCK_URLS environment variable
  • Graceful error handling with detailed logging and audit trails
  • Resource management with configurable limits and cleanup procedures
  • Docker-ready deployment with multi-stage builds and health checks

🛠️ Building and Testing

Prerequisites

  • Go 1.26+
  • Docker & Docker Compose
  • Git

Build Custom Caddy

The project uses a multi-stage Dockerfile to build Caddy with the custom module:

# Build the custom Caddy image
docker build -t secret-reverse-proxy:latest .

The Dockerfile:

  1. Builder Stage: Uses Go 1.26+ to install xcaddy and build Caddy with the secret-reverse-proxy module
  2. Runtime Stage: Creates a lightweight Alpine-based runtime with security hardening
  3. Security Features: Non-root user, minimal dependencies, health checks

Test Environment Setup

1. Start Test Environment

# Start all services including echo server for testing
docker-compose up --build

The docker-compose setup includes:

  • caddy: Custom Caddy with secret-reverse-proxy module
  • echo-server: Simple HTTP echo service for testing backend responses
  • networking: Isolated testnet for secure communication

2. Test Different Scenarios

Valid API Key Test:

curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
     -H "Content-Type: application/json" \
     -d '{"model": "llama3.3:70b", "prompt": "Hello, world!", "max_tokens": 100}' \
     http://localhost:8085/
# Expected: 200 OK with accurate token count in logs

Invalid API Key Test:

curl -H "Authorization: Bearer invalid-key-123" \
     http://localhost:8085/
# Expected: 401 Unauthorized

Missing Authorization Test:

curl http://localhost:8085/
# Expected: 401 Unauthorized

Accurate Token Counting Test (Llama):

curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
     -H "Content-Type: application/json" \
     -d '{"model": "llama3.3:70b", "prompt": "Write a haiku about programming", "max_tokens": 100}' \
     http://localhost:8085/
# Uses Llama-3.3 tokenizer for accurate counting

Accurate Token Counting Test (Mistral):

curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
     -H "Content-Type: application/json" \
     -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "Explain quantum computing"}]}' \
     http://localhost:8085/chat/completions
# Uses Mistral tokenizer for accurate counting

Unknown Model Test (Fallback):

curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
     -H "Content-Type: application/json" \
     -d '{"model": "custom-gpt-x", "prompt": "Test prompt"}' \
     http://localhost:8085/
# Uses fallback chars/4 estimation for unknown models

Metrics Check:

curl http://localhost:8085/metrics

URL Filtering Test (Blocked):

# Set environment variable for URL blocking
export BLOCK_URLS="/admin,/config,/internal"

# This request will be blocked with HTTP 403
curl -H "Authorization: Bearer bWFzdGVyQHNjcnRsYWJzLmNvbTpTZWNyZXROZXR3b3JrTWFzdGVyS2V5X18yMDI1" \
     http://localhost:8085/admin/users

Configuration Details

The Caddyfile-test demonstrates comprehensive configuration:

:80 {
    secret_reverse_proxy {
        # Authentication configuration
        API_MASTER_KEY {env.SECRET_API_MASTER_KEY}
        master_keys_file /etc/caddy/master_keys.txt
        secret_node {env.SECRET_NODE}
        contract_address {env.SECRET_CONTRACT}
        secret_chain_id {env.SECRET_CHAIN_ID}
        # permit_file is optional if SECRETAI_PERMIT_TYPE, SECRETAI_PERMIT_PUBKEY,
        # and SECRETAI_PERMIT_SIG env vars are set instead
        permit_file /etc/caddy/permit.json

        # Metering configuration
        metering {env.METERING}
        metering_interval {env.METERING_INTERVAL}
        metering_url {env.METERING_URL}

        # Token counting settings (v2.0)
        max_body_size 2097152          # 2MB max body size
        token_counting_mode accurate   # Uses model-specific tokenizers
        tokenizer_cache_dir /tmp/tokenizers  # Cache directory for tokenizers
        preload_models llama-2,mistral # Pre-cache common models for fast startup

        # Reporting settings
        max_retries 5                  # retry attempts for failed reports
        retry_backoff 300s             # backoff between retries

        # Metrics configuration
        enable_metrics true            # enable /metrics endpoint
        metrics_path /metrics          # metrics endpoint path
    }

    reverse_proxy echo-server:80 {
        health_uri /health
        health_interval 30s
        health_timeout 10s
    }
}

x402 Payment Protocol Testing

The x402 test suite includes unit tests and integration tests with a mock DevPortal:

cd secret-reverse-proxy

# Run x402 unit tests (portal client, challenge builder, USDC conversion)
go test -v ./x402/

# Run x402 integration tests (full middleware flow with mock portal)
go test -v -run TestX402

The integration tests verify: insufficient balance (402), sufficient balance (200 + usage report), master key bypass, portal unreachable (503), and partial balance deficit calculation. See x402_Caddy_Implementation.md for the full test plan including Docker-based end-to-end testing.

Development Testing

Unit Tests

cd secret-reverse-proxy
go test -v ./...

Integration Tests

go test -v -tags=integration ./...

Test Coverage

go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

Specific Component Tests

# Test API key validation
go test -v ./validators/

# Test x402 payment protocol
go test -v ./x402/...

# Test token counting
go test -v -run TestTokenCounter

# Test metering functionality
go test -v -run TestMetering

📋 Configuration Reference

Environment Variables

| Variable | Description | Example |
|---|---|---|
| SECRET_API_MASTER_KEY | Primary API key for authentication | your-secure-master-key |
| SECRET_NODE | Secret Network LCD endpoint | lcd.secret.tactus.starshell.net |
| SECRET_CHAIN_ID | Secret Network chain identifier | secret-4 |
| SECRET_CONTRACT | Smart contract address for validation | secret18xpp2kmkk7g8xzx24wm5zjstw9tjv6g3xle2vjm |
| METERING | Enable/disable usage metering | 1 or true |
| METERING_INTERVAL | Reporting interval | 5m, 1h |
| METERING_URL | Endpoint for usage reports | https://api.example.com |
| BLOCK_URLS | Comma-separated list of URL patterns to block | /admin,/config,/internal |
| SECRETAI_MASTER_KEYS | Comma-separated list of master API keys. Used as an alternative (or in addition) to master_keys_file. | key1,key2,key3 |
| SECRETAI_PERMIT_TYPE | Permit public key type. Required when no permit_file is configured; used to construct a permit on the fly for retrieving API keys from KMS. | tendermint/PubKeySecp256k1 |
| SECRETAI_PERMIT_PUBKEY | Permit public key value. Required when no permit_file is configured. | Aur9D8RLq... |
| SECRETAI_PERMIT_SIG | Permit signature. Required when no permit_file is configured. | TeNtblPmo... |
| DEVPORTAL_URL | DevPortal base URL for balance checks and usage reporting | https://devportal.example.com |
| DEVPORTAL_SERVICE_KEY | Shared secret for Caddy-to-DevPortal service authentication | caddy-secret-key-123 |

Caddyfile Directives

| Directive | Type | Description | Default |
|---|---|---|---|
| API_MASTER_KEY | string | Primary master key | None |
| master_keys_file | path | Path to a file containing additional master keys (one per line). Optional; if not configured, the SECRETAI_MASTER_KEYS env var can be used instead. | "" |
| permit_file | path | Path to a JSON file containing the Secret Network permit configuration used to retrieve Secret AI API Keys from KMS. Optional; if not configured, the system constructs a permit on the fly from the SECRETAI_PERMIT_TYPE, SECRETAI_PERMIT_PUBKEY, and SECRETAI_PERMIT_SIG env vars. | None |
| contract_address | string | Smart contract address | Required |
| secret_node | string | Secret Network node | Required |
| secret_chain_id | string | Chain ID | Required |
| metering | boolean | Enable usage metering | false |
| metering_interval | duration | Reporting frequency | 10m |
| metering_url | string | Usage reporting endpoint | "" |
| max_body_size | bytes | Max request body size | 10MB |
| token_counting_mode | string | Token counting mode (always uses accurate v2.0) | accurate |
| tokenizer_cache_dir | path | Directory for caching tokenizer files | /tmp/tokenizers |
| preload_models | string | Comma-separated models to pre-cache | llama-2,mistral |
| max_retries | int | Failed report retry attempts | 3 |
| retry_backoff | duration | Retry delay | 5m |
| enable_metrics | boolean | Enable metrics collection | false |
| metrics_path | string | Metrics HTTP endpoint | /metrics |
| x402_enabled | boolean | Enable portal-based x402 payment protocol | false |
| devportal_url | string | DevPortal base URL | Required if x402 enabled |
| devportal_service_key | string | Shared secret for service-to-service auth | Required if x402 enabled |
| x402_min_balance_usdc | string | Minimum agent balance in USDC (e.g., "0.01") | Required if x402 enabled |
| x402_topup_url | string | Override topup URL in 402 responses | {devportal_url}/api/agent/add-funds |
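Several of the rules in this reference are conditional (a permit needs either permit_file or all three SECRETAI_PERMIT_* variables; the DevPortal directives are required only when x402 is enabled). A pre-deployment check along the following lines could catch omissions early. This Python helper is hypothetical and not part of the project; it assumes directive values have already been collected into a dictionary of strings.

```python
from typing import Dict, List

REQUIRED_DIRECTIVES = ("contract_address", "secret_node", "secret_chain_id")
PERMIT_ENV_VARS = ("SECRETAI_PERMIT_TYPE", "SECRETAI_PERMIT_PUBKEY", "SECRETAI_PERMIT_SIG")
X402_DIRECTIVES = ("devportal_url", "devportal_service_key", "x402_min_balance_usdc")


def config_problems(directives: Dict[str, str], env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for name in REQUIRED_DIRECTIVES:
        if not directives.get(name):
            problems.append(f"missing required directive: {name}")
    if not directives.get("permit_file"):
        # Without a permit file, all three permit env vars must be present.
        missing = [v for v in PERMIT_ENV_VARS if not env.get(v)]
        if missing:
            problems.append("no permit_file and missing env vars: " + ", ".join(missing))
    if directives.get("x402_enabled", "false").lower() == "true":
        for name in X402_DIRECTIVES:
            if not directives.get(name):
                problems.append(f"x402 enabled but missing directive: {name}")
    return problems
```

In practice `env` could simply be `dict(os.environ)`, with the directive dictionary filled in from whatever deployment tooling renders the Caddyfile.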

🚀 Production Deployment

Docker Deployment

Basic Deployment

docker run -d \
  --name secret-ai-caddy \
  -p 80:80 -p 443:443 \
  -e SECRET_API_MASTER_KEY="your-production-key" \
  -e SECRET_NODE="lcd.secret.tactus.starshell.net" \
  -e SECRET_CONTRACT="secret18xpp2kmkk7g8xzx24wm5zjstw9tjv6g3xle2vjm" \
  -e SECRET_CHAIN_ID="secret-4" \
  -e METERING=true \
  -e METERING_INTERVAL="5m" \
  -e METERING_URL="https://your-metrics-api.com" \
  -e BLOCK_URLS="/admin,/config,/internal" \
  secret-reverse-proxy:latest

Production with Volumes

docker run -d \
  --name secret-ai-caddy \
  --restart unless-stopped \
  -p 80:80 -p 443:443 \
  -v ./Caddyfile:/etc/caddy/Caddyfile \
  -v ./master_keys.txt:/etc/caddy/master_keys.txt \
  -v ./permit.json:/etc/caddy/permit.json \
  -v caddy_data:/data \
  -v caddy_config:/config \
  -e SECRET_API_MASTER_KEY="your-production-key" \
  secret-reverse-proxy:latest

Docker Compose Production

version: '3.8'
services:
  secret-ai-caddy:
    image: secret-reverse-proxy:latest
    ports:
      - "80:80"
      - "443:443"
    environment:
      - SECRET_API_MASTER_KEY=${SECRET_API_MASTER_KEY}
      - SECRET_NODE=${SECRET_NODE}
      - SECRET_CONTRACT=${SECRET_CONTRACT}
      - SECRET_CHAIN_ID=${SECRET_CHAIN_ID}
      - METERING=true
      - METERING_INTERVAL=5m
      - METERING_URL=${METERING_URL}
      - BLOCK_URLS=${BLOCK_URLS}
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - ./master_keys.txt:/etc/caddy/master_keys.txt
      - ./permit.json:/etc/caddy/permit.json
      - caddy_data:/data
      - caddy_config:/config
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  caddy_data:
  caddy_config:

Security Best Practices

  1. API Key Security

    • Use environment variables for sensitive keys
    • Rotate master keys regularly
    • Implement key versioning
    • Monitor key usage patterns
  2. File Security

    • Secure master keys file with 600 permissions
    • Use separate permit files per environment
    • Regular backup of configuration files
  3. Network Security

    • Always use HTTPS in production
    • Implement proper firewall rules
    • Use private networks for backend communication
    • Enable rate limiting per API key
  4. Monitoring & Alerting

    • Monitor authentication failure rates
    • Set up alerts for contract query failures
    • Track unusual usage patterns
    • Monitor system resource usage
  5. Operational Security

    • Regular security updates
    • Log analysis and monitoring
    • Incident response procedures
    • Backup and recovery plans

🔍 Monitoring & Troubleshooting

Health Checks

System Health:

curl http://localhost:8085/health

Metrics Overview:

curl http://localhost:8085/metrics | jq

Common Issues

  1. Authentication Failures

    # Check logs for details (use your own container name if it differs)
    docker logs caddy-reverse-proxy
    
    # Verify environment variables
    docker exec caddy-reverse-proxy env | grep SECRET
  2. Contract Query Issues

    # Test network connectivity
    curl https://lcd.secret.tactus.starshell.net/status
    
    # Verify contract address
    curl "https://lcd.secret.tactus.starshell.net/compute/v1beta1/code_hash/by_contract_address/YOUR_CONTRACT"
  3. Token Counting Problems

    # Check metering logs for tokenizer loading
    docker logs caddy-reverse-proxy 2>&1 | grep -i tokenizer
    
    # Check for accurate token counting
    docker logs caddy-reverse-proxy 2>&1 | grep -i "Used accurate tokenizer"
    
    # Test with model-specific request
    curl -H "Authorization: Bearer YOUR_KEY" \
         -H "Content-Type: application/json" \
         -d '{"model": "llama3.3:70b", "prompt": "test"}' \
         http://localhost:8085/
    
    # Verify tokenizer cache
    docker exec caddy-reverse-proxy ls -la /tmp/tokenizers

    Common Token Counting Issues:

    • If seeing "Failed to load tokenizer" warnings, check network connectivity for HuggingFace downloads
    • Tokenizers are cached after first use - subsequent requests should be fast
    • Unknown models automatically fall back to conservative chars/4 estimation
    • Check preload_models configuration to pre-cache commonly used models
  4. URL Filtering Issues

    # Check if BLOCK_URLS is set
    docker exec caddy-reverse-proxy env | grep BLOCK_URLS
    
    # View filtering logs
    docker logs caddy-reverse-proxy 2>&1 | grep -i "blocked"
    
    # Test blocked URL
    curl -H "Authorization: Bearer YOUR_KEY" \
         http://localhost:8085/admin/test
    # Should return HTTP 403 Forbidden
    
    # Test allowed URL  
    curl -H "Authorization: Bearer YOUR_KEY" \
         http://localhost:8085/api/test
    # Should proceed to API key validation

Debug Configuration

{
    debug
    log {
        output stdout
        format console
        level DEBUG
    }
}

📊 Performance Characteristics

v2.0 Performance Metrics

  • Authentication Latency: <1ms for cache hits, <500ms for contract queries
  • Token Counting (Accurate Mode):
    • First request with model: 50-200ms (downloads and caches tokenizer from HuggingFace)
    • Cached tokenizer: 1-5ms per request (accurate tokenization)
    • Unknown model fallback: <1ms (simple chars/4 estimation)
    • Preloaded models: 1-5ms from first request
  • Memory Usage:
    • ~1KB per 1000 cached API keys
    • ~5-15MB per cached tokenizer (depends on model)
    • Typical deployment: 20-50MB for 2-3 common models
  • Throughput: Supports 10k+ RPS with proper caching
  • Cache Efficiency:
    • API keys: 95%+ hit rate for stable key sets
    • Tokenizers: 100% hit rate after initial load (cached permanently)

Token Counting Accuracy

| Model Type | Accuracy vs Actual | Method |
|---|---|---|
| Llama 2/3/3.3 | 90-95% | HuggingFace tokenizer |
| Mistral/Mixtral | 90-95% | HuggingFace tokenizer |
| Falcon | 90-95% | HuggingFace tokenizer |
| BERT | 90-95% | HuggingFace tokenizer |
| Unknown models | 60-70% | Chars/4 fallback |

Before v2.0: Heuristic method inflated counts by 2-2.5x
After v2.0: Within 5-10% of actual usage for supported models

🔄 Migrating from v1.x to v2.0

What Changed

Token Counting System:

  • Old heuristic (chars/4 + words×1.33)/2 replaced with model-specific tokenizers
  • Token counts will be 40-60% lower for most requests (more accurate)
  • Per-model usage tracking now available
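When reconciling v1.x usage reports against v2.0 counts, it can help to recompute the old heuristic offline for sample payloads. The Python sketch below implements the formula as quoted above. How the v1.x code actually counted characters and words is not documented here, so `len()` and whitespace splitting are assumptions, and both function names are hypothetical.

```python
def legacy_estimate(text: str) -> float:
    """v1.x heuristic as quoted: the average of chars/4 and words*1.33."""
    chars = len(text)
    words = len(text.split())  # assumed: words are whitespace-separated
    return (chars / 4 + words * 1.33) / 2


def relative_change(old_count: float, new_count: float) -> float:
    """Fractional change from old to new; negative means the new count is lower."""
    if old_count == 0:
        raise ValueError("old_count must be non-zero")
    return (new_count - old_count) / old_count
```

Feeding a logged v1.x count and the matching v2.0 count into `relative_change` gives the per-request reduction to expect in billing.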

Migration Steps

  1. Update Docker Image

    docker pull secret-reverse-proxy:latest
    # or rebuild: docker build -t secret-reverse-proxy:latest .
  2. Update Configuration (Optional)

    secret_reverse_proxy {
        # ... existing config ...
    
        # New optional settings (v2.0)
        tokenizer_cache_dir /tmp/tokenizers    # Default location
        preload_models llama-2,mistral          # Common models
    }
  3. Monitor First Deployment

    # Watch for tokenizer downloads (first time only)
    docker logs -f caddy-reverse-proxy 2>&1 | grep -i tokenizer
    
    # Verify accurate counting
    docker logs -f caddy-reverse-proxy 2>&1 | grep -i "Used accurate tokenizer"
  4. Expect Lower Token Counts

    • Old counts were inflated 2-2.5x
    • New counts are 90-95% accurate for supported models
    • Update billing expectations accordingly

Rollback Plan

If you need to rollback:

# Use previous image version
docker pull secret-reverse-proxy:v1.x
docker-compose up -d

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Implement changes with tests
  4. Update documentation
  5. Submit a pull request

📄 License

[Add appropriate license information]


🆘 Support

For issues and questions:
