richardhsuuuu/reverse_proxy

Reverse Proxy Implementation

A robust reverse proxy server implementation in Python with load balancing, caching, and SSL termination capabilities.

Main Questions and Answers

How can someone get started with your codebase?

  • Refer to the "Quick Start" section. You will need to set up a virtual environment (or use the provided one for convenience), run multiple backend servers (Flask) on different specified ports, and then execute the reverse proxy.
  • You can test the setup using curl or the load-testing tool I created.
  • If you pass in the debug param, you will notice a "port" field in the response output that indicates which backend host served the request; it keeps rotating because of the round-robin load balancing I implemented.

What resources did you use to build your implementation?

  • I used a GenAI tool to code up more of the implementation after I decided on the high-level features I had to implement.
  • My approach involved looking at each aspect of reverse proxy functionality, selecting at least one key feature to implement, and gradually adding features and robustness. I concluded by developing a load testing tool to ensure everything functions correctly.
  • The most intriguing part was the health check and auto-discovery feature of the LoadBalancer, which I specifically separated. This feature allows hosts to dynamically join or leave, automatically updating the pool of available hosts in a separate health-checking thread.
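
The separate health-checking thread described above could be sketched roughly like this (the class name, `/health` endpoint, and method names are illustrative assumptions, not the repository's actual API):

```python
import threading
import time
import urllib.request

class HealthChecker:
    """Runs health checks in a background thread so hosts can
    dynamically join or leave the pool of available backends."""

    def __init__(self, backends, check_interval=1, max_failures=3):
        self.backends = list(backends)        # all known backend URLs
        self.failures = {b: 0 for b in self.backends}
        self.healthy = set(self.backends)     # current pool of available hosts
        self.check_interval = check_interval
        self.max_failures = max_failures
        self.lock = threading.Lock()

    def _probe(self, url):
        """Return True if the backend answers its health endpoint."""
        try:
            urllib.request.urlopen(url + "/health", timeout=1)
            return True
        except Exception:
            return False

    def check_once(self):
        for backend in self.backends:
            ok = self._probe(backend)
            with self.lock:
                if ok:
                    self.failures[backend] = 0
                    self.healthy.add(backend)          # host (re)joins the pool
                else:
                    self.failures[backend] += 1
                    if self.failures[backend] >= self.max_failures:
                        self.healthy.discard(backend)  # host leaves the pool

    def start(self):
        def loop():
            while True:
                self.check_once()
                time.sleep(self.check_interval)
        threading.Thread(target=loop, daemon=True).start()
```

Running the loop in a daemon thread keeps health probing off the request path, so a slow or dead backend never delays client traffic.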

Explain any design decisions you made, including limitations of the system

  • The system is quite robust. Key decisions include:
    1. Implementing an LRU cache to use hosts efficiently.
    2. Enabling auto-discovery of hosts and automated failover to other hosts, only failing requests when the retry count exceeds the internal limit. This is what I am probably most proud of, as it is a key differentiator of reverse proxies.
  • I focused on implementing as many features as possible and testing them live instead of writing unit tests; I would add unit tests if we needed to bring this online, of course.
  • Limitations include:
    1. No unit tests (I tested all features locally, I just did not get to writing tests)
    2. No sticky-session feature implemented yet
    3. Only round-robin is supported, but the design can obviously accommodate other algorithms such as weighted round-robin, dynamic load balancing based on CPU/memory, least connection, etc.
    4. A proper cert file is needed, of course
    5. More work is needed to support horizontal scaling of the same code, and ZooKeeper support is needed for host management.

How would you scale this?

  • I would introduce a dedicated database for comprehensive logging of requests and responses for tracing.
  • Enhance security by incorporating JWT tokens.
  • Implement sticky sessions and least-connection balancing if required.
  • Add a Rate-Limiter for better control.
  • The LRUCache could become a dedicated Redis cache cluster.
  • If too much traffic comes in, I would do 1) vertical scaling first - increase CPU/memory and implement a least-connection load-balancing algorithm, then 2) horizontal scaling - deploy this onto different hosts and, at the same time, start using ZooKeeper to separate out host discovery and health checks so the host-management process scales better. It is also possible to add more load balancers in front (Amazon does this: they maintain additional load balancers before the reverse proxy, so by the time a request reaches the reverse proxy, the load is already managed).
  • If the CPU becomes the bottleneck (due to SSL/TLS handshake), I would do crypto offload to a dedicated machine that has access to the private key, e.g. https://docs.aws.amazon.com/cloudhsm/latest/userguide/ssl-offload.html
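
As a sketch of one of the alternative algorithms mentioned above, least-connection selection could look roughly like this (a hypothetical class, not part of the current code, which is round-robin only):

```python
import threading

class LeastConnectionBalancer:
    """Pick the backend currently serving the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # in-flight request count per backend
        self.lock = threading.Lock()

    def acquire(self):
        """Select the least-loaded backend and count the request against it."""
        with self.lock:
            backend = min(self.active, key=self.active.get)
            self.active[backend] += 1
            return backend

    def release(self, backend):
        """Call when the proxied request finishes."""
        with self.lock:
            self.active[backend] -= 1
```

Unlike round-robin, this adapts to backends with uneven response times: a slow host accumulates in-flight requests and naturally receives less new traffic.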

How would you make it more secure?

  • Introduce additional authentication methods, such as JWT tokens.
  • Implement a whitelist/blacklist mechanism for access control.
  • Provision a proper cert.

Key Features

  • Security: Hides backend servers from direct internet access
  • Load Balancing: Distributes traffic across multiple backend servers
  • Caching: Caches responses to reduce load on backend servers
  • SSL Termination: Handles HTTPS encryption/decryption
  • Health Checks: Monitors backend server health and routes traffic accordingly
  • Compression: Supports gzip, brotli and deflate compression

Quick Start

Prerequisites

  1. Python 3.6+
  2. Clone this repository
  3. Set up Python environment
python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt

Example commands to spin up multiple web server backends

python3 backend_server/backend_server.py 8000 --debug
python3 backend_server/backend_server.py 8001 --debug
python3 backend_server/backend_server.py 8002 --debug
python3 backend_server/backend_server.py 8003 --debug

Main command to spin up reverse proxy

python3 reverse_proxy/reverse_proxy.py --debug

Example CURL commands to test the sample backend

curl -k -X POST https://localhost:8443/test \
-H "X-API-Key: test-api-key-123" \
-H "Content-Type: application/json" \
-d '{"hello": "world"}'
curl -k https://localhost:8443/test \
-H "X-API-Key: test-api-key-123" \
-H "Content-Type: application/json"

Example command to execute load testing with POST

python3 client_load_test.py --rps 100 --duration 10

Configurations

VALID_API_KEY = "test-api-key-123" # Change in production

Caching

cache_capacity = 1000 # Number of cache entries
cache_ttl = 300 # Cache TTL in seconds
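
An LRU cache honoring these two settings might be sketched like this (illustrative only; the repository's actual class may differ):

```python
import time
from collections import OrderedDict

class LRUCache:
    """LRU cache with per-entry TTL, mirroring cache_capacity / cache_ttl."""

    def __init__(self, capacity=1000, ttl=300):
        self.capacity = capacity
        self.ttl = ttl
        self.entries = OrderedDict()  # key -> (value, stored_at)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if time.time() - stored_at > self.ttl:  # expired: evict and miss
            del self.entries[key]
            return None
        self.entries.move_to_end(key)           # mark as most recently used
        return value

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = (value, time.time())
        if len(self.entries) > self.capacity:   # evict least recently used
            self.entries.popitem(last=False)
```

`OrderedDict` gives O(1) recency updates via `move_to_end`, which keeps both hits and evictions cheap at the configured capacity of 1000 entries.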

Health Checks

check_interval = 1 # Health check frequency in seconds
max_failures = 3 # Failures before marking as unhealthy

Load Balancing

max_retries = 2 # Maximum request retries

Core Features

Security 🔒

  1. SSL/TLS Support

    • Full SSL/TLS encryption for both incoming and outgoing connections
    • Configurable certificate and private key paths
    • Self-signed certificate generation for development
    • Support for custom SSL contexts and verification modes
  2. API Key Authentication

    • Required API key validation for all requests
    • Configurable API key through VALID_API_KEY constant
    • Returns 401 Unauthorized for invalid or missing API keys
  3. Request Headers

    • Secure header handling with filtering of hop-by-hop headers
    • X-Forwarded-* headers for maintaining client information
    • X-Backend-Server header for request tracing
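
The header handling described above might look roughly like this sketch (the exact header set and function name are assumptions):

```python
# Hop-by-hop headers (RFC 7230) that must not be forwarded to the backend
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def build_forward_headers(client_headers, client_ip, scheme="https"):
    """Drop hop-by-hop headers and attach X-Forwarded-* client info."""
    headers = {k: v for k, v in client_headers.items()
               if k.lower() not in HOP_BY_HOP}
    headers["X-Forwarded-For"] = client_ip     # preserve the real client address
    headers["X-Forwarded-Proto"] = scheme      # original scheme seen by the proxy
    return headers
```

Filtering hop-by-hop headers matters because values like `Connection: keep-alive` describe the client-to-proxy hop only and would corrupt the proxy-to-backend connection if forwarded.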

Load Balancing ⚖️

  1. Round-Robin Algorithm

    • Thread-safe implementation using locks
    • Automatic server rotation
    • Skips unhealthy backends
    • Configurable retry mechanism for failed requests
  2. Backend Management

    • Dynamic backend server pool
    • Configurable through BACKEND_URLS
    • Support for multiple backend instances
    • Backend status tracking and monitoring
  3. Request Distribution

    • Even distribution across healthy backends
    • Automatic failover on backend failures
    • Request retry support with configurable attempts
    • Debug mode for monitoring distribution patterns
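
The round-robin selection with retry and failover described above could be sketched as follows (illustrative names, not the repository's exact implementation):

```python
import threading

class RoundRobinBalancer:
    """Thread-safe round-robin selection that skips unhealthy backends."""

    def __init__(self, backends, max_retries=2):
        self.backends = list(backends)
        self.max_retries = max_retries
        self.index = 0
        self.lock = threading.Lock()

    def next_backend(self, healthy):
        """Rotate through the pool, skipping hosts not in the healthy set."""
        with self.lock:
            for _ in range(len(self.backends)):
                backend = self.backends[self.index]
                self.index = (self.index + 1) % len(self.backends)
                if backend in healthy:
                    return backend
        raise RuntimeError("no healthy backends available")

    def forward(self, send, healthy):
        """Try up to max_retries + 1 backends before failing the request."""
        last_error = None
        for _ in range(self.max_retries + 1):
            backend = self.next_backend(healthy)
            try:
                return send(backend)       # send() proxies the request
            except Exception as exc:       # failover to the next backend
                last_error = exc
        raise last_error
```

The lock makes the rotation safe under concurrent requests, and a request only fails once the retry budget is exhausted, matching the `max_retries = 2` setting above.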

Caching 📦

  1. LRU Cache Implementation

    • Least Recently Used (LRU) caching strategy
    • Configurable cache capacity
    • Time-based cache expiration (TTL)
    • Support for varied content encodings
  2. Cache Keys

    • Unique key generation based on:
      • Request method
      • Path
      • Relevant headers
      • Request body
      • Content encoding
  3. Cache Control

    • Cache hit/miss headers
    • Automatic cache invalidation
    • Cache bypass for non-GET requests
    • TTL-based entry expiration
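
Key generation from the fields listed above might be sketched as (a hypothetical helper; the real implementation may hash different fields):

```python
import hashlib

def make_cache_key(method, path, body=b"", headers=None,
                   vary=("accept-encoding",)):
    """Derive a stable cache key from method, path, body, and selected headers."""
    headers = headers or {}
    # Only the headers in `vary` (e.g. content encoding) affect the key
    relevant = "&".join(f"{h}={headers.get(h, '')}" for h in sorted(vary))
    raw = f"{method}|{path}|{relevant}|".encode() + body
    return hashlib.sha256(raw).hexdigest()
```

Including the content encoding in the key ensures a gzip-compressed cached response is never served to a client that only accepts brotli or identity encoding.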

Health Checks 💓

  1. Active Monitoring

    • Background health check thread
    • Configurable check intervals
    • Failure threshold tracking
    • Automatic recovery detection
  2. Health Status Management

    • Three-state backend status:
      • NOT_INITIATED
      • HEALTHY
      • UNREACHABLE
    • Configurable failure thresholds
    • Last healthy timestamp tracking
  3. Debug Monitoring

    • Real-time backend status display
    • Health check statistics
    • Failure count tracking
    • Visual status dashboard in debug mode
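
The three-state status tracking with failure thresholds and last-healthy timestamps could be modeled roughly like this (an illustrative sketch):

```python
import time
from enum import Enum

class BackendStatus(Enum):
    NOT_INITIATED = "NOT_INITIATED"
    HEALTHY = "HEALTHY"
    UNREACHABLE = "UNREACHABLE"

class BackendState:
    """Tracks one backend's status, failure count, and last-healthy time."""

    def __init__(self, max_failures=3):
        self.status = BackendStatus.NOT_INITIATED
        self.failures = 0
        self.last_healthy = None
        self.max_failures = max_failures

    def record(self, check_passed):
        """Update state after a single health check."""
        if check_passed:
            self.failures = 0
            self.status = BackendStatus.HEALTHY
            self.last_healthy = time.time()
        else:
            self.failures += 1
            if self.failures >= self.max_failures:  # threshold reached
                self.status = BackendStatus.UNREACHABLE
```

The NOT_INITIATED state distinguishes "never probed yet" from "probed and failing", so a freshly registered backend is not misreported as down before its first check.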

Debug Output

When running in debug mode, you'll see a real-time dashboard:


=== Backend Server Status ===
┌────────────────────────┬───────────────┬───────────┬──────────────────┐
│ Backend URL            │ Status        │ Failures  │ Last Healthy     │
├────────────────────────┼───────────────┼───────────┼──────────────────┤
│ https://127.0.0.1:8000 │ HEALTHY       │     0     │     12:34:56     │
│ https://127.0.0.1:8001 │ UNREACHABLE   │     3     │     12:30:00     │
└────────────────────────┴───────────────┴───────────┴──────────────────┘

How I would productionize this/Scale this

  1. Security

    • Replace self-signed certificate with proper SSL certificate
    • Implement JWT tokens at a minimum
  2. Performance

    • Adjust cache capacity based on memory availability
    • Experiment with different configurations of each parameter
  3. Cache

    • Monitor cache hit rates and compression ratios
  4. Scaling

    • Add more backend servers as needed
    • Adjust load balancing strategy for your use case
    • Consider implementing sticky sessions if needed

About

A simple reverse proxy with robust features - automatic host discovery, self-healing, automated retries, caching, round-robin
