✅ Memory Optimized for Free Tier: This project now uses OpenAI API embeddings instead of local sentence-transformers, reducing memory usage from 600MB+ to ~200MB. Fully compatible with Render's 512MB free tier!
Steps:
- Go to render.com and sign up
- Click "New +" → "Web Service"
- Connect your GitHub account and select `FastAPI_RAG_Gateway`; Render will auto-detect `render.yaml`
- Add environment variables in the dashboard:
  - `OPENAI_API_KEY` = your OpenAI key
  - `OPENROUTER_API_KEY` = your OpenRouter key
- Click "Create Web Service"
- Wait 5-10 minutes for deployment

Your API will be live at: https://fastapi-rag-gateway.onrender.com
Note: Free tier sleeps after 15 minutes of inactivity. First request after sleep takes ~30 seconds.
Steps:
- Install the Railway CLI:

  ```bash
  npm i -g @railway/cli
  ```

- Login and initialize:

  ```bash
  railway login
  railway init
  ```

- Set environment variables:

  ```bash
  railway variables set OPENAI_API_KEY="your_key"
  railway variables set OPENROUTER_API_KEY="your_key"
  railway variables set RAG_LLM_MODEL="deepseek/deepseek-chat"
  railway variables set RAG_LLM_BASE_URL="https://openrouter.ai/api/v1"
  ```

- Deploy:

  ```bash
  railway up
  ```

- Get your URL:

  ```bash
  railway domain
  ```
Steps:
- Install the Fly CLI:

  ```bash
  curl -L https://fly.io/install.sh | sh
  ```

- Login:

  ```bash
  fly auth login
  ```

- Launch (from the project directory):

  ```bash
  fly launch
  ```

  - Choose a unique app name
  - Select the region closest to you
  - Don't deploy yet (answer No)

- Set secrets:

  ```bash
  fly secrets set OPENAI_API_KEY="your_key"
  fly secrets set OPENROUTER_API_KEY="your_key"
  ```

- Deploy:

  ```bash
  fly deploy
  ```

Your API will be live at: https://your-app-name.fly.dev
Build and run locally:

```bash
docker build -t fastapi-rag-gateway .
docker run -p 8000:8000 \
  -e OPENAI_API_KEY="your_key" \
  -e OPENROUTER_API_KEY="your_key" \
  -e RAG_LLM_MODEL="deepseek/deepseek-chat" \
  -e RAG_LLM_BASE_URL="https://openrouter.ai/api/v1" \
  fastapi-rag-gateway
```

Deploy to any cloud with Docker support (AWS ECS, Google Cloud Run, Azure Container Instances, DigitalOcean App Platform).
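The `-e` flags only take effect if the app reads them from its environment. A minimal sketch of that wiring, assuming the gateway pulls its settings via `os.environ` (the function name and defaults below are illustrative, mirroring the flags above, not the project's actual code):

```python
import os

def load_settings(env=os.environ) -> dict:
    """Collect the gateway's settings from environment variables.
    Defaults mirror the docker run flags above (assumed values)."""
    return {
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "openrouter_api_key": env.get("OPENROUTER_API_KEY"),
        "model": env.get("RAG_LLM_MODEL", "deepseek/deepseek-chat"),
        "base_url": env.get("RAG_LLM_BASE_URL", "https://openrouter.ai/api/v1"),
    }
```

Passing the environment as a parameter keeps the function easy to test and makes the fallback defaults explicit.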
Once deployed, test with:

```bash
curl -X POST "https://your-deployment-url.com/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is machine learning?"}'
```

Or visit https://your-deployment-url.com/docs for interactive API documentation.
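The same request can be made from Python with only the standard library; a minimal client sketch (`BASE_URL` is a placeholder for your deployment URL, and the JSON response shape is assumed):

```python
import json
import urllib.request

BASE_URL = "https://your-deployment-url.com"  # replace with your deployment URL

def build_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request the /query endpoint expects."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str) -> dict:
    """Send the question and decode the JSON response."""
    with urllib.request.urlopen(build_request(question)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```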
| Platform | Free Tier | Paid | Pros | Cons |
|---|---|---|---|---|
| Render | 750hrs/mo | $7/mo | Easy, auto-deploy from GitHub | Sleeps after 15min inactivity |
| Railway | None | $5/mo | Fast, no sleep, great DX | No free tier |
| Fly.io | $5 credit | ~$3/mo | Fast, global edge network | Complex pricing |
| Docker | Self-hosted | Varies | Full control | Requires infrastructure |
- API Keys Security: Never commit `.env` to GitHub. Use platform secret managers.
- ChromaDB Persistence:
  - On first deployment, documents are indexed automatically
  - The vector store is ephemeral on free tiers (it rebuilds on restart)
  - For production, use persistent volumes or an external vector DB
- Cold Starts:
  - Free tiers may have 30-60s cold start times
  - Paid tiers wake instantly
- Memory Requirements:
  - Optimized for 512MB RAM (Render free tier compatible)
  - Uses OpenAI API embeddings instead of local models
  - Memory footprint: ~150-250MB, well within free tier limits
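To sanity-check the footprint claim on a running instance, peak resident memory can be read with the standard-library `resource` module. A sketch (Linux/macOS only; note `ru_maxrss` is kilobytes on Linux but bytes on macOS):

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Peak resident set size of this process, in MB."""
    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024  # bytes vs KB
    return raw / divisor

print(f"peak RSS: {peak_rss_mb():.1f} MB")
```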
Deployment fails with "Out of Memory":
- ✅ FIXED: Now uses OpenAI embeddings (API-based, minimal memory)
- Previous issue: sentence-transformers used 300MB+ RAM
- Current solution: <50MB for embeddings, fits in 512MB free tier
"OPENAI_API_KEY not found" error:
- Ensure environment variables are set in platform dashboard
- Check the variable names for typos and remove any stray whitespace around the values
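Both failure modes (missing variable, stray whitespace) can be caught before deploying with a quick check; a sketch, using the variable names from the configuration above:

```python
import os

REQUIRED = ["OPENAI_API_KEY", "OPENROUTER_API_KEY"]

def check_env(env: dict) -> list:
    """Return a list of problems; an empty list means the keys look fine."""
    problems = []
    for name in REQUIRED:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is missing")
        elif value != value.strip():
            problems.append(f"{name} has leading/trailing whitespace")
    return problems

if __name__ == "__main__":
    for line in check_env(dict(os.environ)) or ["all required variables look OK"]:
        print(line)
```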
Slow first query:
- Normal on free tiers (cold start)
- Consider paid tier or keep-alive pings
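A keep-alive pinger can be as simple as the sketch below: request a cheap endpoint more often than the 15-minute sleep threshold. `BASE_URL` and the choice of `/docs` as the ping target are assumptions; adjust to your deployment.

```python
import time
import urllib.request

BASE_URL = "https://your-deployment-url.com"  # replace with your deployment URL
PING_INTERVAL = 600  # seconds; comfortably under the 15-minute sleep threshold

def ping_url(base_url: str = BASE_URL) -> str:
    """/docs is cheap to serve and confirms the app is awake."""
    return base_url.rstrip("/") + "/docs"

def keep_alive(iterations: int) -> None:
    """Ping the service `iterations` times, spaced PING_INTERVAL apart."""
    for _ in range(iterations):
        try:
            urllib.request.urlopen(ping_url(), timeout=30)
        except OSError:
            pass  # the service may be cold-starting; try again next round
        time.sleep(PING_INTERVAL)
```

Run it from any always-on machine, or schedule a single ping with cron every 10 minutes instead.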
- Add authentication (API keys, OAuth)
- Implement rate limiting
- Add monitoring (Sentry, LogTail)
- Set up CI/CD with GitHub Actions
- Add more documents to the `/data` folder
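For the rate-limiting item, the core mechanism is easy to sketch: an in-memory token bucket per client. This illustrates the idea only; it is not part of the project, and for FastAPI you would more likely reach for an existing middleware library such as slowapi.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, capped at capacity, then spend a token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In a gateway you would keep one bucket per API key or client IP and reject requests with HTTP 429 when `allow()` returns False.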