AI API – Scalable LLM Backend with RAG & Agent Orchestration

Overview

Built a production-grade AI backend that enables intelligent query answering using RAG (Retrieval-Augmented Generation) and agent-based decision systems.

Designed to simulate a real-world enterprise AI service for knowledge retrieval, tool usage, and LLM observability.

Frontend: https://ai-by0z7njes-manibala-sinhas-projects-273c5a77.vercel.app/
Backend API: https://ai-api-6.onrender.com/


Key Impact

  • Reduced hallucinated responses by ~40–60% using RAG-based context injection
  • Achieved sub-second response latency (~400–600ms) for cached queries (see the caching sketch after this list)
  • Designed modular services enabling independent scaling of RAG, agent, and LLM layers
  • Implemented full tracing using LangSmith for debugging and performance monitoring
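
The cached-query latency above implies some caching layer in front of the LLM call. The README does not show it, so the following is only a minimal sketch of an in-process TTL cache keyed by the normalized question; the class and all names are illustrative assumptions, not the project's actual code (Redis is listed under Future Improvements as the longer-term option).

```python
import time
from typing import Any, Optional

class QueryCache:
    """Illustrative in-process TTL cache for repeated /ask queries (assumption, not the repo's code)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Normalize so trivially different phrasings of the same query hit the cache.
        return question.strip().lower()

    def get(self, question: str) -> Optional[Any]:
        hit = self._store.get(self._key(question))
        if hit is None:
            return None
        stored_at, value = hit
        if time.monotonic() - stored_at > self.ttl:
            # Entry expired; drop it and treat as a miss.
            del self._store[self._key(question)]
            return None
        return value

    def set(self, question: str, answer: Any) -> None:
        self._store[self._key(question)] = (time.monotonic(), answer)
```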

Core Features

  • FastAPI backend with clean service-layer architecture (request-handling sketch after this list)
  • RAG pipeline using vector embeddings (Chroma)
  • Intelligent agent routing:
    • RAG retrieval
    • Direct LLM response
    • Tool/API invocation
  • Function calling support for dynamic workflows
  • Structured logging (latency, token usage, responses)
  • Observability with LangSmith (end-to-end trace visibility)
  • Unit-tested APIs using Pytest
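
As a concrete illustration of the service-layer design and structured latency logging above, here is a minimal sketch of what the `/ask` route could look like. `answer_question` is a hypothetical service-layer entry point, and every name below is an assumption for illustration, not the repository's actual code.

```python
import logging
import time

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
logger = logging.getLogger("ai_api")

class AskRequest(BaseModel):
    question: str

class AskResponse(BaseModel):
    answer: str
    route: str  # e.g. "rag", "llm", or "tool"

def answer_question(question: str) -> AskResponse:
    """Hypothetical service-layer entry point (agent routing, RAG, LLM call)."""
    raise NotImplementedError

@app.post("/ask", response_model=AskResponse)
def ask(request: AskRequest) -> AskResponse:
    start = time.perf_counter()
    result = answer_question(request.question)
    latency_ms = (time.perf_counter() - start) * 1000
    # Structured log entry: latency plus the execution path that was chosen.
    logger.info("ask handled", extra={"latency_ms": round(latency_ms, 1), "route": result.route})
    return result
```

The route handler stays thin; the agent decision, retrieval, and LLM calls live behind the service function so each layer can be scaled or tested independently.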


System Design

Client Request
   ↓
FastAPI (/ask)
   ↓
Agent Decision Layer
   ↓
 ┌───────────────┬───────────────┐
 ↓               ↓               ↓
RAG Pipeline   Direct LLM     External Tools
 ↓
Vector DB (Chroma)
 ↓
Context Injection
 ↓
LLM Response
 ↓
Tracing + Logging
 ↓
API Response
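
To make the decision layer in the diagram concrete, here is a minimal routing sketch. The keyword heuristic and the tool registry are illustrative assumptions; a production agent would more likely let the LLM choose the path via function calling, as noted under Core Features.

```python
from enum import Enum
from typing import Callable

class Route(str, Enum):
    RAG = "rag"    # answer from retrieved documents
    LLM = "llm"    # answer directly from the model
    TOOL = "tool"  # call an external tool/API first

# Hypothetical tool registry: keyword -> callable. Illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "weather": lambda q: "sunny",  # stand-in for a real external API call
}

def decide_route(question: str) -> Route:
    """Toy heuristic; the real agent likely delegates this choice to the LLM."""
    lowered = question.lower()
    if any(keyword in lowered for keyword in TOOLS):
        return Route.TOOL
    # Questions that reference internal knowledge go through the RAG pipeline.
    if any(hint in lowered for hint in ("docs", "policy", "documentation")):
        return Route.RAG
    return Route.LLM

# decide_route("What does the documentation say about refunds?")  -> Route.RAG
```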

Tech Stack

  • Backend: FastAPI (Python)
  • LLM Orchestration: LangChain
  • Model Provider: OpenAI
  • Vector Database: Chroma
  • Observability: LangSmith (tracing setup shown after this list)
  • Testing: Pytest
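
LangSmith tracing of LangChain runs is typically enabled through environment variables rather than application code; the snippet below shows that standard setup with placeholder values (the project name is an assumption).

```python
import os

# Standard LangSmith environment setup; set these before chains are constructed.
os.environ["LANGCHAIN_TRACING_V2"] = "true"                   # turn on tracing for LangChain runs
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "ai-api"                    # project name shown in the LangSmith UI (assumed)
```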

Engineering Highlights

  • Designed agent-based routing logic to dynamically select optimal execution path
  • Built retrieval pipeline with chunking + embedding + top-K similarity search (see the sketch after this list)
  • Applied prompt constraints to reduce hallucination and enforce structure
  • Implemented centralized logging + tracing for debugging LLM workflows
  • Structured backend into modular services for maintainability and scalability
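
The retrieval pipeline and prompt constraints described above could look roughly like the sketch below, assuming the langchain-openai, langchain-chroma, and langchain-text-splitters packages. File paths, chunk sizes, and the prompt wording are illustrative assumptions, not the repository's actual configuration.

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1) Chunk the source documents before embedding (file path is an assumption).
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
chunks = splitter.create_documents([open("knowledge_base.txt").read()])

# 2) Embed the chunks and persist them in a Chroma collection.
vectorstore = Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)

# 3) Top-K similarity search at query time.
def retrieve_context(question: str, k: int = 4) -> str:
    docs = vectorstore.similarity_search(question, k=k)
    return "\n\n".join(doc.page_content for doc in docs)

# 4) Inject the retrieved context with an explicit anti-hallucination constraint.
PROMPT = (
    "Answer using ONLY the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
# At query time: PROMPT.format(context=retrieve_context(q), question=q)
```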

Future Improvements

  • Redis caching for frequent queries
  • Streaming responses (token-level)
  • Pinecone / scalable vector DB integration
  • Authentication & rate limiting
  • Multi-agent workflows

Author

Manibala Sinha
Senior Backend Engineer | Python | FastAPI | AI Systems


About

LLM Backend (RAG + Agents + Observability): a scalable, modular backend built with FastAPI and LangChain, covering end-to-end LLM engineering: Retrieval-Augmented Generation (RAG) with a persistent Chroma vector database, agent orchestration for multi-step workflows, prompt engineering, and observability/tracing.
