A comprehensive guide to the 153 LLM framework repositories now tracked in this collection.
Run the automation script to star all frameworks:
`./star_llm_frameworks_repos.sh`

The foundational frameworks for building LLM applications:
| Framework | Language | Best For | GitHub |
|---|---|---|---|
| LangChain | Python/JS/Java | General-purpose LLM apps | langchain-ai/langchain |
| LangGraph | Python | Stateful multi-agent systems | langchain-ai/langgraph |
| LlamaIndex | Python | RAG and data-centric apps | run-llama/llama_index |
| Haystack | Python | Production NLP pipelines | deepset-ai/haystack |
| Semantic Kernel | C#/Python/Java | Enterprise .NET integration | microsoft/semantic-kernel |
| Pydantic-AI | Python | Type-safe agents | pydantic/pydantic-ai |
| DSPy | Python | Programming LLMs, not prompting | stanfordnlp/dspy |
| LiteLLM | Python | Unified API for 100+ LLMs | BerriAI/litellm |
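To make the "unified API" idea concrete, here is a minimal LiteLLM sketch, assuming `litellm` is installed and an `OPENAI_API_KEY` is set; the model names are only examples:

```python
# Sketch: one call signature across providers via LiteLLM.
# Assumes `pip install litellm` and OPENAI_API_KEY in the environment.
from litellm import completion

response = completion(
    model="gpt-4o-mini",  # swap for another provider, e.g. "claude-3-5-sonnet-20240620"
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response.choices[0].message.content)
```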
LangChain Ecosystem:
- langchain - Core framework
- langgraph - Stateful agent graphs
- langsmith-sdk - Observability
- langserve - Deploy as REST APIs

LlamaIndex Ecosystem:
- llama_index - Core data framework
- llama-hub - Data loaders library
- llama_deploy - Deploy workflows as services

Multi-Language Support:
- langchainjs - TypeScript/JavaScript
- langchain4j - Java
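For orientation, a minimal LangChain chain composed with LCEL (a sketch assuming `langchain` and `langchain-openai` are installed and `OPENAI_API_KEY` is set):

```python
# Sketch: prompt -> model -> string output, composed with LangChain's LCEL pipe syntax.
# Assumes `pip install langchain langchain-openai` and OPENAI_API_KEY set.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Explain {topic} in two sentences.")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"topic": "retrieval-augmented generation"}))
```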
Multi-agent systems and autonomous AI:
| Framework | Type | Description |
|---|---|---|
| AutoGen | Multi-agent | Microsoft's conversation framework |
| CrewAI | Multi-agent | Orchestrate autonomous agents |
| MetaGPT | Multi-agent | Software company simulation |
| SuperAGI | Autonomous | Dev-first agent framework |
| BabyAGI | Task Management | AI-powered task system |
| AutoGPT | Autonomous | Original autonomous GPT-4 |
| AgentGPT | Web-based | Browser-based agent deployment |
| TaskWeaver | Code-first | Microsoft's planning/execution |
| Voyager | Embodied | LLM-powered lifelong learning |
| ChatDev | Collaborative | AI software development team |
Notable Repos:
- microsoft/JARVIS - Connect LLMs with the ML community
- OpenBMB/XAgent - Autonomous complex tasks
- aiwaves-cn/agents - Open-source agent framework
- modelscope/agentscope - Multi-agent platform
- langchain-ai/open-canvas - Collaborative AI canvas
- ysymyth/ReAct - Reasoning and Acting with LLMs
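To show what agent orchestration looks like in practice, here is a hedged CrewAI sketch; it assumes `crewai` is installed and an LLM API key is configured, and the roles and tasks are invented for illustration:

```python
# Sketch: a two-step CrewAI crew where one agent researches and another writes.
# Assumes `pip install crewai` and an LLM key (e.g. OPENAI_API_KEY) configured.
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Collect key facts about a topic",
    backstory="A meticulous analyst who cites sources.",
)
writer = Agent(
    role="Writer",
    goal="Turn research notes into a short summary",
    backstory="A concise technical writer.",
)

research = Task(
    description="List three facts about vector databases.",
    expected_output="Three bullet points.",
    agent=researcher,
)
summary = Task(
    description="Write a two-sentence summary of the research.",
    expected_output="A two-sentence paragraph.",
    agent=writer,
)

crew = Crew(agents=[researcher, writer], tasks=[research, summary])
print(crew.kickoff())
```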
Retrieval-Augmented Generation systems:
| Framework | Focus | Key Feature |
|---|---|---|
| RAGFlow | Deep Understanding | Open-source RAG engine |
| Embedchain | Quick Setup | Framework for RAG apps |
| PrivateGPT | Privacy | Chat with docs locally |
| LocalGPT | Local-first | No internet required |
| Quivr | Second Brain | GenAI knowledge base |
| RAGatouille | ColBERT | Easy RAG with reranking |
| RAGAS | Evaluation | RAG pipeline testing |
Essential Tools:
- langchain-ai/rag-from-scratch - RAG tutorials
- neuml/txtai - All-in-one embeddings DB
- mem0 (Embedchain) - Memory layer for AI
- aurelio-labs/semantic-router - Semantic routing
- jina-ai/jina - Multimodal AI services
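A minimal end-to-end RAG sketch with LlamaIndex, assuming `llama-index` is installed, `OPENAI_API_KEY` is set, and a local `data/` folder of documents exists:

```python
# Sketch: index a local folder and query it with LlamaIndex.
# Assumes `pip install llama-index`, OPENAI_API_KEY set, and documents in ./data.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()   # load files from ./data
index = VectorStoreIndex.from_documents(documents)      # embed and index them
query_engine = index.as_query_engine()

print(query_engine.query("What does the design doc say about caching?"))
```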
Type-safe LLM interactions:
| Tool | Approach | Use Case |
|---|---|---|
| Instructor | Pydantic models | Structured LLM outputs |
| Outlines | Constrained generation | Guaranteed valid outputs |
| Guidance | Control language | Interleave logic/generation |
| TypeChat | TypeScript types | Typed JSON responses |
| Mirascope | Type hints | Structured prompting |
| Marvin | Python decorators | AI-powered functions |
| Guardrails | Validators | Output safety checks |
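To make the "structured outputs" idea concrete, a minimal Instructor sketch; it assumes `instructor`, `openai`, and `pydantic` are installed and an API key is set, and the `Ticket` model is invented for illustration:

```python
# Sketch: extract a validated Pydantic object from an LLM response with Instructor.
# Assumes `pip install instructor openai pydantic` and OPENAI_API_KEY set.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    priority: str  # e.g. "low", "medium", "high"

client = instructor.from_openai(OpenAI())

ticket = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Ticket,  # Instructor validates (and retries) until this model parses
    messages=[{"role": "user", "content": "The login page crashes on submit. Urgent!"}],
)
print(ticket.title, ticket.priority)
```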
Specialized languages for LLM control:
- LMQL - Query language for LLMs with constraints
- Guidance - Microsoft's control language
- Prompty - Prompt engineering asset class
- AICI - Control generation with WebAssembly
- Guardrails - Validator/corrector framework
Production monitoring and debugging:
| Platform | Type | Features |
|---|---|---|
| LangFuse | Open-source | Prompt versioning, tracing, analytics |
| Phoenix | ML Observability | Embeddings, LLM monitoring |
| DeepEval | Testing | Unit tests for LLM outputs |
| UpTrain | Evaluation | Open-source eval tool |
| TruLens | Tracking | LLM app evaluation |
| Helicone | Open-source | LLM observability platform |
| OpenLIT | OpenTelemetry | Native LLM observability |
| Lunary | Production | LLM toolkit |
Also includes: WhyLogs, MLflow, Weights & Biases
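As a concrete example of the tracing workflow, a hedged LangFuse sketch using its Python decorator; it assumes `langfuse` is installed and the LANGFUSE_* keys are set, and the SDK surface may differ between versions:

```python
# Sketch: trace a function with LangFuse's @observe decorator.
# Assumes `pip install langfuse` and LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY set.
from langfuse.decorators import observe

@observe()  # records inputs, outputs, and timing as a trace span
def answer(question: str) -> str:
    # Call your LLM of choice here; a placeholder keeps the sketch self-contained.
    return f"(model answer to: {question})"

print(answer("What is observability?"))
```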
Frameworks for LLM quality assurance:
- OpenAI Evals - Official evaluation framework
- Anthropic Evals - Claude evaluation tools
- DeepEval - Unit testing for LLMs
- RAGAS - RAG pipeline evaluation
- PromptBench - Unified LLM benchmarks
- EleutherAI/lm-evaluation-harness - Language model eval
- Hugging Face Evaluate - Evaluation library
- Vectara Hallucination Leaderboard - Hallucination benchmarks
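A minimal DeepEval sketch to illustrate unit-testing an LLM output; it assumes `deepeval` is installed and an OpenAI key is available for the metric's judge model:

```python
# Sketch: assert that a model answer is relevant to the question using DeepEval.
# Assumes `pip install deepeval` and OPENAI_API_KEY set (the metric uses an LLM judge).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What are your store hours?",
        actual_output="We are open 9am-6pm, Monday through Saturday.",
    )
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Tests like this are typically executed with DeepEval's test runner rather than plain pytest.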
Visual and code-based workflow builders:
| Platform | Type | Description |
|---|---|---|
| LangFlow | Visual | Drag-and-drop LLM workflows |
| Flowise | Visual | Open-source workflow builder |
| Dify | Platform | LLM app development platform |
| AnythingLLM | Workspace | All-in-one LLM workspace |
| n8n | Automation | Workflow automation with LLMs |
Also includes: Prefect, DeepLake, Cheshire Cat AI, Vercel AI
Enable LLMs to use external tools:
- OpenAI SDK - Official tools/function calling
- Anthropic SDK - Claude tools
- E2B - Secure sandboxes for agents
- OpenHands - Software development agents
- Composio - Integration platform for agents
- Toolhouse - Universal tool infrastructure
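To show the basic tool-calling handshake, a hedged OpenAI SDK sketch; it assumes `openai` is installed and an API key is set, and the `get_weather` tool is invented for illustration:

```python
# Sketch: declare a tool and let the model decide whether to call it.
# Assumes `pip install openai` and OPENAI_API_KEY set; get_weather is a made-up example tool.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```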
Long-term memory for AI assistants:
- Mem0 - Memory layer for applications
- Zep - Long-term memory store
- MemGPT - Self-editing memory
- Langroid - Multi-agent framework with memory
- LlamaAgents - Agent orchestration
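A small Mem0 sketch of adding and searching memories; it assumes the `mem0ai` package is installed and an OpenAI key is set for embeddings, and the exact API may vary by version:

```python
# Sketch: store a fact about a user and retrieve it later with Mem0.
# Assumes `pip install mem0ai` and OPENAI_API_KEY set; API details may differ by version.
from mem0 import Memory

memory = Memory()
memory.add("Alice prefers vegetarian restaurants.", user_id="alice")

results = memory.search("Where should Alice eat dinner?", user_id="alice")
print(results)
```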
Curated collections and learning resources:
- Shubhamsaboo/awesome-llm-apps - LLM applications
- kyrolabs/awesome-langchain - LangChain resources
- steven2358/awesome-generative-ai - Generative AI
- f/awesome-chatgpt-prompts - Prompt examples
- e2b-dev/awesome-ai-agents - AI agents
- tensorchord/Awesome-LLMOps - LLMOps resources
- krishnaik06/Complete-LangChain-Tutorials - Tutorials
- gkamradt/langchain-tutorials - LangChain guides
- pinecone-io/examples - Vector DB examples
Platforms for building LLM apps without extensive coding:
- LangFlow - Visual workflow builder
- Flowise - Open-source alternative
- Dify - Full platform
- AnythingLLM - Private workspace
- Griptape - Python workflows
- FastGPT - Knowledge base platform
If you are just getting started:
- Start with LangChain tutorials
- Explore LangFlow for visual building
- Try AnythingLLM for a complete workspace

For RAG applications:
- LlamaIndex - Data-centric approach
- RAGFlow - Deep document understanding
- RAGAS - Evaluate your RAG pipeline

For agent systems:
- LangGraph - Stateful agent graphs
- AutoGen - Multi-agent conversations
- CrewAI - Orchestrate agent teams

For type safety:
- Instructor - Pydantic models
- Outlines - Constrained generation
- Pydantic-AI - Type-safe agents

For production:
- LangFuse - Observability
- DeepEval - Testing
- LiteLLM - Unified API
| Use Case | Recommended Framework | Why? |
|---|---|---|
| General LLM apps | LangChain | Most comprehensive, largest community |
| Stateful agents | LangGraph | Built for complex agent workflows |
| RAG/search | LlamaIndex | Best data connectors & indexing |
| Enterprise .NET | Semantic Kernel | Native Microsoft integration |
| Type safety | Instructor/Pydantic-AI | Guaranteed structured outputs |
| Production NLP | Haystack | Enterprise-grade features |
| Research/experiments | DSPy | Programming over prompting |
| Multi-model | LiteLLM | 100+ LLM providers |
From industry research:
- Lowest overhead: DSPy (~3.5ms), Haystack (~5.9ms), LlamaIndex (~6ms)
- Higher overhead: LangChain (~10ms), LangGraph (~14ms)
- Token efficiency: Haystack (~1.57k tokens), LlamaIndex (~1.60k tokens), LangChain (~2.40k tokens)
- Star all repos with `./star_llm_frameworks_repos.sh`
- Complete LangChain tutorials
- Build a simple chatbot
- Study LlamaIndex documentation
- Build a RAG application
- Evaluate with RAGAS
- Learn LangGraph for stateful agents
- Try AutoGen multi-agent conversations
- Experiment with CrewAI
- Set up LangFuse for observability
- Add DeepEval tests
- Deploy with LangServe or BentoML
- krishnaik06/Complete-LangChain-Tutorials
- gkamradt/langchain-tutorials
- deepset-ai/haystack-tutorials
- langchain-ai/rag-from-scratch
- Start simple - Use high-level frameworks first (LangChain, LlamaIndex)
- Prototype fast - Try visual builders (LangFlow, Flowise)
- Type safety - Add Instructor/Outlines for production
- Observe everything - Set up LangFuse from day one
- Test early - Use DeepEval for unit testing
- Observability (LangFuse/Phoenix)
- Testing framework (DeepEval)
- Type safety (Instructor/Pydantic-AI)
- Error handling (Guardrails)
- Cost tracking (LiteLLM)
- Evaluation (RAGAS for RAG)
Simple Chatbot:
LangChain + OpenAI/Anthropic SDK + LangServe
RAG Application:
LlamaIndex + Vector DB + RAGAS (evaluation)
Multi-Agent System:
LangGraph + LangSmith (observability) + DeepEval (testing)
Type-Safe Production App:
Instructor + Pydantic-AI + LangFuse + Guardrails
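For the "Simple Chatbot" stack above, a hedged LangServe deployment sketch; it assumes `langserve` (with its server extras), `langchain-openai`, and `uvicorn` are installed and an API key is set:

```python
# Sketch: expose a LangChain chain as a REST API with LangServe.
# Assumes `pip install "langserve[server]" langchain-openai` and OPENAI_API_KEY set.
from fastapi import FastAPI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langserve import add_routes

prompt = ChatPromptTemplate.from_template("You are a helpful assistant. {question}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

app = FastAPI()
add_routes(app, chain, path="/chat")  # serves /chat/invoke, /chat/stream, etc.

# Run with: uvicorn app:app --reload
```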
- Run the script: `./star_llm_frameworks_repos.sh`
- Pick a framework based on your use case
- Build a simple project to learn
- Add observability with LangFuse
- Share your learnings with the community
- Total Frameworks: 153 repositories
- Core Frameworks: 15 repos
- Agent Systems: 20 repos
- RAG Tools: 18 repos
- Type Safety: 12 repos
- Observability: 15 repos
- Testing/Eval: 15 repos
- Workflow Tools: 12 repos
- Learning Resources: 12 repos
Last Updated: November 18, 2025
Automation Script: star_llm_frameworks_repos.sh
See Also: PROMPT_LINTING_GUIDE.md for prompt engineering best practices