The sovereign agent stack — practical scripts, on-chain identity, and knowledge graphs for AI agents that think, remember, and own themselves.
A curated list covering the full AI agent ecosystem: frameworks, coding agents, MCP tooling, knowledge graphs, blockchain identity, decentralized finance agents, quantitative trading, and observability. It pairs practical developer tooling with on-chain identity and memory infrastructure, a combination rarely gathered in a single awesome list.
Note: Some resources are intentionally listed in multiple sections when they are core to more than one workflow domain (for example, prompt + eval, or coding + CLI usage).
- Agent Frameworks
- Coding Agents
- Voice and Multimodal Agents
- Hermes Stack
- CLI and TUI Tools
- Agent Runtime Infrastructure
- MCP Ecosystem
- Prompt Engineering
- Agent Harnessing and Evaluation
- ArXiv Deep Research Map
- Context Engineering
- Neural Networks and Neural Linking
- Obsidian Vault Architecture for Agents
- Agent Security and Robustness
- Agent Configs and Dotfiles
- Skill Engineering and Playbooks
- Knowledge Graphs and Memory
- Solana Agent Infrastructure
- Agent Identity and Wallets
- Agent Payments
- DeFi Agents
- Quant and Trading Agents
- Agent Observability and Testing
- Research Papers
- Communities
Multi-agent orchestration, single-agent SDKs, and runtime frameworks.
- AG2 - Open-source AgentOS for building multi-agent systems (evolved from AutoGen).
- Agno - Framework for building and running agentic software at scale.
- AutoGen - Multi-agent conversation framework from Microsoft Research.
- Claude Agent SDK - Official Python SDK for building agents with Claude models.
- CrewAI - Role-based multi-agent orchestration framework.
- ElizaOS - Multi-agent simulation framework for autonomous characters.
- Google A2A - Agent-to-Agent protocol for cross-framework agent communication.
- Google ADK - Agent Development Kit for building agents with Gemini.
- Haystack - LLM orchestration framework for building search and RAG pipelines.
- Hermes Agent - Tool-using autonomous agent platform with memory, skills, delegation, and MCP support.
- Julep - Stateful agent platform with built-in persistence and task workflows.
- LangChain - Composable framework for building LLM-powered applications.
- LangGraph - Library for building stateful multi-agent workflows as graphs.
- Letta - Stateful agents with long-term memory (formerly MemGPT).
- LlamaIndex - Data framework for document agents, retrieval, and workflow orchestration.
- Magentic-One - Multi-agent team for complex web and file tasks.
- Mastra - TypeScript framework for building AI applications and agents.
- Microsoft Agent Framework - Framework for building, orchestrating, and deploying agents with Python and .NET support.
- OpenAI Agents SDK - Official SDK for building agents with OpenAI models.
- OpenClaw - Self-hosted personal AI agent with multi-platform messaging and skill registry.
- Phidata - Toolkit for building AI assistants with memory and tools.
- PydanticAI - Type-safe agent framework built around Pydantic.
- Rig - Rust framework for building LLM-powered applications.
- Semantic Kernel - SDK for integrating LLMs into apps with plugin architecture.
- Smolagents - Lightweight agent framework from Hugging Face.
- Swarm - Educational framework for multi-agent handoffs and routines.
AI agents that write, review, and debug code.
- Aider - AI pair programming in the terminal with git integration.
- Claude Code - Anthropic's agentic CLI for code generation and editing.
- Cline - Autonomous coding agent for VS Code with tool use.
- Continue - Open-source AI code assistant for VS Code and JetBrains.
- Cursor - AI-first code editor built on VS Code.
- Devin - Autonomous software engineering agent by Cognition.
- Goose - Autonomous developer agent from Block.
- OpenCodex - OpenAI's CLI coding agent.
- OpenHands - Platform for AI software development agents (formerly OpenDevin).
- SWE-Agent - Agent for automatically resolving GitHub issues.
- Windsurf - AI-native IDE by Codeium with agentic flows.
- awesome-claude-code - Curated list of Claude Code resources.
- Claude Code Hooks - Event-driven shell command automation.
- Claude Code Skills - Reusable prompt-driven workflows.
- CLAUDE.md Guide - Official documentation on memory files.
- claude-code-tips - Community-sourced tips and tricks.
- Everything Claude Code - Comprehensive Claude Code harness with agent skills, hooks, and multi-language support.
- Codex Docs - Official Codex documentation hub.
- Codex CLI - Guide to local Codex CLI workflows.
- Codex Non-Interactive Mode - Batch and CI automation with codex exec.
- AGENTS.md Guide (Codex) - Instruction hierarchy and scoping patterns for Codex.
- Codex Optimization Playbook (this repo) - Practical operator patterns for speed, safety, and quality.
Agents with voice, vision, and multimodal capabilities.
- ElevenLabs - Text-to-speech and voice cloning API for agent voice interfaces.
- LiveKit Agents - Framework for building real-time multimodal AI agents.
- Pipecat - Framework for building voice and multimodal conversational agents.
- TEN Framework - Open-source framework for conversational voice AI agents.
- Ultravox - Fast multimodal LLM for real-time voice AI.
- Vapi - Platform for building and deploying voice AI agents.
- Vocode Core - Modular open-source framework for building voice-based LLM agents.
- Whisper - Open-source speech recognition model from OpenAI.
Hermes Agent runtime, deployment rails, and operator resources.
- Hermes Agent - Open-source autonomous AI agent with CLI, gateway, memory, subagents, and broad tool integrations.
- Hermes Hub (this repo) - Local operator knowledge base for Hermes setup, configuration, memory/skills workflows, and contribution orientation.
- Hermes Agent + hermes-fly Best Practices (this repo) - Practical setup, operations, security, and optimization playbook.
- Hermes Agent Optimization Playbook (this repo) - Deep operator guide for context, delegation, memory, and execution tuning.
- Hermes Agent Self-Evolution - Evolutionary self-improvement framework for optimizing Hermes Agent prompts, skills, and code.
- Hermes Paperclip Adapter - Adapter for running Hermes Agent as a managed employee inside Paperclip.
- hermes-fly - Fly.io deployment and operations CLI for Hermes Agent with deploy, logs, doctor, and teardown workflows.
- Hermes Stack Maturity Ladder (this repo) - L1-L3 readiness model with upgrade paths and operational checklist.
- Hermes Stack Quickstart Recipes (this repo) - Copy/paste recipes for local dev, hosted production, secure mode, and CI operations.
Terminal-based agent interfaces and developer tools.
- Claude Code - Agentic CLI that operates directly in the terminal.
- Gemini CLI - Google's command-line interface for Gemini models.
- Glow - Terminal Markdown renderer useful for agent output.
- Hermes Agent - CLI and gateway agent runtime with tools, memory, delegation, and automation support.
- hermes-fly - CLI wizard to deploy and operate Hermes Agent on Fly.io.
- lazygit - Terminal UI for git commonly paired with coding agents.
- llm - CLI tool for interacting with LLMs from the terminal.
- aichat - All-in-one LLM CLI with chat, shell assistant, RAG, and agent features.
- OpenCodex - Lightweight CLI coding agent from OpenAI.
- sgpt - Command-line productivity tool powered by LLMs.
- tmux - Terminal multiplexer for running agents in persistent sessions.
- Warp - Modern terminal with built-in AI assistance.
- Zellij - Terminal workspace with plugin system for agent integration.
Execution sandboxes and runtime platforms for safely running agent actions and generated code.
- CUA - Open-source infrastructure for computer-use agents with sandboxes, SDKs, and benchmarks.
- Daytona - Secure and elastic runtime infrastructure for AI-generated code execution.
- E2B - Open-source secure cloud sandbox environment for AI agents.
- Firecracker - Secure and fast microVM technology for isolated agent execution.
- gVisor - Application kernel for containers that adds a strong isolation boundary.
- Kata Containers - Lightweight VM-based container runtime for stronger workload isolation.
- Modal - Serverless compute platform often used for running agent workloads and tools.
- RunPod Python SDK - Python SDK for RunPod serverless and worker-based AI workloads.
Model Context Protocol servers, clients, and tooling.
- Awesome MCP Servers - Curated list of MCP server implementations.
- Chrome DevTools MCP - Official Chrome DevTools MCP server for coding and browser automation agents.
- FastMCP - Pythonic framework for building MCP servers and clients quickly.
- GitHub MCP Server - Official MCP server for GitHub workflows and repository actions.
- mcp - Official reference MCP server implementations.
- MCP Agent - Framework patterns for building agents on top of MCP.
- MCP for Beginners - Cross-language curriculum and practical examples for learning MCP.
- MCP Go SDK - Go implementation of the Model Context Protocol.
- MCP Inspector - Official inspector and debugging tool for MCP servers.
- MCP Python SDK - Official Python SDK for building MCP servers.
- MCP Registry - Community registry service for discovering MCP servers.
- MCP Rust SDK - Official Rust SDK for building MCP servers.
- MCP Spec - Official Model Context Protocol specification.
- MCP Specification Repo - Canonical specification and documentation repository.
- MCP TypeScript SDK - Official TypeScript SDK for building MCP servers.
- Playwright MCP - MCP server for browser automation via Playwright.
- Smithery - Registry and hosting platform for MCP servers.
Instruction-writing craft: system prompts, response framing, and reusable prompt templates. Focus here on what to ask and how to phrase it at the prompt layer.
- Anthropic Prompt Library - Official prompt examples from Anthropic.
- awesome-chatgpt-prompts - Collection of prompt examples for ChatGPT.
- Claude System Prompts - Guide to writing effective system prompts.
- OpenAI Prompt Engineering Guide - Official guide to designing reliable prompts and instruction patterns.
- DSPy - Framework for programming with foundation models instead of prompting.
- fabric - Framework for augmenting humans using AI with curated prompts.
- LangChain Hub - Community-driven prompt and chain sharing platform.
- Promptfoo - Testing and evaluation framework for LLM prompts.
- System Prompts - Collection of system prompts for various AI models.
Harnesses, benchmarks, and evaluation frameworks for measuring agent quality and reliability.
- MCPMark (paper) - 127-task MCP benchmark; reports best pass@1 at 52.56% (gpt-5-medium), with several strong models below 30% pass@1.
- MCPMark (leaderboard) - Live model comparisons for realistic MCP task execution.
- τ-bench - Tool-agent-user benchmark; reports strong function-calling agents still below 50% task success in its setup.
- OSWorld - Open-ended computer-use benchmark; reports best model 12.24% vs 72.36% human success in initial results.
- WebArena - Realistic web-task benchmark; reports best GPT-4-based agent at 14.41% vs 78.24% human.
- GAIA - General assistant benchmark; original framing reports large human-model gap on tool-heavy questions.
- AgentBench - Multi-domain benchmark suite for evaluating LLMs as agents.
- AgentEvals - Evaluation utilities for scoring agent trajectories and outcomes.
- AutoGen agbench - Benchmark runner for AutoGen agent workflows.
- BrowserGym - Gym-style environment for training and evaluating browser agents.
- browser-use - Framework for browser task automation and agent web interaction loops.
- Inspect AI - Open-source framework for reproducible LLM and agent evaluations.
- JailbreakBench - Open robustness benchmark for measuring jailbreak resistance in language models and agents.
- MCPMark - Stress-testing benchmark for evaluating model and agent capability on MCP tasks.
- MLE-bench - Benchmark harness for autonomous ML engineering tasks.
- OSWorld - Open-ended benchmark environment for desktop computer-use agents.
- OpenCUA - Open foundation stack for building and evaluating computer-use agents.
- Stagehand - Browser automation framework for agentic web workflows and reproducible runs.
- SWE-bench - Canonical benchmark for coding agents on real GitHub issue tasks.
- Tau-Bench - Realistic interactive benchmark for measuring agent reliability.
- WebArena - Real-world web task benchmark environment for browser agents.
- WorkArena - Enterprise task benchmark for browser-based agent workflows.
- AgentDojo - Security and robustness benchmark suite for tool-using agents.
- AppWorld - Multi-application environment for benchmarking autonomous task completion.
- AgentLab - Research platform for developing and evaluating web agents.
- ALFWorld - Interactive long-horizon benchmark environment for embodied planning agents.
- HELM - Standardized evaluation framework for model and agent behavior comparison.
- GAIA Benchmark - Realistic benchmark for tool-using, multi-step general assistant tasks.
- Agent Harnessing Playbook (this repo) - Practical framework for benchmark design, regression gates, and release readiness.
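Several entries above report pass@1 or task success rates. As a rough sketch, assuming pass@1 means the per-task success rate over independent attempts, averaged across tasks (individual benchmarks may define or estimate it differently), the figure can be computed as shown here. The function name and the sample results are illustrative and not taken from any listed benchmark.

```python
from typing import Sequence


def pass_at_1(task_results: Sequence[Sequence[bool]]) -> float:
    """Average per-task success rate.

    Each inner sequence holds the pass/fail outcomes of independent
    attempts at one task.
    """
    if not task_results:
        raise ValueError("need at least one task")
    per_task = []
    for attempts in task_results:
        if not attempts:
            raise ValueError("each task needs at least one attempt")
        # Fraction of attempts that solved this task.
        per_task.append(sum(attempts) / len(attempts))
    # Unweighted mean across tasks.
    return sum(per_task) / len(per_task)


# Hypothetical results: 4 tasks, 2 attempts each.
results = [[True, True], [True, False], [False, False], [False, True]]
score = pass_at_1(results)
print(f"pass@1 = {score:.2%}")
```

Averaging per task first keeps a task with many attempts from dominating the score, which matters when comparing runs with uneven retry counts.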
Deep-dive reading map organized by the major categories in this repository.
- ArXiv Deep Research Map (this repo) - Curated paper paths with per-category must-reads, a recent watchlist, and a monthly refresh workflow across frameworks, coding, MCP/tool use, eval reliability, memory, security, multimodal, quant, and on-chain/DeFi-adjacent research.
Systems-level context design: memory, retrieval, compression, routing, and long-horizon state management. Focus here on what information the model gets, when, and in what form.
- 12-Factor Agents - Engineering principles for building reliable, production-grade LLM agents.
- Anthropic: Building Effective Agents - Practical engineering patterns for agent design and execution loops.
- Anthropic: Contextual Retrieval - Retrieval architecture guidance for improving grounding and precision.
- Anthropic: Effective Context Engineering for AI Agents - Production guidance for context composition and lifecycle management.
- Anthropic: Effective Harnesses for Long-Running Agents - Patterns for long-horizon orchestration and reliability.
- LangChain: Context Engineering for Agents - Practical taxonomy for writing, selecting, compressing, and isolating context.
- Manus: Context Engineering for AI Agents - Practitioner lessons from building production autonomous workflows.
- OpenAI Evals Guide - Official framework for building eval loops and quality gates.
- OpenAI Cookbook: Getting Started with Evals - Practical eval setup walkthrough.
- RAG (Lewis et al., 2020) - Foundational retrieval-augmented generation paper.
- Chain-of-Thought Prompting (Wei et al., 2022) - Foundational reasoning/prompting technique paper.
- Lost in the Middle (Liu et al., 2023) - Key long-context failure analysis paper.
- Context Engineering Playbook (this repo) - Practical context budget, memory, retrieval, and anti-drift checklist.
- Agent Operator Trend Signals (this repo) - Synthesized practitioner themes for harness and context strategy.
Neural memory, retrieval, and graph-linking foundations relevant to advanced agent cognition.
- Neural Turing Machines (2014) - Foundational differentiable external-memory architecture.
- End-to-End Memory Networks (2015) - Multi-hop memory lookup architecture for iterative reasoning.
- Differentiable Neural Computer (2016) - Enhanced neural memory addressing for long-horizon reasoning.
- Transformer-XL (2019) - Segment-level recurrence for long-context memory reuse.
- Compressive Transformer (2019) - Compressed memory tiers for scalable sequence retention.
- RAG (Lewis et al., 2020) - Canonical retrieval-augmented generation architecture.
- kNN Language Models (2020) - Non-parametric memory retrieval at inference time.
- RETRO (2021) - Retrieval-heavy architecture for efficient knowledge access.
- Neural Bellman-Ford Networks (2021) - Graph neural reasoning for multi-hop relational inference.
- DeepProbLog - Neural-symbolic framework combining perception models and logic rules.
- Neural Linking and Memory Playbook (this repo) - Practical guide for agent memory architectures and neural-symbolic linking patterns.
Obsidian-specific architecture patterns and APIs for using vaults as agent memory backends.
- How Obsidian Stores Data - Canonical vault-on-disk model and config layout.
- Obsidian Properties - Structured metadata schema for machine-readable note attributes.
- Obsidian Plugin Guide - Official plugin architecture and lifecycle entrypoint.
- Obsidian TypeScript API (Vault) - Programmatic CRUD layer for vault files.
- obsidian-api - Official API type definitions for plugin development.
- Dataview - Query engine for structured note metadata and graph-aware retrieval.
- Juggl - Advanced graph exploration plugin for complex link topology workflows.
- Local REST API Plugin - Local HTTP interface for external agent integrations.
- Advanced URI - URI-based automation hooks for cross-tool workflows.
- Obsidian Git - Versioned vault operations for auditable agent writes.
- Obsidian Vault Architecture Playbook (this repo) - Reference architecture and operational patterns for agent-connected Obsidian systems.
Safety, red-teaming, and robustness tools for hardening agent behavior.
- garak - LLM vulnerability scanning and red-teaming toolkit for security testing.
- Guardrails AI - Validation and safety guardrails framework for LLM outputs.
- Invariant - Guardrails framework for secure and robust agent development.
- JailbreakBench - Open robustness benchmark for measuring jailbreak resistance in language models and agents.
- llm-attacks - Reference implementation and resources for adversarial jailbreak attack evaluation.
- MCP Security Best Practices - Official security guidance for MCP authorization flows, threats, and mitigations.
- NeMo Guardrails - Toolkit for adding programmable safety and policy guardrails to LLM systems.
- Promptfoo - Red-teaming and robustness testing toolkit for LLM systems.
- PyRIT - Python Risk Identification Tool for proactively testing generative AI security risks.
Configuration files and workflow examples for AI coding tools.
- awesome-cursorrules - Curated list of Cursor rule files.
- Claude Code Memory Files - Guide to CLAUDE.md and project memory.
- Claude Code Starter Configs - Ready-to-use CLAUDE.md, rules, hooks, and skills for Claude Code projects.
- Codex CLI Starter Configs - Ready-to-use AGENTS.md and config for OpenAI Codex CLI projects.
- Cursor Starter Configs - Ready-to-use .cursorrules and rule files for Cursor projects.
- CursorDirectory - Community-shared Cursor rules and configurations.
- dotfiles - Guide to managing dotfiles including agent configurations.
- Trail of Bits Claude Code Config - Opinionated Claude Code defaults and workflows from a security-focused engineering team.
Hands-on resources for designing, testing, and shipping high-quality agent skills.
- Anthropic: The Complete Guide to Building Skills for Claude (PDF) - Canonical end-to-end guide covering structure, triggering, testing, and distribution.
- anthropics/skills - Official production-ready skill examples and reference implementations.
- Claude Skill Engineering Playbook (this repo) - Distilled patterns, anti-patterns, templates, and troubleshooting from the Anthropic guide.
- Claude Skills Quickstart Checklist (this repo) - Build-test-ship checklist for repeatable skill quality.
Agent memory architectures, knowledge graphs, and second-brain integrations.
- Cognee - Memory management layer for LLM apps using knowledge graphs.
- FalkorDB - Ultra-fast graph database for AI agent knowledge.
- Graphiti - Real-time knowledge graph framework for AI agents.
- GraphRAG - Graph-based retrieval augmented generation from Microsoft.
- Khoj - Personal AI assistant with long-term memory and knowledge search.
- LangMem - Memory management toolkit for building long-horizon agent systems.
- LightRAG - Simple and fast RAG framework using graph structures.
- Mem0 - Memory layer for AI assistants and agents.
- Memgraph - In-memory graph database for real-time agent queries.
- Neo4j - Graph database platform widely used for agent knowledge stores.
- Obsidian - Knowledge base and note-taking app usable as agent memory backend.
- obsidian-graph-query - Query and traverse Obsidian vault graphs programmatically.
- ODIN - Knowledge graph construction tool built on Memgraph.
- Pinecone - Vector database for semantic memory and retrieval.
- Qdrant - High-performance vector search engine for agent memory.
- txtai - All-in-one embeddings database for semantic search and workflows.
- Weaviate - Vector database with built-in modules for AI workloads.
- Zep - Memory infrastructure and retrieval stack for AI assistants and agents.
Tools and SDKs for building AI agents on Solana.
- Anchor - Core Solana framework for building and integrating smart contracts and clients.
- Awesome Solana AI - Solana Foundation's curated list of AI-Solana projects.
- GOAT SDK - Open-source toolkit connecting AI agents to 200+ on-chain tools across Solana and EVM chains.
- Helius SDK - TypeScript SDK for Solana RPC, webhooks, and DAS API.
- Jito-Solana - MEV-aware Solana client infrastructure for advanced execution agents.
- Jupiter Swap API Docs - Official documentation for integrating Jupiter routing and swaps.
- LangChain Solana Agent Kit - LangChain tools for Solana agent operations.
- Light Protocol - ZK compression for scalable on-chain agent state.
- Metaplex - Solana programs for NFTs and digital assets used in agent identity.
- Pyth Crosschain - Oracle infrastructure for low-latency market data used by agent strategies.
- Solana Actions - Spec and tools for blockchain-powered actions and blinks.
- Solana Agent Kit - Toolkit for connecting AI agents to Solana protocols.
- Solana Kit - Modern Solana client SDK stack for building high-quality applications and agents.
- Solana Web3.js - JavaScript SDK for interacting with the Solana blockchain.
- Switchboard Solana SDK - Verifiable oracle and data-feed SDK for agent decision systems.
- Yellowstone gRPC - High-throughput real-time Solana data streams for low-latency agents and indexers.
- Solana Agent Architecture Playbook (this repo) - Reference architecture, security controls, and ops checklist for production Solana agents.
On-chain identity, wallets, and trust infrastructure for autonomous AI agents.
- Coinbase AgentKit - Toolkit for giving AI agents programmable wallet capabilities.
- Crossmint - Wallet-as-a-service for agent-owned wallets and NFT minting.
- EIP-1271 - Standard for contract wallet signature validation in dapps and agent auth flows.
- EIP-4337 - Account abstraction standard enabling programmable smart accounts for agents.
- EIP-4361 (SIWE) - Sign-In with Ethereum standard for wallet-based authentication.
- EIP-7702 - EOA delegation model for temporary smart-account-like behavior.
- ERC-7579 - Modular smart account standard for plugin-based permissions and execution.
- ERC-8004 - Proposed standard for cross-chain agent identity.
- Lit Protocol - Decentralized key management and programmable signing.
- Privy - Embedded wallet infrastructure for agent authentication.
- Safe - Multi-signature smart account for EVM agent treasuries.
- Sign-In With Solana - Wallet-native authentication pattern for Solana apps and agents.
- Solana Agent Identity - Agent wallet and identity features in Solana Agent Kit.
- Squads Protocol - Multisig and smart account protocol for Solana agents.
- Turnkey - Secure key infrastructure for programmatic wallet management.
- UCAN - User-controlled authorization for decentralized agent capabilities.
Payment protocols and infrastructure for autonomous agent transactions.
- Awesome x402 - Curated resources for the x402 payment protocol ecosystem.
- Coinbase Agentic Wallets - Wallet infrastructure for AI agents with programmable spending limits.
- Google A2A x402 Extension - Cryptocurrency payments for the Agent-to-Agent protocol via x402.
- lobster.cash - Agent payment solution on Solana with Visa Intelligent Commerce integration by Crossmint.
- Request Network - Crypto-native invoicing and payment request rails for agent billing workflows.
- Solana Pay - Open payments standard for Solana-based checkout and transfer flows.
- Superfluid - Streaming payment primitives for machine-to-machine and agent subscriptions.
- x402 Foundation - Open protocol foundation governing the x402 payment standard.
- x402 Protocol - Open HTTP payment protocol using the 402 status code for agent-to-service payments.
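The x402 entry above describes payments built on the HTTP 402 status code. The sketch below illustrates only the general challenge-and-retry shape of such a flow, using in-process functions instead of a real network. The header name, the `accepts` field, the price, and the `pay` callback are all hypothetical placeholders; the actual x402 specification defines its own headers, payloads, and verification steps.

```python
from http import HTTPStatus

# Hypothetical header name and price, for illustration only; the real
# x402 specification defines its own header and payload formats.
PAYMENT_HEADER = "X-Example-Payment"
PRICE = {"amount": "0.01", "asset": "USDC"}


def serve(headers: dict[str, str]) -> tuple[int, dict]:
    """Return (status, body) for a paid resource."""
    proof = headers.get(PAYMENT_HEADER)
    if proof is None:
        # No payment attached: answer 402 and describe what is owed.
        return HTTPStatus.PAYMENT_REQUIRED, {"accepts": PRICE}
    # A real server would verify the proof here before serving.
    return HTTPStatus.OK, {"data": "paid content"}


def fetch_with_payment(pay) -> dict:
    """Client loop: request, pay on 402, then retry once."""
    status, body = serve({})
    if status == HTTPStatus.PAYMENT_REQUIRED:
        # `pay` is a hypothetical callback that turns terms into a proof.
        proof = pay(body["accepts"])
        status, body = serve({PAYMENT_HEADER: proof})
    if status != HTTPStatus.OK:
        raise RuntimeError(f"request failed with status {int(status)}")
    return body


result = fetch_with_payment(lambda terms: f"paid:{terms['amount']}")
```

Capping the client at a single retry is a deliberate choice in this sketch: an agent that keeps paying on repeated 402 responses could drain its wallet, so spending limits belong on the client side as well.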
AI agents for decentralized finance operations and strategy.
- Autonolas - Framework for building autonomous agent services on-chain.
- DeFi Llama API - Open API for DeFi protocol data used by trading agents.
- Drift Protocol v2 - On-chain perpetuals protocol infrastructure for autonomous trading agents.
- ElizaOS DeFi Plugins - DeFi protocol integrations for ElizaOS agents.
- Gauntlet - Risk management and simulation platform for DeFi agents.
- Griffain - AI agent platform for Solana DeFi operations.
- Kamino KLend SDK - Lending protocol SDK for credit and yield allocation agents.
- Lulo - Yield optimization protocol with agent-friendly APIs.
- Orca Whirlpools SDK - Solana concentrated liquidity SDK for agent strategies.
- Raydium SDK - Solana AMM SDK for agent-driven liquidity provision.
- Spectral Finance - On-chain credit scoring and risk models for agent decisions.
- Virtuals Protocol - Agent tokenization and autonomous commerce protocol tracking agentic GDP.
- Yearn Vaults - Automated yield vaults usable as agent strategy backends.
Quantitative finance frameworks and AI-driven trading systems.
- AlphaAgent - LLM-powered agent for quantitative trading research.
- BitQuant - Multi-agent quantitative analysis framework.
- DriftPy - Python SDK for building Solana-based perp and risk management agents.
- FinGPT - Open-source financial LLM framework.
- FinRL - Deep reinforcement learning library for quantitative finance.
- Freqtrade - Open-source algorithmic trading bot in Python.
- Hummingbot - Open-source market making and arbitrage bot.
- Lean - Algorithmic trading engine by QuantConnect.
- NautilusTrader - High-performance algorithmic trading platform in Rust and Python.
- Phoenix v1 - On-chain central limit order book protocol for low-latency execution agents.
- Qlib - AI-oriented quantitative investment platform from Microsoft.
- TradingAgents - Multi-agent LLM framework simulating a trading firm.
- VectorBT - Fast backtesting and analysis library for trading strategies.
- Zipline - Pythonic algorithmic trading library for backtesting.
Debugging, tracing, evaluation, and testing tools for AI agents.
- AgentOps - Monitoring, cost tracking, and benchmarking for agent workflows.
- Braintrust - Evaluation and observability platform for AI products.
- DeepEval - Open-source LLM evaluation framework.
- Helicone - Open-source LLM observability and monitoring platform.
- LangFuse - Open-source LLM engineering platform for tracing and evaluation.
- LangSmith - Platform for debugging, testing, and monitoring LLM applications.
- LiteLLM - LLM gateway and proxy with logging, cost tracking, and routing controls.
- OpenAI Evals - Framework and benchmark registry for evaluating LLM systems.
- OpenLLMetry - OpenTelemetry-based observability for LLM applications.
- Opik - Open-source platform for LLM and agent tracing, evaluation, and monitoring.
- Phoenix - Open-source AI observability platform from Arize.
- Portkey - AI gateway with observability, caching, and fallback routing.
- SigNoz - OpenTelemetry-native observability platform for traces, logs, and metrics.
- TruLens - Open-source framework for evaluating and tracking LLM and agent experiments.
- Weave - Toolkit for tracking and evaluating LLM applications from W&B.
Curated papers on AI agents, multi-agent systems, and agent infrastructure.
- A Survey on Large Language Model based Autonomous Agents - Comprehensive survey of LLM-based agent architectures.
- ArXiv Deep Research Map (this repo) - Category-by-category reading map spanning frameworks, coding, MCP/tool use, memory, security, multimodal, and quant/on-chain adjacent domains.
- Awesome AI Agent Papers - Continuously updated collection of agent research papers.
- Chain-of-Thought Prompting - Foundational paper on reasoning in language models.
- Generative Agents - Simulating human behavior with LLM-driven agents in a sandbox.
- MemGPT - OS-inspired memory management for LLM context windows.
- ReAct - Synergizing reasoning and acting in language models.
- Reflexion - Language agents with verbal reinforcement learning.
- The Landscape of Emerging AI Agent Architectures - Survey of multi-agent design patterns.
- Toolformer - Language models that learn to use tools autonomously.
- Voyager - Open-ended embodied agent with LLM-powered curriculum.
Forums, Discord servers, newsletters, and social accounts.
- AI Agent Discord Servers - CrewAI community Discord.
- Anthropic Discord - Official Anthropic community.
- ElizaOS Discord - Community for ElizaOS agent builders.
- LangChain Discord - LangChain developer community.
- Latent Space Podcast - Podcast covering AI engineering and agents.
- r/artificial - Subreddit for AI discussions and news.
- r/LocalLLaMA - Community for local LLM deployment and agent experimentation.
- Solana AI Discord - Solana developer community with AI channels.
Contributions welcome. Read the contribution guidelines first.
If you find this project useful, consider supporting my open-source work.
Solana donations
BYLu8XD8hGDUtdRBWpGWu5HKoiPrWqCxYFSh4oxXuvPg
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this work.
