Awesome Agent Cortex

The sovereign agent stack — practical scripts, on-chain identity, and knowledge graphs for AI agents that think, remember, and own themselves.

A curated list covering the full AI agent ecosystem: frameworks, coding agents, MCP tooling, knowledge graphs, blockchain identity, decentralized finance agents, quantitative trading, and observability. Its distinguishing feature is the pairing of practical developer tooling with on-chain identity and memory infrastructure, a combination that awesome lists rarely cover in one place.

Note: Some resources are intentionally listed in multiple sections when they are core to more than one workflow domain (for example, prompt + eval, or coding + CLI usage).

Contents

Agent Frameworks
Coding Agents
Voice and Multimodal Agents
Hermes Stack
CLI and TUI Tools
Agent Runtime Infrastructure
MCP Ecosystem
Prompt Engineering
Agent Harnessing and Evaluation
ArXiv Deep Research Map
Context Engineering
Neural Networks and Neural Linking
Obsidian Vault Architecture for Agents
Agent Security and Robustness
Agent Configs and Dotfiles
Skill Engineering and Playbooks
Knowledge Graphs and Memory
Solana Agent Infrastructure
Agent Identity and Wallets
Agent Payments
DeFi Agents
Quant and Trading Agents
Agent Observability and Testing
Research Papers
Communities

Agent Frameworks

Multi-agent orchestration, single-agent SDKs, and runtime frameworks.

AG2 - Open-source AgentOS for building multi-agent systems (evolved from AutoGen).
Agno - Framework for building and running agentic software at scale.
AutoGen - Multi-agent conversation framework from Microsoft Research.
Claude Agent SDK - Official Python SDK for building agents with Claude models.
CrewAI - Role-based multi-agent orchestration framework.
ElizaOS - Multi-agent simulation framework for autonomous characters.
Google A2A - Agent-to-Agent protocol for cross-framework agent communication.
Google ADK - Agent Development Kit for building agents with Gemini.
Haystack - LLM orchestration framework for building search and RAG pipelines.
Hermes Agent - Tool-using autonomous agent platform with memory, skills, delegation, and MCP support.
Julep - Stateful agent platform with built-in persistence and task workflows.
LangChain - Composable framework for building LLM-powered applications.
LangGraph - Library for building stateful multi-agent workflows as graphs.
Letta - Stateful agents with long-term memory (formerly MemGPT).
LlamaIndex - Data framework for document agents, retrieval, and workflow orchestration.
Magentic-One - Multi-agent team for complex web and file tasks.
Mastra - TypeScript framework for building AI applications and agents.
Microsoft Agent Framework - Framework for building, orchestrating, and deploying agents with Python and .NET support.
OpenAI Agents SDK - Official SDK for building agents with OpenAI models.
OpenClaw - Self-hosted personal AI agent with multi-platform messaging and skill registry.
Phidata - Toolkit for building AI assistants with memory and tools (since rebranded as Agno).
PydanticAI - Type-safe agent framework built around Pydantic.
Rig - Rust framework for building LLM-powered applications.
Semantic Kernel - SDK for integrating LLMs into apps with plugin architecture.
Smolagents - Lightweight agent framework from Hugging Face.
Swarm - Educational framework for multi-agent handoffs and routines.

Coding Agents

AI agents that write, review, and debug code.

Aider - AI pair programming in the terminal with git integration.
Claude Code - Anthropic's agentic CLI for code generation and editing.
Cline - Autonomous coding agent for VS Code with tool use.
Continue - Open-source AI code assistant for VS Code and JetBrains.
Cursor - AI-first code editor built on VS Code.
Devin - Autonomous software engineering agent by Cognition.
Goose - Autonomous developer agent from Block.
Codex CLI - OpenAI's CLI coding agent.
OpenHands - Platform for AI software development agents (formerly OpenDevin).
SWE-Agent - Agent for automatically resolving GitHub issues.
Windsurf - AI-native IDE by Codeium with agentic flows.

Claude Code Resources

awesome-claude-code - Curated list of Claude Code resources.
Claude Code Hooks - Event-driven shell command automation.
Claude Code Skills - Reusable prompt-driven workflows.
CLAUDE.md Guide - Official documentation on memory files.
claude-code-tips - Community-sourced tips and tricks.
Everything Claude Code - Comprehensive Claude Code harness with agent skills, hooks, and multi-language support.

Codex Resources

Codex Docs - Official Codex documentation hub.
Codex CLI - Guide to local Codex CLI workflows.
Codex Non-Interactive Mode - Batch and CI automation with codex exec.
AGENTS.md Guide (Codex) - Instruction hierarchy and scoping patterns for Codex.
Codex Optimization Playbook (this repo) - Practical operator patterns for speed, safety, and quality.

Voice and Multimodal Agents

Agents with voice, vision, and multimodal capabilities.

ElevenLabs - Text-to-speech and voice cloning API for agent voice interfaces.
LiveKit Agents - Framework for building real-time multimodal AI agents.
Pipecat - Framework for building voice and multimodal conversational agents.
TEN Framework - Open-source framework for conversational voice AI agents.
Ultravox - Fast multimodal LLM for real-time voice AI.
Vapi - Platform for building and deploying voice AI agents.
Vocode Core - Modular open-source framework for building voice-based LLM agents.
Whisper - Open-source speech recognition model from OpenAI.

Hermes Stack

Hermes Agent runtime, deployment rails, and operator resources.

Hermes Agent - Open-source autonomous AI agent with CLI, gateway, memory, subagents, and broad tool integrations.
Hermes Hub (this repo) - Local operator knowledge base for Hermes setup, configuration, memory/skills workflows, and contribution orientation.
Hermes Agent + hermes-fly Best Practices (this repo) - Practical setup, operations, security, and optimization playbook.
Hermes Agent Optimization Playbook (this repo) - Deep operator guide for context, delegation, memory, and execution tuning.
Hermes Agent Self-Evolution - Evolutionary self-improvement framework for optimizing Hermes Agent prompts, skills, and code.
Hermes Paperclip Adapter - Adapter for running Hermes Agent as a managed employee inside Paperclip.
hermes-fly - Fly.io deployment and operations CLI for Hermes Agent with deploy, logs, doctor, and teardown workflows.
Hermes Stack Maturity Ladder (this repo) - L1-L3 readiness model with upgrade paths and operational checklist.
Hermes Stack Quickstart Recipes (this repo) - Copy/paste recipes for local dev, hosted production, secure mode, and CI operations.

CLI and TUI Tools

Terminal-based agent interfaces and developer tools.

Claude Code - Agentic CLI that operates directly in the terminal.
Gemini CLI - Google's command-line interface for Gemini models.
Glow - Terminal Markdown renderer useful for agent output.
Hermes Agent - CLI and gateway agent runtime with tools, memory, delegation, and automation support.
hermes-fly - CLI wizard to deploy and operate Hermes Agent on Fly.io.
lazygit - Terminal UI for Git, commonly paired with coding agents.
llm - CLI tool for interacting with LLMs from the terminal.
aichat - All-in-one LLM CLI with chat, shell assistant, RAG, and agent features.
Codex CLI - Lightweight CLI coding agent from OpenAI.
sgpt - Command-line productivity tool powered by LLMs.
tmux - Terminal multiplexer for running agents in persistent sessions.
Warp - Modern terminal with built-in AI assistance.
Zellij - Terminal workspace with plugin system for agent integration.

Agent Runtime Infrastructure

Execution sandboxes and runtime platforms for safely running agent actions and generated code.

CUA - Open-source infrastructure for computer-use agents with sandboxes, SDKs, and benchmarks.
Daytona - Secure and elastic runtime infrastructure for AI-generated code execution.
E2B - Open-source secure cloud sandbox environment for AI agents.
Firecracker - Secure and fast microVM technology for isolated agent execution.
gVisor - Application kernel for containers that adds a strong isolation boundary.
Kata Containers - Lightweight VM-based container runtime for stronger workload isolation.
Modal - Serverless compute platform often used for running agent workloads and tools.
RunPod Python SDK - Python SDK for RunPod serverless and worker-based AI workloads.

MCP Ecosystem

Model Context Protocol servers, clients, and tooling.

Awesome MCP Servers - Curated list of MCP server implementations.
Chrome DevTools MCP - Official Chrome DevTools MCP server for coding and browser automation agents.
FastMCP - Pythonic framework for building MCP servers and clients quickly.
GitHub MCP Server - Official MCP server for GitHub workflows and repository actions.
mcp - Official reference MCP server implementations.
MCP Agent - Framework patterns for building agents on top of MCP.
MCP for Beginners - Cross-language curriculum and practical examples for learning MCP.
MCP Go SDK - Go implementation of the Model Context Protocol.
MCP Inspector - Official inspector and debugging tool for MCP servers.
MCP Python SDK - Official Python SDK for building MCP servers.
MCP Registry - Community registry service for discovering MCP servers.
MCP Rust SDK - Official Rust SDK for building MCP servers.
MCP Spec - Official Model Context Protocol specification.
MCP Specification Repo - Canonical specification and documentation repository.
MCP TypeScript SDK - Official TypeScript SDK for building MCP servers.
Playwright MCP - MCP server for browser automation via Playwright.
Smithery - Registry and hosting platform for MCP servers.

Prompt Engineering

Instruction-writing craft: system prompts, response framing, and reusable prompt templates. Focus here on what to ask and how to phrase it at the prompt layer.

Anthropic Prompt Library - Official prompt examples from Anthropic.
awesome-chatgpt-prompts - Collection of prompt examples for ChatGPT.
Claude System Prompts - Guide to writing effective system prompts.
OpenAI Prompt Engineering Guide - Official guide to designing reliable prompts and instruction patterns.
DSPy - Framework for programming with foundation models instead of prompting.
fabric - Framework for augmenting humans using AI with curated prompts.
LangChain Hub - Community-driven prompt and chain sharing platform.
Promptfoo - Testing and evaluation framework for LLM prompts.
System Prompts - Collection of system prompts for various AI models.

Agent Harnessing and Evaluation

Harnesses, benchmarks, and evaluation frameworks for measuring agent quality and reliability.

Benchmark Reality Check (real-world tool use)

MCPMark (paper) - 127-task MCP benchmark; reports best pass@1 at 52.56% (gpt-5-medium), with several strong models below 30% pass@1.
MCPMark (leaderboard) - Live model comparisons for realistic MCP task execution.
τ-bench - Tool-agent-user benchmark; reports strong function-calling agents still below 50% task success in its setup.
OSWorld - Open-ended computer-use benchmark; reports best model 12.24% vs 72.36% human success in initial results.
WebArena - Realistic web-task benchmark; reports best GPT-4-based agent at 14.41% vs 78.24% human.
GAIA - General assistant benchmark; the original paper reports 92% human accuracy vs 15% for GPT-4 equipped with plugins.
AgentBench - Multi-domain benchmark suite for evaluating LLMs as agents.
AgentEvals - Evaluation utilities for scoring agent trajectories and outcomes.
AutoGen agbench - Benchmark runner for AutoGen agent workflows.
BrowserGym - Gym-style environment for training and evaluating browser agents.
browser-use - Framework for browser task automation and agent web interaction loops.
Inspect AI - Open-source framework for reproducible LLM and agent evaluations.
JailbreakBench - Open robustness benchmark for measuring jailbreak resistance in language models and agents.
MCPMark - Stress-testing benchmark for evaluating model and agent capability on MCP tasks.
MLE-bench - Benchmark harness for autonomous ML engineering tasks.
OSWorld - Open-ended benchmark environment for desktop computer-use agents.
OpenCUA - Open foundation stack for building and evaluating computer-use agents.
Stagehand - Browser automation framework for agentic web workflows and reproducible runs.
SWE-bench - Canonical benchmark for coding agents on real GitHub issue tasks.
Tau-Bench - Realistic interactive benchmark for measuring agent reliability.
WebArena - Real-world web task benchmark environment for browser agents.
WorkArena - Enterprise task benchmark for browser-based agent workflows.
AgentDojo - Security and robustness benchmark suite for tool-using agents.
AppWorld - Multi-application environment for benchmarking autonomous task completion.
AgentLab - Research platform for developing and evaluating web agents.
ALFWorld - Interactive long-horizon benchmark environment for embodied planning agents.
HELM - Standardized evaluation framework for model and agent behavior comparison.
GAIA Benchmark - Realistic benchmark for tool-using, multi-step general assistant tasks.
Agent Harnessing Playbook (this repo) - Practical framework for benchmark design, regression gates, and release readiness.

ArXiv Deep Research Map

Deep-dive reading map organized by the major categories in this repository.

ArXiv Deep Research Map (this repo) - Curated paper paths with per-category must-reads, a recent watchlist, and a monthly refresh workflow across frameworks, coding, MCP/tool use, eval reliability, memory, security, multimodal, quant, and on-chain/DeFi-adjacent research.

Context Engineering

Systems-level context design: memory, retrieval, compression, routing, and long-horizon state management. Focus here on what information the model gets, when, and in what form.

12-Factor Agents - Engineering principles for building reliable, production-grade LLM agents.
Anthropic: Building Effective Agents - Practical engineering patterns for agent design and execution loops.
Anthropic: Contextual Retrieval - Retrieval architecture guidance for improving grounding and precision.
Anthropic: Effective Context Engineering for AI Agents - Production guidance for context composition and lifecycle management.
Anthropic: Effective Harnesses for Long-Running Agents - Patterns for long-horizon orchestration and reliability.
LangChain: Context Engineering for Agents - Practical taxonomy for writing, selecting, compressing, and isolating context.
Manus: Context Engineering for AI Agents - Practitioner lessons from building production autonomous workflows.
OpenAI Evals Guide - Official framework for building eval loops and quality gates.
OpenAI Cookbook: Getting Started with Evals - Practical eval setup walkthrough.
RAG (Lewis et al., 2020) - Foundational retrieval-augmented generation paper.
Chain-of-Thought Prompting (Wei et al., 2022) - Foundational reasoning/prompting technique paper.
Lost in the Middle (Liu et al., 2023) - Key long-context failure analysis paper.
Context Engineering Playbook (this repo) - Practical context budget, memory, retrieval, and anti-drift checklist.
Agent Operator Trend Signals (this repo) - Synthesized practitioner themes for harness and context strategy.

Neural Networks and Neural Linking

Neural memory, retrieval, and graph-linking foundations relevant to advanced agent cognition.

Neural Turing Machines (2014) - Foundational differentiable external-memory architecture.
End-to-End Memory Networks (2015) - Multi-hop memory lookup architecture for iterative reasoning.
Differentiable Neural Computer (2016) - Enhanced neural memory addressing for long-horizon reasoning.
Transformer-XL (2019) - Segment-level recurrence for long-context memory reuse.
Compressive Transformer (2019) - Compressed memory tiers for scalable sequence retention.
RAG (Lewis et al., 2020) - Canonical retrieval-augmented generation architecture.
kNN Language Models (2020) - Non-parametric memory retrieval at inference time.
RETRO (2021) - Retrieval-heavy architecture for efficient knowledge access.
Neural Bellman-Ford Networks (2021) - Graph neural reasoning for multi-hop relational inference.
DeepProbLog - Neural-symbolic framework combining perception models and logic rules.
Neural Linking and Memory Playbook (this repo) - Practical guide for agent memory architectures and neural-symbolic linking patterns.

Obsidian Vault Architecture for Agents

Obsidian-specific architecture patterns and APIs for using vaults as agent memory backends.

How Obsidian Stores Data - Canonical vault-on-disk model and config layout.
Obsidian Properties - Structured metadata schema for machine-readable note attributes.
Obsidian Plugin Guide - Official plugin architecture and lifecycle entrypoint.
Obsidian TypeScript API (Vault) - Programmatic CRUD layer for vault files.
obsidian-api - Official API type definitions for plugin development.
Dataview - Query engine for structured note metadata and graph-aware retrieval.
Juggl - Advanced graph exploration plugin for complex link topology workflows.
Local REST API Plugin - Local HTTP interface for external agent integrations.
Advanced URI - URI-based automation hooks for cross-tool workflows.
Obsidian Git - Versioned vault operations for auditable agent writes.
Obsidian Vault Architecture Playbook (this repo) - Reference architecture and operational patterns for agent-connected Obsidian systems.

Agent Security and Robustness

Safety, red-teaming, and robustness tools for hardening agent behavior.

garak - LLM vulnerability scanning and red-teaming toolkit for security testing.
Guardrails AI - Validation and safety guardrails framework for LLM outputs.
Invariant - Guardrails framework for secure and robust agent development.
JailbreakBench - Open robustness benchmark for measuring jailbreak resistance in language models and agents.
llm-attacks - Reference implementation and resources for adversarial jailbreak attack evaluation.
MCP Security Best Practices - Official security guidance for MCP authorization flows, threats, and mitigations.
NeMo Guardrails - Toolkit for adding programmable safety and policy guardrails to LLM systems.
Promptfoo - Red-teaming and robustness testing toolkit for LLM systems.
PyRIT - Python Risk Identification Tool for proactively testing generative AI security risks.

Agent Configs and Dotfiles

Configuration files and workflow examples for AI coding tools.

awesome-cursorrules - Curated list of Cursor rule files.
Claude Code Memory Files - Guide to CLAUDE.md and project memory.
Claude Code Starter Configs - Ready-to-use CLAUDE.md, rules, hooks, and skills for Claude Code projects.
Codex CLI Starter Configs - Ready-to-use AGENTS.md and config for OpenAI Codex CLI projects.
Cursor Starter Configs - Ready-to-use .cursorrules and rule files for Cursor projects.
CursorDirectory - Community-shared Cursor rules and configurations.
dotfiles - Guide to managing dotfiles including agent configurations.
Trail of Bits Claude Code Config - Opinionated Claude Code defaults and workflows from a security-focused engineering team.

Skill Engineering and Playbooks

Hands-on resources for designing, testing, and shipping high-quality agent skills.

Anthropic: The Complete Guide to Building Skills for Claude (PDF) - Canonical end-to-end guide covering structure, triggering, testing, and distribution.
anthropics/skills - Official production-ready skill examples and reference implementations.
Claude Skill Engineering Playbook (this repo) - Distilled patterns, anti-patterns, templates, and troubleshooting from the Anthropic guide.
Claude Skills Quickstart Checklist (this repo) - Build-test-ship checklist for repeatable skill quality.

Knowledge Graphs and Memory

Agent memory architectures, knowledge graphs, and second-brain integrations.

Cognee - Memory management layer for LLM apps using knowledge graphs.
FalkorDB - Ultra-fast graph database for AI agent knowledge.
Graphiti - Real-time knowledge graph framework for AI agents.
GraphRAG - Graph-based retrieval augmented generation from Microsoft.
Khoj - Personal AI assistant with long-term memory and knowledge search.
LangMem - Memory management toolkit for building long-horizon agent systems.
LightRAG - Simple and fast RAG framework using graph structures.
Mem0 - Memory layer for AI assistants and agents.
Memgraph - In-memory graph database for real-time agent queries.
Neo4j - Graph database platform widely used for agent knowledge stores.
Obsidian - Knowledge base and note-taking app usable as an agent memory backend.
obsidian-graph-query - Query and traverse Obsidian vault graphs programmatically.
ODIN - Knowledge graph construction tool built on Memgraph.
Pinecone - Vector database for semantic memory and retrieval.
Qdrant - High-performance vector search engine for agent memory.
txtai - All-in-one embeddings database for semantic search and workflows.
Weaviate - Vector database with built-in modules for AI workloads.
Zep - Memory infrastructure and retrieval stack for AI assistants and agents.

Solana Agent Infrastructure

Tools and SDKs for building AI agents on Solana.

Anchor - Core Solana framework for building and integrating smart contracts and clients.
Awesome Solana AI - Solana Foundation's curated list of AI-Solana projects.
GOAT SDK - Open-source toolkit connecting AI agents to 200+ on-chain tools across Solana and EVM chains.
Helius SDK - TypeScript SDK for Solana RPC, webhooks, and DAS API.
Jito-Solana - MEV-aware Solana client infrastructure for advanced execution agents.
Jupiter Swap API Docs - Official documentation for integrating Jupiter routing and swaps.
LangChain Solana Agent Kit - LangChain tools for Solana agent operations.
Light Protocol - ZK compression for scalable on-chain agent state.
Metaplex - Solana programs for NFTs and digital assets used in agent identity.
Pyth Crosschain - Oracle infrastructure for low-latency market data used by agent strategies.
Solana Actions - Spec and tools for blockchain-powered actions and blinks.
Solana Agent Kit - Toolkit for connecting AI agents to Solana protocols.
Solana Kit - Modern Solana client SDK stack for building high-quality applications and agents.
Solana Web3.js - JavaScript SDK for interacting with the Solana blockchain.
Switchboard Solana SDK - Verifiable oracle and data-feed SDK for agent decision systems.
Yellowstone gRPC - High-throughput real-time Solana data streams for low-latency agents and indexers.
Solana Agent Architecture Playbook (this repo) - Reference architecture, security controls, and ops checklist for production Solana agents.

Agent Identity and Wallets

On-chain identity, wallets, and trust infrastructure for autonomous AI agents.

Coinbase AgentKit - Toolkit for giving AI agents programmable wallet capabilities.
Crossmint - Wallet-as-a-service for agent-owned wallets and NFT minting.
EIP-1271 - Standard for contract wallet signature validation in dapps and agent auth flows.
EIP-4337 - Account abstraction standard enabling programmable smart accounts for agents.
EIP-4361 (SIWE) - Sign-In with Ethereum standard for wallet-based authentication.
EIP-7702 - EOA delegation model for temporary smart-account-like behavior.
ERC-7579 - Modular smart account standard for plugin-based permissions and execution.
ERC-8004 - Proposed standard for cross-chain agent identity.
Lit Protocol - Decentralized key management and programmable signing.
Privy - Embedded wallet infrastructure for agent authentication.
Safe - Multi-signature smart account for EVM agent treasuries.
Sign-In With Solana - Wallet-native authentication pattern for Solana apps and agents.
Solana Agent Identity - Agent wallet and identity features in Solana Agent Kit.
Squads Protocol - Multisig and smart account protocol for Solana agents.
Turnkey - Secure key infrastructure for programmatic wallet management.
UCAN - User-controlled authorization for decentralized agent capabilities.

Agent Payments

Payment protocols and infrastructure for autonomous agent transactions.

Awesome x402 - Curated resources for the x402 payment protocol ecosystem.
Coinbase Agentic Wallets - Wallet infrastructure for AI agents with programmable spending limits.
Google A2A x402 Extension - Cryptocurrency payments for the Agent-to-Agent protocol via x402.
lobster.cash - Agent payment solution on Solana by Crossmint, with Visa Intelligent Commerce integration.
Request Network - Crypto-native invoicing and payment request rails for agent billing workflows.
Solana Pay - Open payments standard for Solana-based checkout and transfer flows.
Superfluid - Streaming payment primitives for machine-to-machine and agent subscriptions.
x402 Foundation - Open protocol foundation governing the x402 payment standard.
x402 Protocol - Open HTTP payment protocol using the 402 status code for agent-to-service payments.

DeFi Agents

AI agents for decentralized finance operations and strategy.

Autonolas - Framework for building autonomous agent services on-chain.
DeFi Llama API - Open API for DeFi protocol data used by trading agents.
Drift Protocol v2 - On-chain perpetuals protocol infrastructure for autonomous trading agents.
ElizaOS DeFi Plugins - DeFi protocol integrations for ElizaOS agents.
Gauntlet - Risk management and simulation platform for DeFi agents.
Griffain - AI agent platform for Solana DeFi operations.
Kamino KLend SDK - Lending protocol SDK for credit and yield allocation agents.
Lulo - Yield optimization protocol with agent-friendly APIs.
Orca Whirlpools SDK - Solana concentrated liquidity SDK for agent strategies.
Raydium SDK - Solana AMM SDK for agent-driven liquidity provision.
Spectral Finance - On-chain credit scoring and risk models for agent decisions.
Virtuals Protocol - Agent tokenization and autonomous commerce protocol tracking agentic GDP.
Yearn Vaults - Automated yield vaults usable as agent strategy backends.

Quant and Trading Agents

Quantitative finance frameworks and AI-driven trading systems.

AlphaAgent - LLM-powered agent for quantitative trading research.
BitQuant - Multi-agent quantitative analysis framework.
DriftPy - Python SDK for building Solana-based perp and risk management agents.
FinGPT - Open-source financial LLM framework.
FinRL - Deep reinforcement learning library for quantitative finance.
Freqtrade - Open-source algorithmic trading bot in Python.
Hummingbot - Open-source market making and arbitrage bot.
Lean - Algorithmic trading engine by QuantConnect.
NautilusTrader - High-performance algorithmic trading platform in Rust and Python.
Phoenix v1 - On-chain central limit order book protocol for low-latency execution agents.
Qlib - AI-oriented quantitative investment platform from Microsoft.
TradingAgents - Multi-agent LLM framework simulating a trading firm.
VectorBT - Fast backtesting and analysis library for trading strategies.
Zipline - Pythonic algorithmic trading library for backtesting.

Agent Observability and Testing

Debugging, tracing, evaluation, and testing tools for AI agents.

AgentOps - Monitoring, cost tracking, and benchmarking for agent workflows.
Braintrust - Evaluation and observability platform for AI products.
DeepEval - Open-source LLM evaluation framework.
Helicone - Open-source LLM observability and monitoring platform.
LangFuse - Open-source LLM engineering platform for tracing and evaluation.
LangSmith - Platform for debugging, testing, and monitoring LLM applications.
LiteLLM - LLM gateway and proxy with logging, cost tracking, and routing controls.
OpenAI Evals - Framework and benchmark registry for evaluating LLM systems.
OpenLLMetry - OpenTelemetry-based observability for LLM applications.
Opik - Open-source platform for LLM and agent tracing, evaluation, and monitoring.
Phoenix - Open-source AI observability platform from Arize.
Portkey - AI gateway with observability, caching, and fallback routing.
SigNoz - OpenTelemetry-native observability platform for traces, logs, and metrics.
TruLens - Open-source framework for evaluating and tracking LLM and agent experiments.
Weave - Toolkit for tracking and evaluating LLM applications from W&B.

Research Papers

Curated papers on AI agents, multi-agent systems, and agent infrastructure.

A Survey on Large Language Model based Autonomous Agents - Comprehensive survey of LLM-based agent architectures.
ArXiv Deep Research Map (this repo) - Category-by-category reading map spanning frameworks, coding, MCP/tool use, memory, security, multimodal, and quant/on-chain adjacent domains.
Awesome AI Agent Papers - Continuously updated collection of agent research papers.
Chain-of-Thought Prompting - Foundational paper on reasoning in language models.
Generative Agents - Simulating human behavior with LLM-driven agents in a sandbox.
MemGPT - OS-inspired memory management for LLM context windows.
ReAct - Synergizing reasoning and acting in language models.
Reflexion - Language agents with verbal reinforcement learning.
The Landscape of Emerging AI Agent Architectures - Survey of multi-agent design patterns.
Toolformer - Language models that learn to use tools autonomously.
Voyager - Open-ended embodied agent with LLM-powered curriculum.

Communities

Forums, Discord servers, newsletters, and social accounts.

CrewAI Discord - Community for CrewAI agent builders.
Anthropic Discord - Official Anthropic community.
ElizaOS Discord - Community for ElizaOS agent builders.
LangChain Discord - LangChain developer community.
Latent Space Podcast - Podcast covering AI engineering and agents.
r/artificial - Subreddit for AI discussions and news.
r/LocalLLaMA - Community for local LLM deployment and agent experimentation.
Solana AI Discord - Solana developer community with AI channels.

Contributing

Contributions welcome. Read the contribution guidelines first.

❤️ Support the Project

If you find this project useful, consider supporting my open-source work.

Solana donations

BYLu8XD8hGDUtdRBWpGWu5HKoiPrWqCxYFSh4oxXuvPg

License

To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this work.
