
 ███╗   ███╗ █████╗  ██████╗██╗  ██╗██╗███╗   ██╗ █████╗
 ████╗ ████║██╔══██╗██╔════╝██║  ██║██║████╗  ██║██╔══██╗
 ██╔████╔██║███████║██║     ███████║██║██╔██╗ ██║███████║
 ██║╚██╔╝██║██╔══██║██║     ██╔══██║██║██║╚██╗██║██╔══██║
 ██║ ╚═╝ ██║██║  ██║╚██████╗██║  ██║██║██║ ╚████║██║  ██║
 ╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝
                    T R I N I T Y

The agent runtime that assumes the LLM will fail.

C++20 safety core · Transactional execution · Cryptographic audit · Self-evolution
9 layers of defense-in-depth · Deterministic replay · Runtime tool synthesis

C++20 Python Tests License v6.5




Language Versions

Machina documentation uses BCP-47-style locale codes in its file names.

| Locale | File | Status |
| --- | --- | --- |
| English (en) | README.md | Source of truth |
| Korean (ko-KR) | README.ko.md | Maintained |
| Japanese (ja-JP) | README.ja.md | Maintained |
| Simplified Chinese (zh-Hans-CN) | README.zh-CN.md | Maintained |
| Traditional Chinese (zh-Hant-TW) | README.zh-TW.md | Maintained |
| Spanish (es) | README.es.md | Maintained |
| Portuguese, Brazil (pt-BR) | README.pt-BR.md | Maintained |
| French (fr-FR) | README.fr.md | Maintained |
| German (de-DE) | README.de.md | Maintained |
| Vietnamese (vi-VN) | README.vi.md | Maintained |
| Indonesian (id-ID) | README.id.md | Maintained |
| Thai (th-TH) | README.th.md | Maintained |
| Russian (ru-RU) | README.ru.md | Maintained |
| Arabic (ar-SA) | README.ar.md | Maintained |
| Hindi (hi-IN) | README.hi.md | Maintained |

Language strategy and expansion roadmap:

  • docs/LANGUAGE_STRATEGY_EN.md
  • docs/ROADMAP.md
  • Full equivalent docsets: docs/i18n/README.md

The Problem

Every agent framework gives an LLM a knife and hopes for the best.

LLM hallucinates rm -rf /? No rollback. Can't figure out why the agent broke at 3 AM? No audit trail. Tool spawns a subprocess that eats 32 GB of RAM? No resource limits. External API goes down? The whole system freezes.

These aren't edge cases. They're Tuesday.

Machina starts from a different premise: the LLM will make mistakes. The architecture's job is to make those mistakes cheap, traceable, and automatically recoverable — while still letting a capable model do genuinely autonomous work.


How It Works

Machina splits the world into three concerns. They never mix.

                    ┌─────────────────────────────────────────────────────────┐
                    │                    MACHINA TRINITY                      │
                    │                                                         │
                    │   ┌─────────────┐  ┌──────────────┐  ┌──────────────┐  │
                    │   │             │  │              │  │              │  │
                    │   │    BODY     │  │    DRIVER    │  │    MEMORY    │  │
                    │   │             │  │              │  │              │  │
                    │   │  Tx/Rollback│◄─┤  Heuristic   │  │  Hash-chain  │  │
                    │   │  Registry   │  │  LLM Policy  │  │  WAL/Ckpt    │  │
                    │   │  Sandbox    │  │  Circuit Brk │  │  Replay      │  │
                    │   │  Lease      │  │  Fast Path   │  │  BM25+Vec    │  │
                    │   │             │  │              │  │              │  │
                    │   └──────┬──────┘  └──────┬───────┘  └──────┬───────┘  │
                    │          │                │                 │          │
                    └──────────┼────────────────┼─────────────────┼──────────┘
                               │                │                 │
              ┌────────────────┼────────────────┼─────────────────┼────────┐
              │                ▼                ▼                 ▼        │
              │  ┌──────────────────────────────────────────────────────┐  │
              │  │              PYTHON AGENT RUNTIME                    │  │
              │  │                                                      │  │
              │  │  Telegram ──► Pulse Loop (Intent→Execute→Continue)   │  │
              │  │  Autonomic ─► 6-Level GVU (Reflect→Test→Heal→...)    │  │
              │  │  Learning ──► ExpeL · Reflexion · Distillation       │  │
              │  │  Memory ────► Graph 2.0 · 4 Streams · Multi-hop      │  │
              │  │  MCP ───────► External tool discovery & bridging     │  │
              │  │                                                      │  │
              │  └──────────────────────────────────────────────────────┘  │
              └───────────────────────────────────────────────────────────┘

Body executes tools inside transactions. If anything fails, state rolls back. The LLM never touches raw state.

Driver decides what to execute. Heuristic selector always works. LLM policy is optional and sandboxed behind a circuit breaker. If the LLM fails 3 times, the system degrades gracefully — it doesn't crash.
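The degrade-gracefully behavior can be sketched as a small circuit breaker. This is an illustrative pattern only: the class name, threshold handling, and selector signatures below are invented for this sketch and are not Machina's actual API.

```python
# Minimal circuit-breaker sketch: after 3 consecutive policy failures,
# fall back to the heuristic selector instead of crashing.
class PolicyCircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def select(self, llm_policy, heuristic, candidates):
        if self.failures >= self.max_failures:
            return heuristic(candidates)          # breaker open: LLM bypassed
        try:
            choice = llm_policy(candidates)
            if choice not in candidates:          # hallucinated tool name
                raise ValueError(f"unknown tool: {choice}")
            self.failures = 0                     # success resets the counter
            return choice
        except Exception:
            self.failures += 1
            return heuristic(candidates)          # degrade, don't crash

breaker = PolicyCircuitBreaker()
flaky = lambda c: (_ for _ in ()).throw(RuntimeError("LLM down"))
pick_first = lambda c: c[0]
for _ in range(5):
    tool = breaker.select(flaky, pick_first, ["tag.log", "tag.error"])
print(tool)  # → tag.log  (system keeps working without the LLM)
```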

Memory records everything as SHA-256 hash-chained audit entries. Every run can be replayed deterministically. You can prove what happened, when, and why.

Design invariant: The Body is always safe regardless of Driver quality. A bad LLM can pick the wrong tool, but it cannot corrupt state, bypass sandboxing, or break the audit chain.
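The tamper-evidence property of a SHA-256 hash chain can be shown in a few lines. This is a sketch of the general technique, not Machina's on-disk audit format; the entry layout is invented for illustration.

```python
# Each entry hashes the previous entry's hash plus its own payload,
# so editing any record breaks every hash after it.
import hashlib, json

def append_entry(chain, payload):
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    chain.append({"prev": prev, "payload": payload, "hash": digest})

def verify_chain(chain):
    prev = "0" * 64
    for entry in chain:
        body = json.dumps(entry["payload"], sort_keys=True)
        expect = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expect:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"tool": "AID.ERROR_SCAN.v1", "status": "commit"})
append_entry(log, {"tool": "AID.FILE.READ.v1", "status": "rollback"})
assert verify_chain(log)
log[0]["payload"]["status"] = "rollback"   # tamper with history...
assert not verify_chain(log)               # ...and the chain detects it
```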


What Makes This Different

1. Transactional Tool Execution

Every tool call is wrapped in a transaction. Success → commit. Failure → rollback. State is never half-written.

Tool runs inside Tx
        │
        ├── Success → DS deltas committed
        │
        └── Failure → DS state rolled back (as if nothing happened)

Mainstream agent frameworks don't do this out of the box: in LangChain, AutoGPT, or CrewAI, a failed tool call can leave your system in an undefined state.
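The commit-or-rollback pattern above can be sketched in Python (Machina's real Tx layer is C++; this illustrates the pattern, not the implementation): the tool mutates a staged copy, and only a clean exit publishes the deltas.

```python
# Transactional execution sketch: success commits staged deltas,
# any exception leaves the original state untouched.
import copy
from contextlib import contextmanager

@contextmanager
def transaction(state):
    staged = copy.deepcopy(state)    # tool works on a staging copy
    try:
        yield staged
    except Exception:
        raise                        # state untouched: implicit rollback
    else:
        state.clear()
        state.update(staged)         # success: commit deltas atomically

state = {"rows_scanned": 0}
with transaction(state) as tx:
    tx["rows_scanned"] = 1000        # success path commits
try:
    with transaction(state) as tx:
        tx["rows_scanned"] = -1
        raise RuntimeError("tool failed halfway")
except RuntimeError:
    pass
print(state)  # → {'rows_scanned': 1000}  (failed run left no trace)
```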

2. Nine Layers of Defense

Not one safety mechanism. Nine, stacked:

Layer 1   Tx + Rollback ─────────────── State integrity
Layer 2   Hash-chained Audit ─────────── Tamper-evident history
Layer 3   Allowlists ─────────────────── Command restriction
Layer 4   seccomp-BPF ────────────────── Kernel syscall filtering
Layer 5   Permission Leases ──────────── Single-use privileged tokens
Layer 6   Plugin Hash Pinning ────────── SHA-256 before dlopen
Layer 7   Capability Gates ───────────── Bitmask permission model
Layer 8   SSRF Defense ───────────────── DNS rebinding prevention
Layer 9   CRC32 WAL Framing ──────────── Crash integrity detection

Plus: bwrap namespace isolation, Genesis source guard, nonce replay protection, HMAC request signing, rate limiting, and input sanitization (safe_merge_patch blocks LLM injection of system keys).
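The `safe_merge_patch` idea can be sketched as follows. The reserved-key list below is invented for illustration; Machina's actual blocked keys are not documented here.

```python
# Sketch: merge an LLM-proposed JSON patch, but refuse keys that could
# alter system configuration (the injection vector safe_merge_patch blocks).
RESERVED = {"control_mode", "capabilities", "sandbox"}   # hypothetical list

def safe_merge_patch(base: dict, patch: dict) -> dict:
    merged = dict(base)
    for key, value in patch.items():
        if key in RESERVED or key.startswith("_"):
            raise ValueError(f"rejected system key from LLM patch: {key}")
        merged[key] = value
    return merged

ok = safe_merge_patch({"pattern": "ERROR"}, {"max_rows": 1000})
try:
    safe_merge_patch({"pattern": "ERROR"}, {"capabilities": "ALL"})
except ValueError as e:
    blocked = str(e)
print(ok)       # → {'pattern': 'ERROR', 'max_rows': 1000}
print(blocked)  # the injection attempt was refused, not merged
```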

3. Self-Evolution at Runtime

Machina can write, compile, and hot-load new tools while running — through the Genesis pipeline:

Write source ──► Compile .so ──► SHA-256 verify ──► dlopen into registry
     │                │                │
     ▼                ▼                ▼
Source guard      Hash pinning    Capability gate
(blocks dangerous  (constant-time   (rejects plugins
 APIs/headers)     verification)    exceeding caps)

This is opt-in (MACHINA_GENESIS_ENABLE=1), off by default in production, and gated behind three independent safety checks. The system can grow new capabilities without restarting — but it can't grow dangerous ones.
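The hash-pinning step can be sketched like this (Machina does this in C++ before `dlopen`; the function below is an invented illustration of the check, using a constant-time compare as the text describes):

```python
# Verify a plugin's SHA-256 against a pinned value before loading it.
import hashlib, hmac, tempfile, os

def load_if_pinned(path, pinned_sha256):
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if not hmac.compare_digest(actual, pinned_sha256):  # constant-time
        raise PermissionError(f"hash mismatch for {path}")
    return actual  # real code would dlopen() only after this point

fd, path = tempfile.mkstemp(suffix=".so")
os.write(fd, b"fake plugin bytes")
os.close(fd)
pin = hashlib.sha256(b"fake plugin bytes").hexdigest()
assert load_if_pinned(path, pin) == pin        # pinned hash: load allowed
try:
    load_if_pinned(path, "0" * 64)             # wrong pin: load refused
except PermissionError:
    refused = True
os.unlink(path)
print(refused)  # → True
```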

4. Deterministic Replay

Every execution can be reproduced from logs:

./build/machina_cli replay_strict path/to/run.log
# Bit-exact reproduction of selections and outputs
# Non-deterministic tools replay via logged tx_patch

When something goes wrong at 3 AM, you don't grep through unstructured logs. You replay the exact execution, step by step, with the exact same state transitions.
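The replay contract described above can be sketched as: deterministic steps are re-executed and checked against the log, while non-deterministic steps are not re-run and their logged `tx_patch` is applied instead. The log format below is invented for illustration.

```python
# Strict-replay sketch: divergence from the logged output is an error.
def replay_strict(log, tools, state):
    for entry in log:
        if entry["deterministic"]:
            out = tools[entry["tool"]](state, entry["inputs"])
            if out != entry["output"]:
                raise AssertionError(f"divergence at {entry['tool']}")
        else:
            state.update(entry["tx_patch"])   # apply recorded deltas instead
    return state

tools = {"count_errors": lambda s, i: i["text"].count("ERROR")}
log = [
    {"tool": "count_errors", "deterministic": True,
     "inputs": {"text": "ok\nERROR\nERROR"}, "output": 2},
    {"tool": "http_get", "deterministic": False,
     "tx_patch": {"fetched_bytes": 512}},     # replayed from log, no network
]
final = replay_strict(log, tools, {})
print(final)  # → {'fetched_bytes': 512}
```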

5. The System Keeps Getting Better

The autonomic engine runs a 6-level self-improvement cycle:

L1 Reflect (5min)  ──► Analyze recent experiences
L2 Test    (5min)  ──► Run self-tests, find gaps
L3 Heal    (30min) ──► Auto-fix what's broken
L4 Hygiene (30min) ──► Clean logs, compact memory
L5 Curiosity(30min)──► Explore capability gaps
L6 Web     (30min) ──► Search and learn new knowledge

With three guarantees:

  • Regression Gate — changes that reduce test pass count are blocked
  • Reward Tracker — rolling-window success metrics detect degradation
  • Auto-Rollback — bad changes revert automatically

The result: the system is monotonically improving. It either gets better or stays the same. It never gets worse.
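The regression gate is the key to that monotonicity, and the idea fits in a few lines. Interfaces here are invented; it shows only the accept-or-revert decision, not the autonomic engine itself.

```python
# Regression-gate sketch: a change is committed only if it does not
# reduce the test pass count; otherwise the original state is kept.
def apply_with_gate(state, change, run_tests):
    baseline = run_tests(state)
    candidate = dict(state, **change)
    if run_tests(candidate) < baseline:
        return state, "rolled back"     # monotonic: never gets worse
    return candidate, "committed"

run_tests = lambda s: 14 - s.get("bugs", 0)   # toy pass-count metric
good, verdict_good = apply_with_gate({"bugs": 0}, {"feature": "x"}, run_tests)
bad, verdict_bad = apply_with_gate({"bugs": 0}, {"bugs": 3}, run_tests)
print(verdict_good, verdict_bad)  # → committed rolled back
print(bad)  # → {'bugs': 0}  (the regressing change never landed)
```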


Quick Start

Build (30 seconds)

git clone https://github.com/sisegod/machina-trinity.git
cd machina-trinity

./scripts/install_deps.sh
# manual alternative: see docs/DEPENDENCIES.md

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)          # Linux
# cmake --build build -j$(sysctl -n hw.ncpu)  # macOS

# Verify: 14/14 tests should pass
cd build && ctest --output-on-failure && cd ..

Run Without an LLM (zero dependencies)

# No LLM needed. Heuristic selector picks the right tool deterministically.
./build/machina_cli run examples/run_request.error_scan.json
# → Scans a CSV for "ERROR" patterns → produces structured report
# Control mode defaults to FALLBACK_ONLY when no policy is configured.

This is the fastest way to verify the system works. Transactional execution, audit logging, and replay all function without any LLM connection.

Connect an LLM (5 minutes, optional)

# Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b

# Built-in local policy driver (repo includes this file)
export MACHINA_POLICY_ALLOWED_SCRIPT_ROOT="$(pwd)/examples/policy_drivers"
export MACHINA_POLICY_CMD="python3 examples/policy_drivers/hello_policy.py"

# Optional: HTTP LLM bridge policy driver
# export MACHINA_POLICY_CMD="python3 examples/policy_drivers/llm_http_policy.py"
# export MACHINA_POLICY_LLM_URL="http://127.0.0.1:9000/machina_policy"
# export MACHINA_POLICY_LLM_AUTH="Bearer <token>"

# Run with LLM-driven tool selection (BLENDED via request JSON)
cat > /tmp/machina_blended.json << 'EOF'
{
  "goal_id": "goal.ERROR_SCAN.v1",
  "inputs": {"input_path": "examples/test.csv", "pattern": "ERROR", "max_rows": 1000000},
  "candidate_tags": ["tag.log", "tag.error", "tag.report"],
  "control_mode": "BLENDED"
}
EOF
./build/machina_cli run /tmp/machina_blended.json

Run Telegram Bot (optional, production-style launcher)

mkdir -p ~/.config/machina
cp .secrets.env.example ~/.config/machina/.secrets.env
chmod 600 ~/.config/machina/.secrets.env

# Fill TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID first
./scripts/doctor.sh
nohup ./scripts/run_bot_forever.sh >/tmp/machina_bot.launcher.out 2>&1 &
tail -f /tmp/machina_bot.log

Deploy to Production

export MACHINA_PROFILE=prod    # One switch: fsync, seccomp, strict timeouts
export MACHINA_API_TOKEN="your-secret"
export MACHINA_API_HMAC_SECRET="your-hmac-secret"

./build/machina_cli serve --host 127.0.0.1 --port 9090 --workers 4

# Enqueue work
curl -X POST http://localhost:9090/enqueue \
  -H "Authorization: Bearer your-secret" \
  -d @examples/run_request.error_scan.json

# Observe
curl http://localhost:9090/metrics   # Prometheus format
curl http://localhost:9090/stats     # Queue statistics

MACHINA_PROFILE=prod sets 7+ security defaults at once: fsync on, seccomp on, Genesis off, strict timeouts, HTTP default-deny, tool isolation enabled.
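As a sketch of the HMAC signing side, a client can sign the request body with the shared secret and the server recomputes and compares in constant time. The header name and canonicalization below are assumptions for illustration, not the serve API's documented contract (see the Serve API docs for that).

```python
# HMAC request-signing sketch with a shared secret.
import hashlib, hmac

SECRET = b"your-hmac-secret"

def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(body), signature)  # constant-time

body = b'{"goal_id": "goal.ERROR_SCAN.v1"}'
sig = sign(body)                       # e.g. sent in a signature header
assert verify(body, sig)               # untampered request accepted
assert not verify(body + b" ", sig)    # any byte change invalidates it
print("signature:", sig[:16], "...")
```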


Runtime Modes

| Mode | Command | Use Case |
| --- | --- | --- |
| Run | `machina_cli run <request.json>` | Single request (batch/CI) |
| Serve | `machina_cli serve --workers N` | Production HTTP daemon with WAL + crash recovery |
| Autopilot | `machina_cli autopilot <dir>` | Disk queue worker pool |
| Chat | `machina_cli chat` | Interactive REPL with LLM intent parsing |
| Replay | `machina_cli replay_strict <log>` | Deterministic reproduction from logs |
| CTS | `machina_cli cts <manifest>` | Compliance Test Suite |
| Tool Exec | `machina_cli tool_exec <aid>` | Direct single-tool execution |

Control Modes

| Mode | What Happens | When to Use |
| --- | --- | --- |
| FALLBACK_ONLY | Heuristic picks tools. Deterministic. | No LLM available |
| BLENDED | LLM decides, heuristic catches failures. | Recommended for production |
| POLICY_ONLY | LLM picks everything. No fallback. | Strong model + high trust |
| SHADOW_POLICY | Heuristic runs, LLM output logged only. | A/B testing LLM quality |

Built-in Tools (40+)

C++ Core Tools (23)

| Tool | AID | Description |
| --- | --- | --- |
| Error Scan | AID.ERROR_SCAN.v1 | Pattern search in CSV/log files |
| Report Summary | AID.REPORT_SUMMARY.v1 | Structured report generation |
| Shell Exec | AID.SHELL.EXEC.v1 | Sandboxed command execution (allowlisted) |
| File Read/Write | AID.FILE.READ/WRITE.v1 | Path-validated file I/O |
| HTTP Get | AID.NET.HTTP_GET.v1 | HTTP requests with SSRF defense |
| Memory Append/Search/Query | AID.MEMORY.*.v1 | BM25 + embedding hybrid search |
| Queue Enqueue | AID.QUEUE.ENQUEUE.v1 | Disk queue work items |
| Genesis Write/Compile/Load | AID.GENESIS.*.v1 | Runtime tool synthesis pipeline |
| Embed/VectorDB | AID.EMBED/VECDB.*.v1 | Text embeddings + vector search |
| GPU Metrics/Smoke | AID.GPU_*.v1 | NVIDIA GPU status |
| Proc Metrics | AID.PROC.SELF_METRICS.v1 | Process resource usage |
| Ask Supervisor | AID.ASK_SUP.v1 | Human-in-the-loop checkpoint |
Python Tools (19)

| Tool | AID | Description |
| --- | --- | --- |
| Code Exec | AID.CODE.EXEC.v1 | Sandboxed Python/Bash (6-layer auto-fix) |
| File Ops | AID.FILE.LIST/SEARCH/DIFF/EDIT/APPEND/DELETE.v1 | Full filesystem toolkit |
| Utility System | AID.UTIL.SAVE/RUN/LIST/DELETE/UPDATE.v1 | Reusable script library |
| Web Search | AID.NET.WEB_SEARCH.v1 | DuckDuckGo search |
| Project Create/Build | AID.PROJECT.*.v1 | Multi-file C++/Python projects |
| Package Mgmt | AID.SYSTEM.PIP_*.v1 | Isolated venv operations |
MCP Bridge Tools (optional)

External tools connected through the Model Context Protocol:

| Source | Example | Description |
| --- | --- | --- |
| web_search | AID.MCP.WEB_SEARCH.WEBSEARCHPRO.v1 | Web search via MCP |
| web_reader | AID.MCP.WEB_READER.WEBREADER.v1 | URL content extraction |
| zai | AID.MCP.ZAI.UI_TO_ARTIFACT.v1 | Image analysis, OCR, diagrams |

Configure in mcp_servers.json. Supports stdio, SSE, and streamable HTTP transports.
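A hypothetical `mcp_servers.json` might look like the sketch below, following the common MCP configuration convention. The exact keys Machina expects may differ, and the server names, commands, and URL here are invented:

```json
{
  "mcpServers": {
    "web_search": {
      "transport": "stdio",
      "command": "python3",
      "args": ["mcp_web_search_server.py"]
    },
    "web_reader": {
      "transport": "sse",
      "url": "http://127.0.0.1:8765/sse"
    }
  }
}
```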


Python Agent Runtime

The C++ core handles safety. Python handles intelligence.

Telegram ──► telegram_bot.py                           [optional]
              ├── telegram_bot_handlers.py ─── Message routing
              ├── telegram_bot_pulse.py ────── 3-phase Pulse pipeline
              │     └── chat_driver.py ──────── Intent → Execute → Continue
              ├── machina_dispatch.py ──────── 70+ tool aliases (KR/EN)
              ├── machina_autonomic/ ───────── Self-improving engine (10 files)
              │     ├── _engine.py ──────────── 6-level GVU cycle
              │     ├── _sq.py ─────────────── Self-questioning loop
              │     └── _stimulus.py ───────── Curiosity driver
              ├── machina_learning.py ──────── ExpeL · Reflexion · Distillation
              ├── machina_graph.py ─────────── Entity/relation graph + multi-hop BFS
              ├── machina_mcp.py ───────────── MCP bridge (external tools)  [optional]
              └── machina_permissions.py ───── 3-tier permission engine

36 files, all ≤ 620 lines. Strict size limit enforced.

3-Tier Intent Resolution

User message
     │
     ▼
FastPath ──── keyword hash match? ──► Execute (no LLM call)
     │ miss
     ▼
Distillation ── cached rule ≥0.8 confidence? ──► Execute
     │ miss
     ▼
LLM ──── full intent classification ──► Execute

Common operations (shell, file, search, memory) skip the LLM entirely. The system learns which intents map to which tools and caches those rules with a 10-minute TTL.
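The three tiers can be sketched as a single resolver. The confidence threshold (0.8) and 600-second TTL mirror the text; the dictionaries and function signatures are invented for illustration.

```python
# Tier 1: exact keyword FastPath. Tier 2: cached distilled rule with
# confidence + TTL. Tier 3: full LLM classification (then cached).
import time

FASTPATH = {"gpu": "AID.SHELL.EXEC.v1", "search": "AID.NET.WEB_SEARCH.v1"}
distilled = {}   # keyword -> (tool, confidence, cached_at)

def resolve(message, llm, now=None):
    now = time.time() if now is None else now
    key = message.lower().split()[0]
    if key in FASTPATH:                               # tier 1: no LLM call
        return FASTPATH[key], "fastpath"
    rule = distilled.get(key)
    if rule and rule[1] >= 0.8 and now - rule[2] < 600:
        return rule[0], "distilled"                   # tier 2: cached rule
    tool = llm(message)                               # tier 3: classify
    distilled[key] = (tool, 0.9, now)                 # learn for next time
    return tool, "llm"

calls = []
llm = lambda m: calls.append(m) or "AID.MEMORY.QUERY.v1"
print(resolve("gpu status", llm))           # fastpath, LLM skipped
print(resolve("remember my name", llm))     # LLM consulted once...
print(resolve("remember my name", llm))     # ...then served from cache
print(len(calls))  # → 1
```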

Permission System

| Mode | Behavior |
| --- | --- |
| open | All tools auto-allowed (dev) |
| standard | Safe = allow, dangerous = ask via Telegram (default) |
| locked | Read-only tools only |
| supervised | All non-read tools require approval |

Telegram sends inline keyboard buttons for approval (requires Telegram bot setup). Per-tool overrides via env or JSON config.


Testing

# C++ unit tests (~2s)
cd build && ctest --output-on-failure   # 14/14 expected

# Python guardrail tests
scripts/run_guardrails.sh

# Full catalog smoke/regression suite
scripts/run_test_catalog.sh

# Replay helpers
scripts/replay_latest.sh
scripts/replay_strict_latest.sh

C++ Test Suites (14)

| Suite | Tests | What It Covers |
| --- | --- | --- |
| CPQ | 4 | Concurrent priority queue thread safety |
| WAL | 3 | Write-ahead log + checkpoint/recovery |
| WAL Rotation | 3 | Segment rotation, retention limits |
| Tx | 5 | Transaction commit/rollback/replay |
| Tx Patch | 2 | tx_patch parser/apply contract |
| Memory | 4 | Append/query/rotation |
| Memory Query | 3 | BM25 + hybrid search |
| Toolhost | 3 | Plugin load/execute/isolation |
| GoalRegistry | 5 | Manifest parsing/validation |
| Input Safety | 12 | safe_merge_patch, capability filtering |
| Sandbox | 4 | seccomp-BPF syscall filtering |
| Lease | 5 | Permission lease lifecycle |
| Config | 6 | Profile detection/defaults |
| Plugin Hash | 3 | SHA-256 hash pinning |
Python E2E Tests (34 cases, 13 groups)

| Group | Tests | Coverage |
| --- | --- | --- |
| Chat Intent | 8 | Greetings, emotions, casual, context |
| Shell Command | 4 | GPU, memory, disk, process |
| Web Search | 4 | Price, weather, person, EN enforcement |
| Code Execution | 4 | Fibonacci, calc, sort, tables |
| Memory | 2 | Save + recall |
| File Operations | 2 | Read + write |
| Config | 2 | Backend + model switching |
| URL Fetch | 1 | HTTP GET |
| Utility System | 1 | Util list |
| Chat Response | 1 | Natural language generation |
| Summary | 1 | Tool result summarization |
| Continue Loop | 2 | Done + action continuation |
| Auto-Memory | 2 | Personal info detection + skip |

Plus 39 simulation scenarios (multi-step, adversarial LLM, crash recovery).


How It Compares

Out-of-the-box capabilities — other frameworks may achieve some of these through additional tooling or configuration.

| Capability | Machina Trinity | LangChain | AutoGPT | CrewAI |
| --- | --- | --- | --- | --- |
| Transactional execution | ✅ | — | — | — |
| Cryptographic audit trail | ✅ | — | — | — |
| Deterministic replay | ✅ | — | — | — |
| Kernel-level sandboxing | ✅ seccomp-BPF | — | — | — |
| Permission leases | ✅ | — | — | — |
| Plugin hash verification | ✅ | — | — | — |
| Input sanitization | ✅ | — | — | — |
| Circuit breaker | ✅ | — | — | — |
| Runtime self-evolution | ✅ Genesis | — | — | — |
| Process isolation | ✅ fork+exec+bwrap | — | — | — |
| Prometheus /metrics | ✅ | — | — | — |
| One-switch profiles | ✅ dev/prod | — | — | — |
| Native C++ performance | ✅ | — | — | — |

These projects serve different goals and excel in their own domains (LangChain's ecosystem breadth, CrewAI's multi-agent orchestration, etc.). This table highlights where Machina's safety-first architecture provides capabilities that would require significant additional work to replicate elsewhere.


Project Structure

machina-trinity/
├── core/                    # C++ engine library
│   ├── include/machina/     #   26 public headers
│   ├── src/                 #   Implementation
│   └── cuda/                #   Optional CUDA kernels
│
├── runner/                  # C++ CLI + runtime modes
│   ├── cmd_run.cpp          #   Single request execution
│   ├── cmd_serve.cpp        #   HTTP daemon + WAL + workers
│   ├── cmd_chat.cpp         #   Interactive REPL (Pulse Loop)
│   └── serve_http.h         #   HTTP parsing, HMAC, rate limiting
│
├── tools/tier0/             # 23 built-in C++ tools
├── toolhost/                # Plugin host (NDJSON + fork modes)
│
├── machina_autonomic/       # Self-improving engine (10 files)
├── policies/                # Chat driver + LLM bridge (4 files)
│
├── machina_dispatch.py      # Tool dispatch facade (70+ aliases)
├── machina_learning.py      # ExpeL, Reflexion, Voyager
├── machina_graph.py         # Graph Memory 2.0
├── machina_mcp.py           # MCP bridge
├── machina_permissions.py   # 3-tier permission engine
├── telegram_bot.py          # Telegram bot interface
│
├── examples/                # Policy driver examples + quickstart
├── docs/                    # Architecture, operations, API, policy
└── scripts/                 # Build, guardrails, ops, replay helpers

Documentation

| Document | What You'll Find |
| --- | --- |
| Architecture | Trinity design, execution lifecycle, security model, module map |
| Operations | Production deployment, profiles, hardening, environment variables |
| Serve API | HTTP endpoints, authentication, rate limiting |
| Policy Driver | LLM integration protocol, driver authoring guide |
| LLM Backends | Ollama, llama.cpp, Claude, OpenAI setup |
| Quick Start | 10-minute build → configure → run guide |
| Dependencies | Required C++/Python/system packages per OS |
| Language Strategy | Locale policy, Telegram language status, multilingual rollout plan |

LLM Support

Any OpenAI-compatible API works out of the box:

| Backend | Status |
| --- | --- |
| Ollama | Full support (recommended for local dev) |
| llama.cpp | Full support |
| vLLM | Full support |
| OpenRouter | Full support |
| Anthropic Claude | Native Messages API integration |

Security Notice

Machina can execute shell commands and load arbitrary plugins. Treat it as high-risk software.

  • Always run in a container or sandboxed environment in production
  • Never expose serve to the public internet without authentication
  • Keep credentials in ~/.config/machina/.secrets.env (outside repo)
  • See SECURITY.md for the security policy and vulnerability reporting

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

Priority areas: security hardening, new sandboxed tools, LLM policy improvements, documentation, test coverage.


License

Apache License 2.0 — see LICENSE for details.


Built for a world where LLMs are powerful but imperfect.
