πŸ›‘οΈ GuardLoop v2.2 [Experimental]

Exploring Self-Learning AI Governance


An experimental system that learns from LLM failures and generates adaptive guardrails.

⚠️ This is a research project and proof of concept. The core ideas are validated, but production deployment requires hardening.


💡 The Core Idea

Problem: LLMs make mistakes. Static rules can't catch evolving failure patterns.

Hypothesis: What if AI governance could learn from failures and adapt automatically?

GuardLoop's Approach:

  1. 📝 Capture every AI interaction and outcome
  2. 🔍 Analyze patterns in the failures (missing tests, security issues, etc.)
  3. 🧠 Learn from those patterns and generate dynamic guardrails
  4. 🛡️ Prevent repeated mistakes automatically
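
In code, this loop can be sketched in a few lines. The sketch below is illustrative only; the class and method names (GovernanceLoop, capture, learn, guard) are assumptions for exposition, not GuardLoop's actual API.

# Minimal sketch of the capture -> analyze -> learn -> prevent loop.
# All names here are illustrative assumptions, not GuardLoop's real API.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class GovernanceLoop:
    failures: Counter = field(default_factory=Counter)
    guardrails: set[str] = field(default_factory=set)

    def capture(self, prompt: str, output: str, failure_tag: str | None) -> None:
        """1. Log every interaction; count tagged failures."""
        if failure_tag:
            self.failures[failure_tag] += 1

    def learn(self, threshold: int = 5) -> None:
        """2-3. Recurring failure patterns become guardrails."""
        for tag, count in self.failures.items():
            if count >= threshold:
                self.guardrails.add(f"Always avoid: {tag}")

    def guard(self, prompt: str) -> str:
        """4. Prepend learned guardrails to future prompts."""
        rules = "\n".join(sorted(self.guardrails))
        return f"{rules}\n\n{prompt}" if rules else prompt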

✅ What Works Today

Core Features (Validated & Working):

  • ✅ AI interaction logging and pattern detection
  • ✅ Dynamic guardrail generation from failures
  • ✅ Task classification (skips guardrails for creative work)
  • ✅ Basic enforcement with the Claude CLI
  • ✅ Pre-warm cache for instant guardrail loading (99.9% faster first request)
  • ✅ File safety validation and auto-save
  • ✅ Conversation history across sessions

Tested & Reliable:

  • ✅ Claude CLI integration (primary adapter)
  • ✅ SQLite failure logging with analytics
  • ✅ 3 core agents (architect, coder, tester)
  • ✅ Context optimization (pre-warm cache: 0.22ms vs 300ms cold start)

🚧 What's Theoretical/In Progress

Features Under Development:

  • 🚧 Full 13-agent orchestration (10 agents are basic stubs)
  • 🚧 Multi-tool support (Gemini/Codex adapters incomplete)
  • 🚧 Semantic guardrail matching (embeddings not yet implemented)
  • 🚧 Advanced compliance validation (GDPR/ISO rules exist but not legally validated)
  • 🚧 Performance metrics (some claims are projections, not benchmarked)

Known Limitations:

  • ⚠️ Only Claude adapter is fully functional
  • ⚠️ Agent chain optimization is hardcoded, not yet dynamic
  • ⚠️ Large contexts (>10K tokens) may time out
  • ⚠️ File auto-save has edge cases with binary/system files

See CRITICAL.md for complete limitations list.


🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Claude CLI installed (pip install claude-cli)
  • ⚠️ Note: Only Claude is fully supported. Gemini/Codex coming soon.

Installation

# Clone the repository
git clone https://github.com/samibs/guardloop.dev.git
cd guardloop.dev

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Initialize guardloop
guardloop init

# Verify installation
guardloop --version

First Run

# Test with a simple command
guardloop run claude "create a hello world function"

# Expected: Should work with basic guardrails
# If it fails: Check logs at ~/.guardloop/logs/

⚠️ Troubleshooting: See CRITICAL.md for common issues and workarounds.


💡 Core Concepts Demonstrated

1. Pattern Detection (Working)

After multiple failures with similar issues, GuardLoop learns:

# After 5 sessions where Claude forgot error handling
$ guardloop analyze --days 7

📊 Pattern Detected:
   - Missing try-catch blocks in async functions
   - Occurrences: 5
   - Confidence: 0.85

🧠 Generated Guardrail:
   "Always wrap async database calls in try-catch blocks"
   Status: trial → validated → enforced
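
A simple version of this analysis is just frequency counting over the failure log: group failures by category, then flag categories that recur above a threshold. The sketch below assumes that mechanism; it is not the project's actual analyzer.

# Hypothetical pattern analysis over a failure log (simplified).
from collections import Counter

def detect_patterns(failure_log: list[dict], min_occurrences: int = 3) -> list[dict]:
    """Flag failure categories that recur often enough to become guardrails."""
    counts = Counter(entry["category"] for entry in failure_log)
    total = sum(counts.values())
    return [
        {
            "pattern": category,
            "occurrences": n,
            # Naive confidence: this category's share of all logged failures.
            "confidence": round(n / total, 2),
        }
        for category, n in counts.items()
        if n >= min_occurrences
    ]

log = 5 * [{"category": "missing try-catch in async functions"}] \
    + 1 * [{"category": "missing tests"}]
print(detect_patterns(log))
# [{'pattern': 'missing try-catch in async functions', 'occurrences': 5, 'confidence': 0.83}]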

2. Task Classification (Working)

Intelligently skips guardrails for non-code tasks:

# Code task - guardrails applied ✅
>>> implement user authentication
📋 Classified: code (confidence: 0.95)
🛡️ Guardrails: Applied

# Creative task - guardrails skipped ⏭️
>>> write a product launch blog post
📋 Classified: creative (confidence: 0.92)
🛡️ Guardrails: Skipped (not needed)

3. Pre-Warm Cache (Working)

Instant guardrail loading eliminates cold-start latency:

# Performance Results:
- Pre-warm time: 1.74ms (initialization overhead)
- First request: 0.22ms (cached) vs ~300ms (cold)
- Improvement: 99.9% faster
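
The mechanism is plain: parse every rule file once at startup and serve later lookups from an in-memory dict, so the request path never touches disk. The sketch below assumes markdown rule files under ~/.guardloop/guardloops/ (the directory from the project structure); it is not the actual cache implementation.

# Hypothetical pre-warm cache: parse rule files once, serve from memory.
import time
from pathlib import Path

class GuardrailCache:
    def __init__(self, rules_dir: Path) -> None:
        self._rules_dir = rules_dir
        self._rules: dict[str, str] = {}

    def prewarm(self) -> float:
        """Load every rule file into memory; return elapsed milliseconds."""
        start = time.perf_counter()
        for path in self._rules_dir.glob("*.md"):  # assumes markdown rule files
            self._rules[path.stem] = path.read_text()
        return (time.perf_counter() - start) * 1000

    def get(self, name: str) -> str | None:
        """O(1) in-memory lookup; no disk I/O on the request path."""
        return self._rules.get(name)

cache = GuardrailCache(Path.home() / ".guardloop" / "guardloops")
print(f"Pre-warm took {cache.prewarm():.2f} ms")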

4. File Safety (Working)

Validates and auto-saves LLM-generated files:

>>> create auth service

💾 Auto-saved (safety score: 0.95):
   - auth/jwt_manager.py ✅
   - auth/middleware.py ✅
   - tests/test_auth.py ✅

⚠️ Requires confirmation (system path):
   - /etc/auth.conf (blocked)
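
Conceptually, the gate scores each write target before saving: system paths are blocked outright, paths outside the project need confirmation, and project files auto-save. The prefixes and score values below are illustrative assumptions, not the project's actual policy.

# Illustrative file-safety scoring; prefixes and thresholds are assumptions.
from pathlib import Path

BLOCKED_PREFIXES = ("/etc", "/usr", "/bin", "/boot", "/var")

def safety_score(target: Path, project_root: Path) -> float:
    """Score a write target: 0.0 = blocked, < 0.5 = confirm, else auto-save."""
    resolved = target.resolve()
    if str(resolved).startswith(BLOCKED_PREFIXES):
        return 0.0   # system path: block outright
    try:
        resolved.relative_to(project_root.resolve())
        return 0.95  # inside the project: safe to auto-save
    except ValueError:
        return 0.4   # outside the project: require confirmation

root = Path("/home/dev/myapp")
print(safety_score(root / "auth" / "jwt_manager.py", root))  # 0.95
print(safety_score(Path("/etc/auth.conf"), root))            # 0.0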

🎯 Use Cases

Good Fit (Early Adopters):

  • ✅ Experimenting with AI governance concepts
  • ✅ Research projects exploring LLM safety
  • ✅ Developers comfortable with alpha-quality software
  • ✅ Contributors who want to shape the direction
  • ✅ Teams learning from LLM failure patterns

Not Ready For:

  • ❌ Production environments requiring 99.9% uptime
  • ❌ Enterprise compliance (legal validation needed)
  • ❌ Multi-tool orchestration (only Claude works well)
  • ❌ Teams needing commercial support

📊 Current Project Status

  • Tests: 223 passing (core + optimization tests)
  • Coverage: 75%
  • Agents: 3 working (architect, coder, tester) + 10 basic stubs
  • Adapters: 1 complete (Claude), 2 incomplete (Gemini, Codex)
  • v2 Features: 5 adaptive learning capabilities (validated)
  • Performance: pre-warm cache optimized (99.9% faster first request)


πŸ—ΊοΈ Roadmap

See ROADMAP.md for detailed development plan.

Next Milestones:

  • v2.1 (4 weeks): Complete all 13 agents, finish adapters
  • v2.2 (8 weeks): Semantic matching, performance benchmarking
  • v3.0 (Future): Enterprise features, VS Code extension

Want to influence priorities? Open an issue or start a discussion!


📖 Documentation

See the docs/ directory in the repository.


πŸ› οΈ Development

Setup Development Environment

# Install development dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=src/guardloop --cov-report=html

# Test specific components
pytest tests/core/test_task_classifier.py
pytest tests/core/test_pattern_analyzer.py

Project Structure

guardloop.dev/
├── src/guardloop/
│   ├── core/           # Core orchestration engine
│   ├── adapters/       # LLM tool adapters (Claude, Gemini, etc.)
│   └── utils/          # Shared utilities
├── tests/              # Test suite (223 tests)
├── docs/               # Documentation
└── ~/.guardloop/       # User configuration & data
    ├── config.yaml
    ├── guardloops/     # Static + dynamic rules
    └── data/           # SQLite database

🤝 Contributing

We're actively seeking contributors!

High-Impact Areas:

  1. 🚧 Complete Gemini/Codex adapters
  2. 🚧 Implement remaining 10 agents
  3. 🚧 Add semantic matching with embeddings
  4. 🧪 Write more tests and edge-case coverage
  5. 📚 Improve documentation and examples

How to Contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Commit with clear messages (git commit -m 'Add semantic matching')
  5. Push and open a Pull Request

See CONTRIBUTING.md for detailed guidelines.


🌟 Why This Matters

The Vision: AI governance that evolves with your team's actual usage patterns, not just theoretical rules.

Current State: A proof of concept validating the core hypothesis that AI can learn from failures and improve governance automatically.

What's Next: Hardening for production, expanding beyond Claude, validating at scale.

Star ⭐ if the idea resonates. Contribute if you want to build it together.


📄 License

MIT License - see LICENSE file.


Built by developers, for developers. Shaped by the community.

Questions? Open an issue | Join discussions
