Implement Response Caching System #21

@FradSer

Description

Performance Enhancement: Agent Response Caching

Type: Enhancement
Impact: Significant response time improvement for similar thoughts
Location: System-wide caching layer

Problem Description

Similar thoughts trigger identical agent processing, leading to unnecessary computation and API calls. The system lacks any caching mechanism for agent responses.

Performance Impact

  • Redundant Processing: Same thought types processed repeatedly
  • API Cost: Unnecessary LLM API calls for similar inputs
  • Response Time: No benefit from previous similar analyses
  • Resource Usage: Wasted computational resources

Proposed Caching Strategy

1. Semantic Caching

(Note: the sketch below matches on normalized content exactly rather than by true semantic similarity; embedding-based similarity lookup could be layered on later.)

import hashlib
from typing import Dict, Optional, Tuple
from datetime import datetime, timedelta

class SemanticCache:
    def __init__(self, ttl_hours: int = 24):
        self.cache: Dict[str, Tuple[str, datetime]] = {}
        self.ttl = timedelta(hours=ttl_hours)
    
    def get_cache_key(self, thought_content: str, agent_type: str) -> str:
        """Derive a cache key from normalized thought content."""
        # Normalize case and surrounding whitespace so trivially different
        # phrasings hit the same entry; md5 is used only as a fast,
        # non-cryptographic hash (truncation to 8 chars trades collision
        # resistance for shorter keys)
        normalized = thought_content.lower().strip()
        content_hash = hashlib.md5(normalized.encode()).hexdigest()[:8]
        return f"{agent_type}:{content_hash}"
    
    def get(self, thought_content: str, agent_type: str) -> Optional[str]:
        """Get cached response if available and not expired."""
        key = self.get_cache_key(thought_content, agent_type)
        if key in self.cache:
            response, timestamp = self.cache[key]
            if datetime.now() - timestamp < self.ttl:
                return response
            else:
                del self.cache[key]  # Remove expired
        return None
    
    def set(self, thought_content: str, agent_type: str, response: str):
        """Cache agent response."""
        key = self.get_cache_key(thought_content, agent_type)
        self.cache[key] = (response, datetime.now())
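
As a standalone illustration of the key derivation above (same normalization logic, using a hypothetical `make_key` helper rather than the class method):

```python
import hashlib

def make_key(thought_content: str, agent_type: str) -> str:
    # Same normalization as SemanticCache.get_cache_key
    normalized = thought_content.lower().strip()
    content_hash = hashlib.md5(normalized.encode()).hexdigest()[:8]
    return f"{agent_type}:{content_hash}"

# Case and surrounding whitespace do not change the key...
assert make_key("  Hello World  ", "coordinator") == make_key("hello world", "coordinator")
# ...but a different agent type produces a distinct key
assert make_key("hello world", "coordinator") != make_key("hello world", "analyst")
```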

2. Integration with Agent System

import logging

logger = logging.getLogger(__name__)

# Team and ThoughtData come from the existing agent system
class CachedAgentTeam:
    def __init__(self, team: Team, cache: SemanticCache):
        self.team = team
        self.cache = cache
    
    async def process_with_cache(self, thought: ThoughtData) -> str:
        """Process a thought, returning a cached response when available."""
        # SemanticCache derives the final key itself, so pass raw content;
        # thoughtType is folded in so different thought types don't collide
        content_key = f"{thought.thoughtText}:{thought.thoughtType}"
        cached_result = self.cache.get(content_key, "coordinator")
        
        # Check `is not None` so a legitimately empty cached response
        # still counts as a hit
        if cached_result is not None:
            logger.info("Cache hit - returning cached response")
            return cached_result
        
        # Cache miss: process normally and cache the result
        result = await self.team.run(thought.thoughtText)
        self.cache.set(content_key, "coordinator", result.content)
        
        return result.content

Cache Strategy Details

  • TTL: 24 hours for thought responses
  • Storage: In-memory with optional Redis backend for production
  • Invalidation: Time-based expiration with manual purge capability
  • Hit Rate Target: >30% cache hit rate for similar thoughts
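
The time-based expiration and manual purge described above can be sketched standalone; `TTLStore` is a hypothetical minimal version with an injectable clock so expiry can be exercised in tests without waiting:

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

class TTLStore:
    """Minimal sketch of TTL expiration with manual purge."""

    def __init__(self, ttl: timedelta):
        self.ttl = ttl
        self.data: Dict[str, Tuple[str, datetime]] = {}

    def set(self, key: str, value: str, now: Optional[datetime] = None) -> None:
        # `now` is injectable for testing; defaults to the real clock
        self.data[key] = (value, now or datetime.now())

    def get(self, key: str, now: Optional[datetime] = None) -> Optional[str]:
        entry = self.data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if (now or datetime.now()) - stored_at >= self.ttl:
            del self.data[key]  # time-based invalidation on read
            return None
        return value

    def purge(self) -> None:
        """Manual purge capability from the strategy above."""
        self.data.clear()
```

A production version would swap the dict for a Redis backend with native `EXPIRE` semantics, keeping the same interface.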

Performance Benefits

  • Response Time: Sub-second responses for cached thoughts
  • Cost Savings: Estimated 30-50% reduction in LLM API costs, assuming the hit-rate target is met
  • Scalability: Better handling of repeated similar requests
  • User Experience: Instant responses for common thought patterns

Acceptance Criteria

  • Implement semantic caching system with configurable TTL
  • Integrate caching with agent coordination workflow
  • Add cache hit/miss metrics and monitoring
  • Achieve >30% cache hit rate in typical usage
  • Add cache management (clear, stats, health check)
  • Optional Redis backend for distributed caching
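
The hit/miss metrics and health-check criteria could start from something as small as this (illustrative sketch; `CacheMetrics` is not an existing class in the codebase):

```python
from dataclasses import dataclass

@dataclass
class CacheMetrics:
    """Tracks cache hits/misses for the monitoring acceptance criterion."""
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def healthy(self, target: float = 0.30) -> bool:
        # Health check against the >30% hit-rate target above
        return self.hit_rate() > target
```

A cache wrapper would call `record(...)` on every lookup and expose `hit_rate()` through a stats/health endpoint.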

Priority: Medium - Cost and performance optimization
