- Python 3.10+
- Claude CLI installed and configured
- (Optional) TensorBoard for visualization
- (Optional) Firestore for production storage
```bash
cd life
pip install -r requirements.txt  # if it exists
```

```bash
# Single debate test
python test_arena.py

# Full training test
python test_training.py

# Hierarchical staking test (legacy)
python test_hierarchical.py
```

```
life/
├── world_model/
│   ├── __init__.py                   # Main exports
│   ├── models/
│   │   ├── __init__.py
│   │   ├── observation.py            # Observation, ObservationStore
│   │   ├── agent.py                  # Agent, AgentSet, Tendency
│   │   ├── tree.py                   # Tree, Node, Position, Stake
│   │   ├── deviation.py              # Legacy deviation model
│   │   └── evidence.py               # Legacy evidence model
│   ├── extraction/
│   │   ├── __init__.py
│   │   ├── extractor.py              # Legacy extractor
│   │   └── observation_extractor.py  # Claude-based extraction
│   ├── staking/
│   │   ├── __init__.py
│   │   ├── staker.py                 # Legacy single staker
│   │   └── hierarchical_staker.py    # Legacy hierarchical
│   ├── dynamics/
│   │   ├── __init__.py
│   │   ├── arena.py                  # Adversarial debate orchestration
│   │   └── trainer.py                # ML-style training loop
│   └── storage/
│       ├── __init__.py
│       ├── graph.py                  # Legacy graph storage
│       ├── world_model_store.py      # JSON persistence
│       └── firestore_adapter.py      # Firestore persistence
├── api/
│   ├── __init__.py
│   └── main.py                       # FastAPI service
├── docs/                             # Documentation
├── observations.json                 # Sample data
└── test_*.py                         # Test scripts
```
- Add to the `Tendency` enum in `models/agent.py`:

```python
class Tendency(Enum):
    ...
    NEW_TENDENCY = "new_tendency"
```

- Add a default allocation in `DEFAULT_ALLOCATIONS`:

```python
DEFAULT_ALLOCATIONS = {
    ...
    Tendency.NEW_TENDENCY: 0.10,
}
```

- Update the prompt in the `dynamics/arena.py` proposal phase.
The core formula is in `models/tree.py`:

```python
@property
def score(self) -> float:
    direct = sum(self.stakes.values())
    pro_sum = sum(c.score for c in self.pro_children)
    con_sum = sum(c.score for c in self.con_children)
    return direct + pro_sum - con_sum
```

Modify it for different propagation strategies (e.g., decay, normalization).
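A common variation is depth decay, so that evidence deeper in the tree contributes less. A minimal sketch (the `DemoNode` stand-in is hypothetical; the real `Node` lives in `models/tree.py`):

```python
from dataclasses import dataclass, field

@dataclass
class DemoNode:
    # Hypothetical stand-in mirroring the fields `score` uses above
    stakes: dict = field(default_factory=dict)
    pro_children: list = field(default_factory=list)
    con_children: list = field(default_factory=list)

def decayed_score(node: DemoNode, decay: float = 0.5) -> float:
    """Like `score`, but child contributions shrink by `decay` per level."""
    direct = sum(node.stakes.values())
    pro = sum(decayed_score(c, decay) for c in node.pro_children)
    con = sum(decayed_score(c, decay) for c in node.con_children)
    return direct + decay * (pro - con)
```

With `decay=1.0` this reduces to the original formula; smaller values discount deep sub-debates.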
- Create a class implementing the logger interface:

```python
class MyLogger:
    def log_epoch(self, metrics: EpochMetrics):
        ...  # Log epoch data

    def log_validation(self, result: ValidationResult):
        ...  # Log validation data

    def log_config(self, config: TrainConfig):
        ...  # Log configuration

    def finish(self):
        ...  # Cleanup
```

- Use it in training:

```python
trainer.train(..., logger=MyLogger())
```

- Implement the adapter pattern:

```python
class MyStorageAdapter:
    async def save_world_model(self, model: WorldModel):
        ...

    async def load_world_model(self, model_id: str) -> WorldModel:
        ...

    async def update_observations(self, model_id: str, obs: ObservationStore):
        ...

    async def update_agents(self, model_id: str, agents: AgentSet):
        ...
```

- See `storage/firestore_adapter.py` for reference.
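To see the shape of a concrete adapter, here is a minimal sketch that persists one JSON file per model. The `JsonFileAdapter` name and the `id`/`to_dict()` contract are assumptions for illustration; `storage/firestore_adapter.py` remains the real reference.

```python
import json
from pathlib import Path

class JsonFileAdapter:
    """Sketch of the adapter pattern: one JSON file per world model.
    Assumes models expose an `id` attribute and a `to_dict()` method
    (hypothetical contract); a real adapter would await actual I/O."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    async def save_world_model(self, model) -> None:
        path = self.root / f"{model.id}.json"
        path.write_text(json.dumps(model.to_dict()))

    async def load_world_model(self, model_id: str) -> dict:
        # Returns the raw dict here; a real adapter would rebuild a WorldModel
        path = self.root / f"{model_id}.json"
        return json.loads(path.read_text())
```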
The Arena uses the Claude CLI for semantic analysis:

```python
def _call_claude(self, prompt: str, timeout: int = 300) -> str:
    # Write prompt to temp file
    prompt_file = self.work_dir / f"prompt_{hash(prompt)}.txt"
    prompt_file.write_text(prompt)

    # Call Claude
    if os.name == 'nt':  # Windows
        cmd = f'type "{prompt_file}" | claude -p --dangerously-skip-permissions'
        result = subprocess.run(cmd, shell=True, ...)
    else:  # Unix
        with open(prompt_file) as f:
            result = subprocess.run(
                ['claude', '-p', '--dangerously-skip-permissions'],
                stdin=f, ...
            )
    return result.stdout
```

Prompts are in `dynamics/arena.py`. Key patterns:
- Clear role: "You are simulating adversarial staking..."
- Structured context: Claims, observations formatted clearly
- JSON output: Request specific JSON schema
- Examples: Show expected format
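Putting the four patterns together, a prompt might be assembled like this. This is a sketch only: the actual wording and schema live in `dynamics/arena.py`, and the field names here are illustrative.

```python
import json

def build_staking_prompt(claims: list[str], observations: list[str]) -> str:
    """Assemble a staking prompt: clear role, structured context,
    explicit JSON schema, and an example of the expected format."""
    example = json.dumps({"stakes": [{"claim": 0, "weight": 0.4}]})
    lines = [
        "You are simulating adversarial staking over a set of claims.",
        "",
        "Claims:",
        *(f"{i}. {c}" for i, c in enumerate(claims)),
        "",
        "Observations:",
        *(f"- {o}" for o in observations),
        "",
        "Respond with JSON matching this schema, for example:",
        example,
    ]
    return "\n".join(lines)
```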
```python
try:
    response = self._call_claude(prompt, timeout=120)
    data = self._parse_json(response)
except subprocess.TimeoutExpired:
    # Retry or skip the batch
    ...
except json.JSONDecodeError:
    # Retry or use a fallback
    ...
```

```python
def test_weight_propagation():
    tree = Tree(root_value="Test")
    node = Node(content="Evidence", tree_id=tree.id)
    node.add_stake("meaning", 0.5)
    tree.add_node(tree.root_node.id, node, Position.PRO)
    assert tree.score == 0.5
```

```python
def test_full_debate():
    store = ObservationStore()
    # Add observations...
    agents = AgentSet.with_defaults()
    arena = Arena()
    trees, result = arena.run_full_debate(store, agents)
    assert result.winner is not None
    assert sum(result.scores.values()) > 0
```

```python
def test_validation_accuracy():
    # Train on a subset
    # Validate on the held-out split
    # Assert accuracy > baseline
    ...
```

Each epoch makes multiple Claude calls:
- 1 for proposal phase
- N for staking phase (batch_size=20 -> ceil(obs/20) calls)
- 1 for validation (if enabled)
Optimize by:
- Increasing batch size (more tokens per call)
- Reducing epochs
- Caching proposals across epochs
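The arithmetic above can be wrapped in a small estimator for budgeting runs (a hypothetical helper, derived directly from the bullet points):

```python
import math

def claude_calls_per_epoch(n_observations: int, batch_size: int = 20,
                           validate: bool = True) -> int:
    """1 proposal call + ceil(obs / batch_size) staking calls
    + 1 validation call if enabled."""
    staking_calls = math.ceil(n_observations / batch_size)
    return 1 + staking_calls + (1 if validate else 0)
```

For example, 100 observations at the default batch size cost 7 calls per epoch; doubling the batch size to 40 drops that to 5.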
Large observation stores can consume memory:
- ObservationStore holds all observations in memory
- TreeStore holds all trees with nodes
Consider streaming for very large datasets.
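One streaming approach is to keep observations in a JSON-lines file and yield them lazily instead of materializing the whole store (a sketch; `ObservationStore` itself has no such API today):

```python
import json
from typing import Iterator

def iter_observations(path: str) -> Iterator[dict]:
    """Yield one observation dict per non-empty line of a JSON-lines file,
    so memory use stays constant regardless of dataset size."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```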
Currently sequential. Potential parallelization:
- Staking batches could run in parallel
- Multiple epochs could checkpoint and resume
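The first idea can be sketched with a thread pool, since each staking batch is an independent, I/O-bound Claude call (`stake_fn` is a hypothetical stand-in for whatever function issues one batch call):

```python
from concurrent.futures import ThreadPoolExecutor

def stake_batches_parallel(batches, stake_fn, max_workers: int = 4):
    """Run `stake_fn` over each batch concurrently; results come back
    in the same order as `batches`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(stake_fn, batches))
```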
Windows encoding issue. Solution:

```python
prompt_file.write_text(prompt, encoding="utf-8")
```

Parsing error from the Claude response. Check:
- JSON format in response
- Exact field names match expectations
- Add fallback handling
Increase the timeout or reduce the batch size:

```python
result = subprocess.run(cmd, timeout=300)  # 5 minutes
```

Lower the learning rate or increase patience:

```python
config = TrainConfig(
    initial_lr=0.1,
    patience=5,
    convergence_threshold=0.02,
)
```

Track allocation changes over time:
```python
@dataclass
class AllocationSnapshot:
    timestamp: datetime
    allocations: dict[Tendency, float]
    trigger: str  # "debate", "new_observations", etc.
```

Different allocations for different contexts:
```python
class ContextualAgentSet:
    contexts: dict[str, AgentSet]  # "work", "family", "creative"

    def get_for_context(self, context: str) -> AgentSet:
        return self.contexts.get(context, self.default)
```

Use embeddings for faster relevance scoring:
```python
def relevance_score(obs_embedding, tree_embedding) -> float:
    return cosine_similarity(obs_embedding, tree_embedding)
```

Compare world models across people:
```python
def similarity(model_a: WorldModel, model_b: WorldModel) -> float:
    # Compare allocation distributions
    # Compare tree structures
    # Compute a divergence score
    ...
```
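For the allocation-distribution part, one reasonable divergence measure is Jensen-Shannon (a sketch; it assumes each model's allocations form a probability distribution over tendencies):

```python
import math

def js_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence between two allocation dicts:
    symmetric, and bounded above by ln 2. Missing keys count as 0."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict) -> float:
        # KL(a || mid); terms where a[k] == 0 contribute nothing
        return sum(a.get(k, 0.0) * math.log(a.get(k, 0.0) / mid[k])
                   for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

Identical allocations score 0; fully disjoint allocations score ln 2.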