Meeting Analyst

A Recursive Language Model (RLM) implementation for analyzing massive datasets with virtually unlimited context windows

This project demonstrates the Recursive Language Model (RLM) inference paradigm—a technique that allows LLMs to process datasets far exceeding their native context window limits by treating the LLM as a programmer rather than a reader.

What is RLM?

Recursive Language Models (RLMs) are not a new neural network architecture—they are an inference strategy that transforms how LLMs interact with large contexts.

The Problem with Traditional Approaches

| Approach | Limitation |
| --- | --- |
| Direct Context | Model forgets details as context grows ("context rot") |
| RAG (Retrieval) | May miss relevant data if semantic search fails |
| Summarization | Loses granular details needed for complex analysis |

The RLM Solution

Instead of feeding the entire dataset into the LLM's context window:

  1. Load data as a variable in a Python REPL environment
  2. Instruct the LLM to write code to programmatically explore the data
  3. Allow recursive delegation to cheaper/smaller LLMs for chunk processing
  4. Aggregate results through the code the LLM writes itself
Traditional: LLM ← [ENTIRE DATASET] ← User Question
RLM:         LLM ← [Code Environment + Variable Reference] ← User Question
             └── LLM writes code to read/analyze chunks as needed
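
In code terms, the contrast looks roughly like this. It is a schematic sketch: dataset, question, and the stub Sub LM call are placeholders for illustration, not the project's actual API.

# Schematic contrast between the two approaches (stub values stand in for real data/clients)
dataset = "=== Meeting 1 ===\n..." * 1000     # imagine 8.5 MB of transcripts
question = "Find cases where a participant changed their opinion."

# Traditional: paste everything into one prompt (fails once it exceeds the context window)
# answer = llm(f"{dataset}\n\nQuestion: {question}")

# RLM: keep the data in a Python namespace and let the model work on it through code
namespace = {
    "context": dataset,                                  # never sent to the Root LM directly
    "llm_query": lambda prompt: "...Sub LM reply...",    # stub for the recursive Sub LM call
}
exec("print(len(context))", namespace)                   # the kind of code the Root LM writes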

Architecture Overview

flowchart TB
    subgraph User["👤 User"]
        Q[Query: Find opinion changes in meetings]
        D[(Dataset: 8.5 MB<br/>162 meetings)]
    end

    subgraph System["🔧 System Layer"]
        REPL["Python REPL Environment<br/><code>context</code> variable loaded"]
        EXEC["Code Executor<br/>Sandboxed Python"]
    end

    subgraph RootLM["🧠 Root LM (GPT-4o)"]
        THINK[Reasoning & Planning]
        CODE[Generate Python Code]
        FINAL[Compile Final Answer]
    end

    subgraph SubLM["⚡ Sub LM (GPT-4o-mini)"]
        READ[Read Data Chunks]
        EXTRACT[Extract Information]
        RETURN[Return Processed Results]
    end

    Q --> RootLM
    D --> REPL
    REPL --> |"context available"| EXEC
    
    THINK --> CODE
    CODE --> |"repl code block"| EXEC
    EXEC --> |"Executes code"| SubLM
    SubLM --> |"Extracted insights"| EXEC
    EXEC --> |"print output"| RootLM
    RootLM --> |"FINAL"| FINAL
    
    style RootLM fill:#4CAF50,color:#fff
    style SubLM fill:#2196F3,color:#fff
    style REPL fill:#FF9800,color:#fff

Key Components

| Component | Role | Example Model |
| --- | --- | --- |
| Root LM | Orchestrator: plans strategy, writes code | GPT-4o |
| Sub LM | Worker: reads chunks, extracts information | GPT-4o-mini |
| REPL Environment | Executes code, manages data | Python 3.12 |

How It Works: Step-by-Step

sequenceDiagram
    participant U as User
    participant S as System
    participant R as Root LM (GPT-4o)
    participant P as Python REPL
    participant Sub as Sub LM (GPT-4o-mini)

    U->>S: Query + Dataset (8.5 MB)
    S->>P: Load dataset as `context` variable
    S->>R: System Prompt + Query
    
    Note over R: "I need to explore the context first"
    
    R->>P: ```repl<br/>print(context[:5000])
    P-->>R: "=== Meeting 1 ===<br/>ID: Bdb001..."
    
    Note over R: "Ah, meetings are separated by '=== Meeting'"
    
    R->>P: ```repl<br/>meetings = context.split('=== Meeting')<br/>len(meetings)
    P-->>R: 162
    
    Note over R: "162 meetings! I'll analyze in batches using Sub LM"
    
    loop For each batch of meetings
        R->>P: ```repl<br/>for m in meetings[0:10]:<br/>  result = llm_query(f"Analyze: {m}")
        P->>Sub: "Analyze this meeting for opinion changes..."
        Sub-->>P: "Participant X changed opinion on topic Y..."
        P-->>R: [Analysis results printed]
    end
    
    R->>S: FINAL("Here are all opinion changes found...")
    S->>U: Final Answer

Detailed Flow Explanation

Step 1: Context Loading

The dataset is NOT sent to the Root LM. Instead:

# System writes data to temp file
context_path = "/tmp/repl_env_xxx/context.txt"
with open(context_path, "w") as f:
    f.write(dataset)  # 8.5 MB of meeting transcripts

# Then loads it into the REPL namespace (a plain dict, not the builtin globals())
repl_globals = {}
exec(f"context = open({context_path!r}).read()", repl_globals)

Step 2: Root LM Explores

The Root LM receives a system prompt that says:

"You have access to a context variable. Use Python code to explore it."

The Root LM writes code like:

# Root LM's first move: understand the data structure
print(context[:5000])  # See first 5000 chars

Step 3: Strategic Chunking

After seeing the structure, the Root LM decides HOW to process:

# Root LM decides to split by meeting delimiter
meetings = context.split('=== Meeting')
print(f"Found {len(meetings)} meetings")

Step 4: Recursive Delegation

The Root LM uses llm_query() to send chunks to the Sub LM:

results = []
for i, meeting in enumerate(meetings[:10]):
    # This calls the Sub LM (GPT-4o-mini)
    analysis = llm_query(f"""
        Analyze this meeting transcript for opinion changes.
        Return: participant name, topic, initial position, final position.
        
        Transcript:
        {meeting}
    """)
    results.append(f"Meeting {i}: {analysis}")

Step 5: Final Answer

After processing, the Root LM compiles results:

FINAL("""
Based on my analysis of 162 meetings, I found these opinion changes:

1. **Grad C** (Meeting 1)
   - Topic: XML format for data representation
   - Initial: "XML might not be suitable for sub-word data"
   - Final: "We should explore ATLAS format for flexibility"
   
2. **PhD F** (Meeting 1)
   - Topic: Data format standards
   - Initial: Preferred standard formats only
   - Final: Open to custom formats if well-documented
...
""")

Project Structure

meeting_analyst/
├── main.py                 # Entry point
├── config_loader.py        # YAML configuration loader
├── data_loader.py          # Dataset downloader (HuggingFace)
└── rlm/
    ├── rlm.py              # Abstract base class
    ├── rlm_repl.py         # Root LM orchestrator
    ├── repl.py             # Python REPL environment + Sub LM
    ├── logger/             # Colorful terminal logging
    │   ├── root_logger.py
    │   └── repl_logger.py
    └── utils/
        ├── llm.py          # OpenAI API client with metrics
        ├── prompts.py      # System prompts and templates
        └── utils.py        # Code parsing and execution

File Responsibilities

| File | Purpose |
| --- | --- |
| rlm_repl.py | Manages the Root LM conversation loop (sketched below) |
| repl.py | Provides sandboxed Python execution + the Sub LM |
| prompts.py | Contains the critical system prompt that teaches RLM behavior |
| llm.py | OpenAI client with token/latency tracking |
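
For orientation, here is a minimal sketch of the kind of conversation loop rlm_repl.py runs. Every name in it (run_rlm, the OpenAIClient constructor, the FINAL detection) is an assumption for illustration, based on the snippets later in this README rather than the repository's exact code.

import re

def run_rlm(query, context_str, max_iterations=15):
    # Hypothetical driver loop; REPLEnv, OpenAIClient and REPL_SYSTEM_PROMPT
    # are assumed to behave as sketched in the Technical Deep Dive below.
    env = REPLEnv(context_str, recursive_model="gpt-4o-mini")
    client = OpenAIClient(model="gpt-4o")
    messages = [
        {"role": "system", "content": REPL_SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]

    for _ in range(max_iterations):
        reply = client.completion(messages)               # one Root LM turn
        messages.append({"role": "assistant", "content": reply})

        final = re.search(r"FINAL\((.*)\)", reply, re.DOTALL)
        if final:                                         # Root LM is done
            return final.group(1)

        # Execute every ```repl block and feed the printed output back
        for code in re.findall(r"```repl\n(.*?)```", reply, re.DOTALL):
            result = env.code_execution(code)
            messages.append({"role": "user", "content": f"Output:\n{result.output}"})

    return "Stopped: max_iterations reached without FINAL()."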

Quick Start

Prerequisites

  • Python 3.12+
  • OpenAI API key
  • uv package manager (recommended)

Installation

# Clone the repository
git clone https://github.com/your-username/meeting-analyst.git
cd meeting-analyst

# Install dependencies
uv sync

# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Running the Analysis

# Run the full RLM analysis
uv run python -m meeting_analyst.main

# Or test data extraction only (no API calls)
uv run python -m meeting_analyst.main --test-extraction

Expected Output

🔍 Log Detective 2000
============================================================

Running RLM Analysis
------------------------------------------------------------
Target data file: data/processed/qmsum.txt
Loading data into memory...
Data loaded (8,561,364 chars).

🔍 Query: Analyze the meeting transcripts. Find cases where a participant 
changed their opinion during the discussion...
--------------------------------------------------

================================================================================
STARTING NEW QUERY | 18:00:32
================================================================================

18:00:32 🔄 Calling API: gpt-4o | Prompt size: 4,464 chars
18:00:39 ✅ Responded in 6.87s | Model: gpt-4o
  📊 Tokens: Prompt: 984 | Completion: 201 | Total: 1,185

╭─── In [1]: ────────────────────────────────────────────────╮
│ print(context[:5000])                                      │
╰────────────────────────────────────────────────────────────╯
╭─── Out [1]: ───────────────────────────────────────────────╮
│ === Meeting 1 ===                                          │
│ ID: Bdb001                                                 │
│ Topic: Academic                                            │
│ ...                                                        │
╰────────────────────────────────────────────────────────────╯

Configuration

All settings are in config.yaml; a minimal loader sketch follows the example below:

project:
  name: "Log Detective 2000"
  debug_mode: true

models:
  root_model: "gpt-4o"          # Smart orchestrator
  sub_model: "gpt-4o-mini"      # Fast/cheap worker

paths:
  data_file: "data/processed/qmsum.txt"

data_source:
  type: "huggingface"
  dataset_id: "Ahren09/QMSum"
  split: "train"

rlm_settings:
  max_iterations: 15            # Max Root LM thinking cycles
  enable_logging: true          # Show detailed execution logs

providers:
  openai:
    base_url: "https://api.openai.com/v1"
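
As an illustration of how this file can be consumed, here is a minimal loader sketch assuming the PyYAML package; config_loader.py in the repository may differ in detail.

import yaml

def load_config(path="config.yaml"):
    # Read the YAML configuration shown above into a plain dict
    with open(path, "r") as f:
        return yaml.safe_load(f)

config = load_config()
root_model = config["models"]["root_model"]                 # "gpt-4o"
sub_model = config["models"]["sub_model"]                   # "gpt-4o-mini"
max_iterations = config["rlm_settings"]["max_iterations"]   # 15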

Model Selection Guide

| Use Case | Root LM | Sub LM | Cost/Speed |
| --- | --- | --- | --- |
| Best Quality | gpt-4o | gpt-4o | $$$, Slow |
| Balanced | gpt-4o | gpt-4o-mini | $$, Medium |
| Budget | gpt-4o-mini | gpt-4o-mini | $, Fast |

Understanding the Cost Model

Why RLM is Cost-Effective

pie title Token Distribution (30 meetings analyzed)
    "Root LM Orchestration" : 67334
    "Sub LM Processing" : 665220
| Component | Tokens | Cost (USD) |
| --- | --- | --- |
| Root LM (GPT-4o) | ~67,000 | ~$0.19 |
| Sub LM (GPT-4o-mini) | ~665,000 | ~$0.10 |
| Total | ~732,000 | ~$0.29 |

Key Insight: Delegation Saves Money

The Root LM (expensive) only:

  • Plans the analysis strategy
  • Writes code
  • Compiles final results

The Sub LM (cheap) handles:

  • Reading raw data chunks
  • Extracting specific information
  • Roughly 90% of all tokens consumed (see the cost sketch below)
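
To see why delegation is cheap, here is a rough back-of-the-envelope calculation reproducing the table above. The per-million-token prices are assumed list input prices and may be outdated; output tokens cost more, which is why the real Root LM figure lands slightly higher.

# Rough cost estimate for the run in the table above.
# Prices per million tokens are assumptions (approximate published input rates);
# completion tokens are billed at a higher rate, so real totals are a bit higher.
PRICE_PER_M = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}
usage = {"gpt-4o": 67_334, "gpt-4o-mini": 665_220}

for model, tokens in usage.items():
    cost = tokens / 1_000_000 * PRICE_PER_M[model]
    print(f"{model:12s} {tokens:>9,} tokens -> ~${cost:.2f}")

# gpt-4o         67,334 tokens -> ~$0.17
# gpt-4o-mini   665,220 tokens -> ~$0.10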

Example Output

After running analysis on 162 meeting transcripts (~8.5 MB):

Found Opinion Changes:

Meeting 1:
1. **Participant**: Grad C
   - **Topic**: Database format for annotations
   - **Initial**: "XML format won't work for sub-word data"
   - **Final**: "ATLAS format seems reasonable for flexibility"

2. **Participant**: PhD F
   - **Topic**: Data representation standards
   - **Initial**: Uncertain about non-standard formats
   - **Final**: Willing to explore ATLAS if it provides flexibility

Meeting 3:
1. **Participant**: Professor B
   - **Topic**: Project scope
   - **Initial**: Suggested broad exploration of complexities
   - **Final**: Narrowed focus to "tourists in Heidelberg" only
...

Technical Deep Dive

The System Prompt (The Magic)

The key to RLM behavior is the system prompt in prompts.py:

REPL_SYSTEM_PROMPT = """
You are tasked with answering a query with associated context. 
You can access, transform, and analyze this context interactively 
in a REPL environment that can recursively query sub-LLMs.

The REPL environment is initialized with:
1. A `context` variable containing the data
2. A `llm_query` function to query sub-LLMs
3. The ability to use `print()` to view outputs

When you want to execute Python code, wrap it in:
```repl
# Your code here
```

When finished, use FINAL(your answer) to provide the result.
"""


The REPL Environment

The REPLEnv class in repl.py provides:

  1. Sandboxed Execution: Only safe Python builtins allowed
  2. Context Injection: Data loaded as the context variable
  3. LLM Query Function: llm_query() calls the Sub LM
  4. Output Capture: Captures print() output for feedback

import io
import contextlib

class REPLEnv:
    def __init__(self, context_str, recursive_model):
        # Write the context to a temp file and load it into the namespace
        self.load_context(context_str)

        # Safe globals: expose llm_query so generated code can call the Sub LM
        self.globals['llm_query'] = lambda prompt: self.sub_rlm.completion(prompt)

    def code_execution(self, code) -> REPLResult:
        # Execute in the sandboxed namespace, capturing print() output
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.globals, self.locals)
        # REPLResult is a small wrapper holding the captured output
        return REPLResult(output=buffer.getvalue())
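
A short usage illustration, reusing the names from the sketch above (the real repl.py may differ):

# Hypothetical usage of the REPLEnv sketch above
env = REPLEnv(context_str=open("data/processed/qmsum.txt").read(),
              recursive_model="gpt-4o-mini")

# The kind of code the Root LM emits inside a repl block
result = env.code_execution("print(context[:200])")
print(result.output)   # first 200 characters of the dataset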

Token Tracking

The OpenAIClient in llm.py tracks all API calls:

class OpenAIClient:
    def completion(self, messages):
        start = time.time()
        response = self.client.chat.completions.create(...)
        
        # Track metrics
        self.total_prompt_tokens += response.usage.prompt_tokens
        self.total_completion_tokens += response.usage.completion_tokens
        self.total_latency += time.time() - start
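
After a run, the accumulated counters can be read straight off the client. The summary below assumes a client instance named client left over from that run; attribute names match the snippet above.

# Summarize accumulated metrics after a run
total = client.total_prompt_tokens + client.total_completion_tokens
print(f"Prompt tokens:     {client.total_prompt_tokens:,}")
print(f"Completion tokens: {client.total_completion_tokens:,}")
print(f"Total tokens:      {total:,}")
print(f"Total API latency: {client.total_latency:.1f}s")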

Key Learnings

  1. RLM is a paradigm, not a model - Works with any capable LLM (GPT, Claude, Gemini)

  2. Code quality matters - The Root LM must be good at coding for efficient analysis

  3. Hierarchical delegation is key - Use smart (expensive) for planning, cheap for reading

  4. Context rot is avoided - The Root LM never sees raw data, only processed results

  5. Linear complexity is acceptable - O(n) scanning is fine when precision matters

