A Recursive Language Model (RLM) implementation for analyzing massive datasets with virtually unlimited context windows
This project demonstrates the Recursive Language Model (RLM) inference paradigm—a technique that allows LLMs to process datasets far exceeding their native context window limits by treating the LLM as a programmer rather than a reader.
- What is RLM?
- Architecture Overview
- How It Works: Step-by-Step
- Project Structure
- Quick Start
- Configuration
- Understanding the Cost Model
- Example Output
- Technical Deep Dive
## What is RLM?

Recursive Language Models (RLMs) are not a new neural network architecture—they are an inference strategy that transforms how LLMs interact with large contexts.
| Approach | Limitation |
|---|---|
| Direct Context | Model forgets details as context grows ("context rot") |
| RAG (Retrieval) | May miss relevant data if semantic search fails |
| Summarization | Loses granular details needed for complex analysis |
Instead of feeding the entire dataset into the LLM's context window:
- Load data as a variable in a Python REPL environment
- Instruct the LLM to write code to programmatically explore the data
- Allow recursive delegation to cheaper/smaller LLMs for chunk processing
- Aggregate results through the code the LLM writes itself
```
Traditional: LLM ← [ENTIRE DATASET] ← User Question

RLM:         LLM ← [Code Environment + Variable Reference] ← User Question
                     └── LLM writes code to read/analyze chunks as needed
```
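The loop behind this is small enough to sketch directly. The helper callables below (`root_lm`, `extract_code`, `run_in_repl`) are hypothetical stand-ins for the project's actual components, so treat this as a conceptual sketch rather than the implementation in `rlm/rlm_repl.py`:

```python
# Conceptual sketch of the RLM control loop; helper callables are hypothetical stand-ins.
from typing import Callable

def rlm_loop(
    query: str,
    root_lm: Callable[[list], str],       # Root LM chat call: messages -> reply text
    extract_code: Callable[[str], str],   # pulls the fenced repl block out of a reply
    run_in_repl: Callable[[str], str],    # executes code, returns captured print() output
    system_prompt: str,
    max_iterations: int = 15,
) -> str:
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": query}]
    for _ in range(max_iterations):
        reply = root_lm(messages)                  # Root LM plans and writes code
        if reply.strip().startswith("FINAL"):
            return reply                           # Root LM has compiled its final answer
        messages.append({"role": "assistant", "content": reply})
        output = run_in_repl(extract_code(reply))  # may fan out to the Sub LM via llm_query
        messages.append({"role": "user", "content": output})  # only printed output flows back
    return "Stopped: max_iterations reached without a FINAL answer."
```

Everything else in this repository is plumbing around this loop: prompting the Root LM well, sandboxing the code it writes, and routing `llm_query` calls to the cheaper Sub LM.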
## Architecture Overview

```mermaid
flowchart TB
subgraph User["👤 User"]
Q[Query: Find opinion changes in meetings]
D[(Dataset: 8.5 MB<br/>162 meetings)]
end
subgraph System["🔧 System Layer"]
REPL["Python REPL Environment<br/><code>context</code> variable loaded"]
EXEC["Code Executor<br/>Sandboxed Python"]
end
subgraph RootLM["🧠 Root LM (GPT-4o)"]
THINK[Reasoning & Planning]
CODE[Generate Python Code]
FINAL[Compile Final Answer]
end
subgraph SubLM["⚡ Sub LM (GPT-4o-mini)"]
READ[Read Data Chunks]
EXTRACT[Extract Information]
RETURN[Return Processed Results]
end
Q --> RootLM
D --> REPL
REPL --> |"context available"| EXEC
THINK --> CODE
CODE --> |"repl code block"| EXEC
EXEC --> |"Executes code"| SubLM
SubLM --> |"Extracted insights"| EXEC
EXEC --> |"print output"| RootLM
RootLM --> |"FINAL"| FINAL
style RootLM fill:#4CAF50,color:#fff
style SubLM fill:#2196F3,color:#fff
style REPL fill:#FF9800,color:#fff
```
| Component | Role | Example Model |
|---|---|---|
| Root LM | Orchestrator—plans strategy, writes code | GPT-4o |
| Sub LM | Worker—reads chunks, extracts information | GPT-4o-mini |
| REPL Environment | Executes code, manages data | Python 3.12 |
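In code, the two tiers are simply two model names on the same API; the project wraps this in its own `OpenAIClient` (see `llm.py`), and only the Sub LM is exposed inside the REPL as `llm_query`. A rough sketch using the `openai` Python client directly, with illustrative prompts:

```python
# Illustrative two-tier setup; requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def chat(model: str, prompt: str) -> str:
    """Single-turn completion helper shared by both tiers."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Root LM plans and writes code; Sub LM reads raw chunks from inside the REPL.
plan = chat("gpt-4o", "Outline a strategy to scan 162 meeting transcripts for opinion changes.")
chunk_summary = chat("gpt-4o-mini", "List any opinion changes in this transcript chunk: ...")
```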
## How It Works: Step-by-Step

```mermaid
sequenceDiagram
participant U as User
participant S as System
participant R as Root LM (GPT-4o)
participant P as Python REPL
participant Sub as Sub LM (GPT-4o-mini)
U->>S: Query + Dataset (8.5 MB)
S->>P: Load dataset as `context` variable
S->>R: System Prompt + Query
Note over R: "I need to explore the context first"
R->>P: ```repl<br/>print(context[:5000])
P-->>R: "=== Meeting 1 ===<br/>ID: Bdb001..."
Note over R: "Ah, meetings are separated by '=== Meeting'"
R->>P: ```repl<br/>meetings = context.split('=== Meeting')<br/>len(meetings)
P-->>R: 162
Note over R: "162 meetings! I'll analyze in batches using Sub LM"
loop For each batch of meetings
R->>P: ```repl<br/>for m in meetings[0:10]:<br/> result = llm_query(f"Analyze: {m}")
P->>Sub: "Analyze this meeting for opinion changes..."
Sub-->>P: "Participant X changed opinion on topic Y..."
P-->>R: [Analysis results printed]
end
R->>S: FINAL("Here are all opinion changes found...")
S->>U: Final Answer
```
The dataset is NOT sent to the Root LM. Instead:
```python
# System writes data to a temp file
context_path = "/tmp/repl_env_xxx/context.txt"
with open(context_path, "w") as f:
    f.write(dataset)  # 8.5 MB of meeting transcripts

# Then loads it into the REPL namespace
repl_globals = {}
exec(f"context = open({context_path!r}).read()", repl_globals)
```

The Root LM receives a system prompt that says:
"You have access to a
contextvariable. Use Python code to explore it."
The Root LM writes code like:
```python
# Root LM's first move: understand the data structure
print(context[:5000])  # See first 5000 chars
```

After seeing the structure, the Root LM decides HOW to process the data:
```python
# Root LM decides to split by the meeting delimiter
meetings = context.split('=== Meeting')
print(f"Found {len(meetings)} meetings")
```

The Root LM uses `llm_query()` to send chunks to the Sub LM:
```python
results = []
for i, meeting in enumerate(meetings[:10]):
    # This calls the Sub LM (GPT-4o-mini)
    analysis = llm_query(f"""
    Analyze this meeting transcript for opinion changes.
    Return: participant name, topic, initial position, final position.

    Transcript:
    {meeting}
    """)
    results.append(f"Meeting {i}: {analysis}")
```

After processing, the Root LM compiles the results:
```python
FINAL("""
Based on my analysis of 162 meetings, I found these opinion changes:

1. **Grad C** (Meeting 1)
   - Topic: XML format for data representation
   - Initial: "XML might not be suitable for sub-word data"
   - Final: "We should explore ATLAS format for flexibility"

2. **PhD F** (Meeting 1)
   - Topic: Data format standards
   - Initial: Preferred standard formats only
   - Final: Open to custom formats if well-documented
...
""")
```

## Project Structure

```
meeting_analyst/
├── main.py              # Entry point
├── config_loader.py     # YAML configuration loader
├── data_loader.py       # Dataset downloader (HuggingFace)
└── rlm/
    ├── rlm.py           # Abstract base class
    ├── rlm_repl.py      # Root LM orchestrator
    ├── repl.py          # Python REPL environment + Sub LM
    ├── logger/          # Colorful terminal logging
    │   ├── root_logger.py
    │   └── repl_logger.py
    └── utils/
        ├── llm.py       # OpenAI API client with metrics
        ├── prompts.py   # System prompts and templates
        └── utils.py     # Code parsing and execution
```
| File | Purpose |
|---|---|
| `rlm_repl.py` | Manages the Root LM conversation loop |
| `repl.py` | Provides sandboxed Python execution + Sub LM |
| `prompts.py` | Contains the critical system prompt that teaches RLM behavior |
| `llm.py` | OpenAI client with token/latency tracking |
## Quick Start

- Python 3.12+
- OpenAI API key
- uv package manager (recommended)
```bash
# Clone the repository
git clone https://github.com/your-username/meeting-analyst.git
cd meeting-analyst
# Install dependencies
uv sync
# Configure environment
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
```

```bash
# Run the full RLM analysis
uv run python -m meeting_analyst.main
# Or test data extraction only (no API calls)
uv run python -m meeting_analyst.main --test-extraction
```

```
🔍 Log Detective 2000
============================================================
Running RLM Analysis
------------------------------------------------------------
Target data file: data/processed/qmsum.txt
Loading data into memory...
Data loaded (8,561,364 chars).
🔍 Query: Analyze the meeting transcripts. Find cases where a participant
changed their opinion during the discussion...
--------------------------------------------------
================================================================================
STARTING NEW QUERY | 18:00:32
================================================================================
18:00:32 🔄 Calling API: gpt-4o | Prompt size: 4,464 chars
18:00:39 ✅ Responded in 6.87s | Model: gpt-4o
📊 Tokens: Prompt: 984 | Completion: 201 | Total: 1,185
╭─── In [1]: ────────────────────────────────────────────────╮
│ print(context[:5000]) │
╰────────────────────────────────────────────────────────────╯
╭─── Out [1]: ───────────────────────────────────────────────╮
│ === Meeting 1 === │
│ ID: Bdb001 │
│ Topic: Academic │
│ ... │
╰────────────────────────────────────────────────────────────╯
```
## Configuration

All settings are in `config.yaml`:
```yaml
project:
  name: "Log Detective 2000"
  debug_mode: true

models:
  root_model: "gpt-4o"          # Smart orchestrator
  sub_model: "gpt-4o-mini"      # Fast/cheap worker

paths:
  data_file: "data/processed/qmsum.txt"

data_source:
  type: "huggingface"
  dataset_id: "Ahren09/QMSum"
  split: "train"

rlm_settings:
  max_iterations: 15            # Max Root LM thinking cycles
  enable_logging: true          # Show detailed execution logs

providers:
  openai:
    base_url: "https://api.openai.com/v1"
```

| Use Case | Root LM | Sub LM | Cost/Speed |
|---|---|---|---|
| Best Quality | gpt-4o | gpt-4o | $$$, Slow |
| Balanced | gpt-4o | gpt-4o-mini | $$, Medium |
| Budget | gpt-4o-mini | gpt-4o-mini | $, Fast |
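Whichever pair you pick ends up in `config.yaml`, and `config_loader.py` reads it at startup. A minimal sketch of what such a loader can look like, assuming PyYAML (the actual module may differ):

```python
# Hypothetical config loader sketch; the project's config_loader.py may differ.
import yaml  # PyYAML

def load_config(path: str = "config.yaml") -> dict:
    """Read the YAML configuration and return it as a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

config = load_config()
root_model = config["models"]["root_model"]                 # "gpt-4o"
sub_model = config["models"]["sub_model"]                   # "gpt-4o-mini"
max_iterations = config["rlm_settings"]["max_iterations"]   # 15
```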
## Understanding the Cost Model

```mermaid
pie title Token Distribution (30 meetings analyzed)
    "Root LM Orchestration" : 67334
    "Sub LM Processing" : 665220
```
| Component | Tokens | Cost (USD) |
|---|---|---|
| Root LM (GPT-4o) | ~67,000 | ~$0.19 |
| Sub LM (GPT-4o-mini) | ~665,000 | ~$0.10 |
| Total | ~732,000 | ~$0.29 |
The Root LM (expensive) only:
- Plans the analysis strategy
- Writes code
- Compiles final results
The Sub LM (cheap) handles:
- Reading raw data chunks
- Extracting specific information
- 90% of total tokens
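The arithmetic behind the table is easy to reproduce. The per-million-token prices below are assumptions for illustration (check current OpenAI pricing), and completion tokens are ignored for simplicity, which is why the Root LM figure lands slightly under the ~$0.19 shown above:

```python
# Back-of-the-envelope cost check. Prices are illustrative assumptions (USD per 1M prompt tokens).
GPT_4O_PRICE = 2.50
GPT_4O_MINI_PRICE = 0.15

root_tokens = 67_000    # Root LM: planning, code generation, final answer
sub_tokens = 665_000    # Sub LM: reading raw meeting chunks

root_cost = root_tokens / 1_000_000 * GPT_4O_PRICE       # ~$0.17 before completion tokens
sub_cost = sub_tokens / 1_000_000 * GPT_4O_MINI_PRICE    # ~$0.10
print(f"Root ~${root_cost:.2f} | Sub ~${sub_cost:.2f} | Total ~${root_cost + sub_cost:.2f}")
```

The exact prices will drift, but the conclusion does not: as long as the Sub LM is roughly an order of magnitude cheaper per token, delegating ~90% of the tokens to it keeps the total bill close to what the Sub LM alone would cost.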
## Example Output

After running the analysis on 162 meeting transcripts (~8.5 MB):
Found Opinion Changes:

Meeting 1:

1. **Participant**: Grad C
   - **Topic**: Database format for annotations
   - **Initial**: "XML format won't work for sub-word data"
   - **Final**: "ATLAS format seems reasonable for flexibility"

2. **Participant**: PhD F
   - **Topic**: Data representation standards
   - **Initial**: Uncertain about non-standard formats
   - **Final**: Willing to explore ATLAS if it provides flexibility

Meeting 3:

1. **Participant**: Professor B
   - **Topic**: Project scope
   - **Initial**: Suggested broad exploration of complexities
   - **Final**: Narrowed focus to "tourists in Heidelberg" only

...
## Technical Deep Dive

The key to RLM behavior is the system prompt in `prompts.py`:
````python
REPL_SYSTEM_PROMPT = """
You are tasked with answering a query with associated context.
You can access, transform, and analyze this context interactively
in a REPL environment that can recursively query sub-LLMs.

The REPL environment is initialized with:
1. A `context` variable containing the data
2. A `llm_query` function to query sub-LLMs
3. The ability to use `print()` to view outputs

When you want to execute Python code, wrap it in:
```repl
# Your code here
```

When finished, use FINAL(your answer) to provide the result.
"""
````
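On the other side of this contract, the orchestrator has to pull the fenced `repl` code block out of each Root LM reply and detect the `FINAL(...)` call; this is the "code parsing" that `utils.py` is responsible for. A minimal illustrative version, with regexes that are assumptions rather than the project's actual patterns:

```python
import re

# Illustrative parsing helpers; the real logic lives in rlm/utils/utils.py and may differ.
REPL_BLOCK = re.compile(r"```repl\s*\n(.*?)```", re.DOTALL)
FINAL_CALL = re.compile(r"FINAL\((.*)\)\s*$", re.DOTALL)

def parse_reply(reply: str) -> tuple[str, str]:
    """Classify a Root LM reply as a final answer, runnable code, or plain text."""
    final = FINAL_CALL.search(reply)
    if final:
        return "final", final.group(1).strip().strip('"').strip()
    block = REPL_BLOCK.search(reply)
    if block:
        return "code", block.group(1)
    return "text", reply
```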
### The REPL Environment
The `REPLEnv` class in `repl.py` provides:
1. **Sandboxed Execution**: Only safe Python builtins allowed
2. **Context Injection**: Data loaded as `context` variable
3. **LLM Query Function**: `llm_query()` calls the Sub LM
4. **Output Capture**: Captures `print()` output for feedback
```python
import contextlib
import io

class REPLEnv:
    def __init__(self, context_str, recursive_model):
        # Load context into a temp file and expose it as the `context` variable
        self.load_context(context_str)
        # Create safe globals with the llm_query function bound to the Sub LM
        self.globals['llm_query'] = lambda prompt: self.sub_rlm.completion(prompt)

    def code_execution(self, code) -> REPLResult:
        # Execute in the sandboxed environment, capturing print() output
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.globals, self.locals)
        captured_output = buffer.getvalue()
        return captured_output
```
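Usage then looks roughly like this (argument names taken from the excerpt above; treat it as a sketch rather than the exact API):

```python
# Hypothetical usage of REPLEnv; argument names follow the excerpt above.
data = open("data/processed/qmsum.txt").read()
env = REPLEnv(context_str=data, recursive_model="gpt-4o-mini")

# Code the Root LM might emit, executed inside the sandbox:
output = env.code_execution("print(len(context))")
print(output)  # e.g. "8561364"
```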
The `OpenAIClient` in `llm.py` tracks all API calls:
```python
class OpenAIClient:
    def completion(self, messages):
        start = time.time()
        response = self.client.chat.completions.create(...)

        # Track metrics
        self.total_prompt_tokens += response.usage.prompt_tokens
        self.total_completion_tokens += response.usage.completion_tokens
        self.total_latency += time.time() - start
```
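These counters make it easy to report totals at the end of a run. A sketch, assuming the attribute names from the excerpt above and a no-argument constructor (both assumptions):

```python
# Hypothetical end-of-run summary built from the counters above.
client = OpenAIClient()
# ... run the RLM analysis through this client ...
print(f"Prompt tokens:     {client.total_prompt_tokens:,}")
print(f"Completion tokens: {client.total_completion_tokens:,}")
print(f"Total latency:     {client.total_latency:.1f}s")
```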
### Key Takeaways

- **RLM is a paradigm, not a model** - Works with any capable LLM (GPT, Claude, Gemini)
- **Code quality matters** - The Root LM must be good at coding for efficient analysis
- **Hierarchical delegation is key** - Use the smart (expensive) model for planning and the cheap one for reading
- **Context rot is avoided** - The Root LM never sees raw data, only processed results
- **Linear complexity is acceptable** - O(n) scanning is fine when precision matters