jane-jhu/trace_collector

ContextBench LEANN Runner

This directory contains a small local runner that wraps the upstream ContextBench repo.

Kept Files

  • contextbench_official_repo/: upstream ContextBench code and data.
  • scripts/*.py: local preparation, run, and evaluation scripts.
  • mitmproxy_addons/trace_recorder.py: HTTP trace recorder used while Claude runs.
  • requirements-run.txt: extra Python dependencies for these local scripts.

Generated directories such as .venv/, .mitmproxy-venv/, traces/, logs/, scripts/contextbench_work_dir_*, and scripts/contextbench_eval_repos/ can be deleted and regenerated.

1. Create Python Environment

Run from this directory:

python3.11 -m venv .venv
source .venv/bin/activate
pip install -r contextbench_official_repo/requirements.txt
pip install -r requirements-run.txt

2. Install Runtime CLIs

Install LEANN:

uv tool install leann-core --with leann

Install mitmproxy, which provides mitmdump, in a separate environment:

python3.11 -m venv .mitmproxy-venv
.mitmproxy-venv/bin/python -m pip install mitmproxy

The run script also expects:

  • claude CLI available on PATH.
  • Node/npm available for npx ccusage.
  • A Claude login session or ANTHROPIC_API_KEY in the environment.
  • If using LEANN MCP mode, either a Claude MCP server named leann-server, or LEANN_MCP_SERVER / CLAUDE_MCP_CONFIG_PATH set to match your own server name and config file.
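The prerequisites above can be checked before a long run. The following is a minimal sketch, not part of this repo: the function name `missing_prerequisites` is hypothetical, and it only checks what can be seen from the environment (a Claude login session cannot be detected this way, so a missing API key is reported as a note rather than a hard failure).

```python
import os
import shutil


def missing_prerequisites(which=shutil.which, env=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `which` and `env` are injectable so the check can be exercised without
    touching the real PATH or environment.
    """
    env = os.environ if env is None else env
    problems = []
    # The run script expects the claude CLI and npx (for `npx ccusage`) on PATH.
    for tool in ("claude", "npx"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # A login session is an alternative to the key, so this is only a hint.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY not set (fine if a Claude login session exists)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("warning:", problem)
```

Running the file directly prints one warning per missing item and nothing when everything is in place.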

3. Prepare Repos And LEANN Indexes

cd scripts
WORK_ROOT=contextbench_work_dir_claude python prepare_repos_with_leann.py

Useful overrides:

SELECTED_IDS=id1,id2 WORK_ROOT=contextbench_work_dir_claude python prepare_repos_with_leann.py
BENCH_FILTER=Pro WORK_ROOT=contextbench_work_dir_claude python prepare_repos_with_leann.py
LEANN_AST_CHUNK_SIZE=600 LEANN_AST_CHUNK_OVERLAP=96 python prepare_repos_with_leann.py
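To illustrate how overrides like these are typically consumed, here is a sketch of environment parsing. It is an assumption about the general pattern, not the actual code in prepare_repos_with_leann.py: the function name `read_prepare_overrides`, the returned keys, and the treatment of unset variables as `None` are all hypothetical.

```python
import os


def read_prepare_overrides(env=None):
    """Parse the override variables from the environment (illustrative only)."""
    env = os.environ if env is None else env

    # SELECTED_IDS is a comma-separated list; tolerate spaces and trailing commas.
    raw_ids = env.get("SELECTED_IDS", "")
    selected = [item.strip() for item in raw_ids.split(",") if item.strip()]

    def optional_int(name):
        # Unset or empty means "use the script's own default", represented as None.
        value = env.get(name)
        return int(value) if value else None

    return {
        "selected_ids": selected,
        "bench_filter": env.get("BENCH_FILTER") or None,
        "work_root": env.get("WORK_ROOT") or None,
        "ast_chunk_size": optional_int("LEANN_AST_CHUNK_SIZE"),
        "ast_chunk_overlap": optional_int("LEANN_AST_CHUNK_OVERLAP"),
    }
```

Passing a plain dict instead of the real environment makes it easy to see how a given command line would be interpreted under these assumptions.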

4. Run Selected Tasks

cd scripts
LEANN_ENABLED=1 \
WORK_ROOT=contextbench_work_dir_claude \
OUTPUT_FILE=all_predictions_claude.jsonl \
python batch_run_selected.py

Run without LEANN:

LEANN_ENABLED=0 \
WORK_ROOT=contextbench_work_dir_claude \
OUTPUT_FILE=all_predictions_claude_baseline.jsonl \
python batch_run_selected.py

Run specific IDs without editing the script:

SELECTED_IDS=id1,id2 python batch_run_selected.py
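The LEANN and baseline runs above differ only in two variables, so they can be driven back to back from one small wrapper. This is a sketch under the assumption that batch_run_selected.py reads only the variables shown in this section; the helper names `run_env` and `run_both` are hypothetical and not part of this repo.

```python
import os
import subprocess
import sys


def run_env(leann_enabled, output_file,
            work_root="contextbench_work_dir_claude", base=None):
    """Build the environment for one run, copying `base` (default: os.environ)."""
    env = dict(os.environ if base is None else base)
    env["LEANN_ENABLED"] = "1" if leann_enabled else "0"
    env["WORK_ROOT"] = work_root
    env["OUTPUT_FILE"] = output_file
    return env


def run_both(scripts_dir="scripts"):
    """Run the LEANN configuration, then the baseline, stopping on the first failure."""
    runs = (
        (True, "all_predictions_claude.jsonl"),
        (False, "all_predictions_claude_baseline.jsonl"),
    )
    for enabled, output_file in runs:
        subprocess.run(
            [sys.executable, "batch_run_selected.py"],
            cwd=scripts_dir,
            env=run_env(enabled, output_file),
            check=True,  # raise CalledProcessError if the run exits non-zero
        )
```

Calling `run_both()` from this directory, with the virtual environment active, would execute both configurations with the same output file names used above.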

5. Evaluate Results

Context retrieval metrics:

cd scripts
python evaluate_run.py \
  --predictions all_predictions_claude.jsonl \
  --metrics task_metrics.jsonl \
  --output-json eval_report.json

Patch-based proxy accuracy:

python evaluate_contextbench_accuracy.py \
  --predictions all_predictions_claude.jsonl \
  --metrics task_metrics.jsonl \
  --output-json contextbench_accuracy_report.json
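Before evaluating, it can help to confirm that the predictions file parses cleanly. The sketch below makes no assumption about the record schema, which this README does not document; it only parses each non-empty line as JSON. The function name `parse_jsonl` is hypothetical.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of objects, skipping blank lines."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report the offending line so a truncated run is easy to locate.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
    return records
```

For example, `with open("all_predictions_claude.jsonl", encoding="utf-8") as f: print(len(parse_jsonl(f)))` prints how many prediction records were written.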

6. Clean Generated Files

Run from this directory, not from scripts/:

rm -rf .venv .mitmproxy-venv .eval-venv .leann .pycache_tmp logs traces
rm -rf scripts/.leann scripts/scripts
rm -rf scripts/contextbench_eval_repos scripts/contextbench_work_dir_claude scripts/contextbench_work_dir_claude_overlap160
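For a more cautious cleanup, the same paths can be listed first and removed only on request. This is a sketch that mirrors the rm commands above; the function name `clean_generated` and the dry-run default are hypothetical additions, not part of this repo.

```python
import shutil
from pathlib import Path

# Same paths as the rm -rf commands above, relative to this directory.
GENERATED = [
    ".venv", ".mitmproxy-venv", ".eval-venv", ".leann", ".pycache_tmp",
    "logs", "traces",
    "scripts/.leann", "scripts/scripts",
    "scripts/contextbench_eval_repos",
    "scripts/contextbench_work_dir_claude",
    "scripts/contextbench_work_dir_claude_overlap160",
]


def clean_generated(root=".", names=GENERATED, dry_run=True):
    """Return the generated paths that exist; delete them unless dry_run is True."""
    found = []
    for name in names:
        path = Path(root) / name
        if not path.exists():
            continue
        found.append(name)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return found
```

Calling `clean_generated()` only reports what would be removed; `clean_generated(dry_run=False)` actually deletes it.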
