warrofua · Nov 16, 2025
diff --git a/.gitignore b/.gitignore
@@ -129,3 +129,4 @@ Thumbs.db
 ~*
 /runs
 DocLayNet-base/
+datasets/
diff --git a/README.md b/README.md
@@ -52,10 +52,10 @@ A control loop that probes what the model remembers, tunes bottleneck budgets, r
 
 #### Operating the Auto‑IB Orchestrator
 
-* Wrap each STM append in a :class:`UsageEvent` that can carry the compressed tensor **and** a :class:`CompressionRecord`. Telemetry such as selected indices, token counts, and IB metrics are captured under ``metadata["compression"]`` and mirrored in the STM index.
+* Wrap each STM append in a :class:`UsageEvent` that can carry the compressed tensor **and** a :class:`CompressionRecord`. Telemetry such as selected indices, token counts, IB/MI lower bounds, constraint verdicts, and canonical cell artefacts are captured under ``metadata["compression"]`` and mirrored in the STM index.
 * Call :func:`Orchestrator.tune_budget` periodically (e.g., after processing a batch). The default :class:`CompressionRatioBudgetStrategy` looks at the recent compression ratios and adjusts ``Orchestrator.config.target_budget`` upward when quality drops or downward when utilisation saturates. Inspect ``Orchestrator.budget_history`` to audit the decisions.
 * Trigger :func:`Orchestrator.run_retention_probe` on a schedule to sample STM entries, reconstruct them with the stored :class:`~nd_llm.bottleneck.ib.IBottleneck` telemetry, and monitor reconstruction quality / drift. Any missing or malformed telemetry is surfaced under ``probe["issues"]``.
-* Use the new STM query helpers such as :func:`STM.query` or :func:`STM.list_by_alignment` to fetch aligned batches of entries (e.g., all shards for a ``session_id``) without loading every payload.
+* Use the new STM query helpers such as :func:`STM.query` or :func:`STM.list_by_alignment`, plus the holographic superposition channels (``write_superposition``/``read_superposition``) to fetch aligned batches of entries or aggregate long-horizon fingerprints without loading every payload.
 
 ---
 
@@ -175,6 +175,7 @@ usage_key = orchestrator.log_usage_event(
 ### Bottleneck tuning knobs
 
 * **Objective / scoring:** pass `objective="l2-norm"` (default) for magnitude gating or `objective="query-dot"` to enable the built-in query-conditioned scorer. You can also inject your own scorer via the `scorer` argument; it receives `(field, embeddings, metadata, context)` and should return a score per token.
+* **Mutual-information blending:** supply an `MIProxy` + `mi_targets` via :func:`build_mi_proxy_context` and set `mi_score_weight` to trade off between the base scorer and per-token MI similarities.
 * **Query context:** provide query embeddings or other conditioning signals through the `context` mapping (e.g. `{"query_embedding": vector}`) and they will be forwarded to the scoring strategy.
 * **Budget allocator:** override `budget_allocator` to customize per-field sub-budgets. The default `RegistryAwareBudgetAllocator` inspects registry metadata (salience flags, alignment keys, optional `budget_weight`) and records the resulting `field_budgets` and `allocation_weights` in `CompressionTelemetry`.
 * **Metrics:** every call to `compress` returns a `CompressionResult.metrics` dictionary with IB/RD proxies such as `ib_proxy`, `rd_proxy`, and an `embedding_reconstruction_error` computed from kept vs. dropped embeddings.
@@ -203,58 +204,68 @@ affinity:
 
 ## Using real datasets
 
-Fetch the official FUNSD release and the compact DocLayNet-base snapshot with the
-helper script.  By
-default datasets are placed in ``~/.cache/n-dimensional-llm``; override the
-location via ``ND_LLM_DATA_CACHE`` or ``--cache-dir`` if required.
+The [CORD receipt dataset](https://huggingface.co/datasets/naver-clova-ix/cord-v2) is wired into the benchmark harness for a realistic document-understanding task. Install the optional dependency stack (``pip install .[benchmarks]`` or the explicit packages below) when you want the full dataset instead of the bundled JSONL sample:
 
 ```bash
-python scripts/download_datasets.py
+pip install datasets pillow
 ```
 
-The FUNSD benchmark helper consumes the extracted ``funsd`` directory.  Set
-``dataset_size=0`` to keep the full corpus and ``use_sample=False`` to disable
-the bundled JSON sample.
-
 ```bash
 python - <<'PY'
-from pathlib import Path
-
-from benchmarks.doc_understanding import run_funsd_benchmark
-
-cache = Path.home() / ".cache" / "n-dimensional-llm"
-report = run_funsd_benchmark(
-    budget_values=(8, 12, 16),
-    data_root=cache / "funsd",
-    dataset_size=0,
-    use_sample=False,
+from benchmarks.doc_understanding import run_cord_benchmark
+
+report = run_cord_benchmark(
+    budget_values=(4, 8, 12),
+    dataset_size=8,
+    use_sample=True,  # flip to False to stream the HF split or use a local directory
+    data_root="datasets",  # automatically combines subdirectories named CORD*
+    threshold=250_000,
 )
 print(report["budgets"][0]["metrics"])  # inspect results
 PY
 ```
 
-DocLayNet follows the same convention, reading from the ``doclaynet`` directory
-inside the cache.  The helper uses the
-`pierreguillou/DocLayNet-base <https://huggingface.co/datasets/pierreguillou/DocLayNet-base>`_
-mirror hosted on Hugging Face instead of the much larger full corpus.
+Set ``data_root`` to the directory that contains the official ``train/dev/test/json`` folders (or to a parent folder that has subdirectories named ``CORD*``—the loader will merge them) once you download the [CORD release](https://github.com/clovaai/cord).
 
-```bash
-python - <<'PY'
-from pathlib import Path
+### ChartQA field benchmark
 
-from benchmarks.doc_understanding import run_doclaynet_benchmark
+ChartQA-style chart reasoning now has a lightweight harness that exercises question text alongside structured chart metadata. The default configuration consumes the bundled sample; point it at the official dataset (e.g. [lmms-lab/chartqa](https://huggingface.co/lmms-lab/chartqa) or the [GitHub release](https://github.com/IBM/chartqa)) when you want full coverage:
 
-cache = Path.home() / ".cache" / "n-dimensional-llm"
-report = run_doclaynet_benchmark(
-    budget_values=(6, 12, 18),
-    data_root=cache / "doclaynet",
-    dataset_size=0,
-    use_sample=False,
+```python
+from benchmarks.chartqa import run_chartqa_benchmark
+
+report = run_chartqa_benchmark(
+    budget_values=(2, 4, 6),
+    dataset_size=4,
+    use_sample=True,  # set False once you've downloaded the dataset locally
 )
-print(report["budgets"][0]["metrics"])  # inspect results
-PY
+print(report)
 ```
 
+### Rate–distortion & Fano audits
+
+Use the `scripts/rd_audit.py` CLI to sweep token budgets for the CORD benchmark in both N-D and text-only configurations, then compute empirical rate–distortion curves and Fano-consistent error bounds:
+
+```bash
+python -m scripts.rd_audit --budgets 4 8 12 --dataset-size 8 --use-sample
+```
+
+Both modes record the mean mutual-information lower bound (from the MI proxy) alongside each budget’s accuracy/distortion so you can visualise the dominance of N-D inputs at fixed rate.
+
+### Local LLM harness (Ollama)
+
+If you have [Ollama](https://ollama.com) with `llama3.1:8b` installed locally, you can replay compressed field summaries into the model for qualitative checks:
+
+```bash
+python -m scripts.ollama_harness \
+  --dataset cord \
+  --data-root datasets \
+  --use-sample \
+  --dry-run
+```
+
+Drop `--dry-run` to stream the prompt to your Ollama instance (`http://127.0.0.1:11434` by default). ChartQA prompts are supported as well (`--dataset chartqa`).
+
 ### Runnable multi-field invoice demo
 
 Kick the tyres with the maintained invoice walk-through that wires the registry, stub encoders, bottleneck, STM, and orchestrator together:

diff --git a/benchmarks/__init__.py b/benchmarks/__init__.py
@@ -7,29 +7,25 @@
 
 __all__ = [
     "run_benchmark",
-    "run_doclaynet_benchmark",
-    "run_funsd_benchmark",
+    "run_cord_benchmark",
+    "run_chartqa_benchmark",
     "run_long_qa_benchmark",
     "run_video_qa_benchmark",
-    "build_doclaynet_registry",
-    "build_doclaynet_encoders",
-    "doclaynet_fields",
-    "doclaynet_contains_table",
-    "load_doclaynet_dataset",
+    "build_cord_registry",
+    "build_cord_encoders",
+    "cord_fields",
+    "cord_high_total_label",
+    "cord_total_amount",
+    "load_cord_dataset",
+    "build_chartqa_registry",
+    "build_chartqa_encoders",
+    "chartqa_fields",
+    "chartqa_answer",
+    "load_chartqa_dataset",
     "AmountEncoder",
     "build_invoice_encoders",
     "build_invoice_registry",
-    "build_funsd_encoders",
-    "build_funsd_registry",
-    "funsd_fields",
-    "funsd_numeric_answer_label",
     "invoice_fields",
-    "load_funsd_dataset",
-    "build_doclaynet_encoders",
-    "build_doclaynet_registry",
-    "doclaynet_fields",
-    "doclaynet_contains_table",
-    "load_doclaynet_dataset",
     "synthetic_invoice",
     "synthetic_invoice_dataset",
     "build_longqa_registry",
@@ -52,14 +48,33 @@
 def __getattr__(name: str) -> Any:  # pragma: no cover - thin convenience wrapper
     if name == "run_benchmark":
         return import_module("benchmarks.doc_understanding").run_benchmark
-    if name == "run_doclaynet_benchmark":
-        return import_module("benchmarks.doc_understanding").run_doclaynet_benchmark
-    if name == "run_funsd_benchmark":
-        return import_module("benchmarks.doc_understanding").run_funsd_benchmark
+    if name == "run_cord_benchmark":
+        return import_module("benchmarks.doc_understanding").run_cord_benchmark
+    if name == "run_chartqa_benchmark":
+        return import_module("benchmarks.chartqa").run_chartqa_benchmark
     if name == "run_long_qa_benchmark":
         return import_module("benchmarks.long_qa").run_long_qa_benchmark
     if name == "run_video_qa_benchmark":
         return import_module("benchmarks.video_qa").run_video_qa_benchmark
+    if name in {
+        "build_cord_registry",
+        "build_cord_encoders",
+        "cord_fields",
+        "cord_high_total_label",
+        "cord_total_amount",
+        "load_cord_dataset",
+    }:
+        module = import_module("benchmarks.cord")
+        return getattr(module, name)
+    if name in {
+        "build_chartqa_registry",
+        "build_chartqa_encoders",
+        "chartqa_fields",
+        "chartqa_answer",
+        "load_chartqa_dataset",
+    }:
+        module = import_module("benchmarks.chartqa")
+        return getattr(module, name)
     if name in {
         "AmountEncoder",
         "build_invoice_encoders",
@@ -70,33 +85,6 @@ def __getattr__(name: str) -> Any:  # pragma: no cover - thin convenience wrappe
     }:
         module = import_module("benchmarks.synthetic")
         return getattr(module, name)
-    if name in {
-        "build_doclaynet_registry",
-        "build_doclaynet_encoders",
-        "doclaynet_fields",
-        "doclaynet_contains_table",
-        "load_doclaynet_dataset",
-    }:
-        module = import_module("benchmarks.doclaynet")
-        return getattr(module, name)
-    if name in {
-        "build_funsd_encoders",
-        "build_funsd_registry",
-        "funsd_fields",
-        "funsd_numeric_answer_label",
-        "load_funsd_dataset",
-    }:
-        module = import_module("benchmarks.funsd")
-        return getattr(module, name)
-    if name in {
-        "build_doclaynet_encoders",
-        "build_doclaynet_registry",
-        "doclaynet_fields",
-        "doclaynet_contains_table",
-        "load_doclaynet_dataset",
-    }:
-        module = import_module("benchmarks.doclaynet")
-        return getattr(module, name)
     if name in {
         "build_longqa_registry",
         "build_longqa_encoders",