Commit bd2b55d

Merge pull request #310 from igerber/llm-guide-api
Bundle LLM guide files in wheel with get_llm_guide() accessor
2 parents 3b1fe6b + 89ee337 commit bd2b55d

14 files changed

Lines changed: 200 additions & 43 deletions

.claude/commands/bump-version.md

Lines changed: 3 additions & 3 deletions
@@ -24,7 +24,7 @@ Files that need updating:
 | `pyproject.toml` | `version = "X.Y.Z"` | ~7 |
 | `rust/Cargo.toml` | `version = "X.Y.Z"` | ~3 |
 | `CHANGELOG.md` | Section header + comparison link | Top + bottom |
-| `docs/llms-full.txt` | `- Version: X.Y.Z` | ~5 |
+| `diff_diff/guides/llms-full.txt` | `- Version: X.Y.Z` | ~5 |
 
 ## Instructions
 
@@ -80,7 +80,7 @@ Files that need updating:
 Replace `version = "OLD_VERSION"` (the first version line under [package]) with `version = "NEW_VERSION"`
 Note: Rust version may differ from Python version; always sync to the new version
 
-- `docs/llms-full.txt`:
+- `diff_diff/guides/llms-full.txt`:
 Replace `- Version: OLD_VERSION` with `- Version: NEW_VERSION`
 
 6. **Update CHANGELOG comparison links**:
@@ -101,7 +101,7 @@ Files that need updating:
 - diff_diff/__init__.py: __version__ = "NEW_VERSION"
 - pyproject.toml: version = "NEW_VERSION"
 - rust/Cargo.toml: version = "NEW_VERSION"
-- docs/llms-full.txt: Version: NEW_VERSION
+- diff_diff/guides/llms-full.txt: Version: NEW_VERSION
 - CHANGELOG.md: Added/verified [NEW_VERSION] entry
 
 Next steps:
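
The version-line replacement that this command describes for `diff_diff/guides/llms-full.txt` could be scripted roughly as follows. This is only a sketch: `bump_guide_version` is a hypothetical helper, not part of the repository, and it assumes the guide contains exactly one `- Version: X.Y.Z` line. The version numbers in the usage note are placeholders.

```python
def bump_guide_version(text: str, old_version: str, new_version: str) -> str:
    """Replace the single '- Version: OLD' line with '- Version: NEW'.

    Hypothetical helper; refuses to guess if the line is missing or repeated.
    """
    old_line = f"- Version: {old_version}"
    new_line = f"- Version: {new_version}"
    lines = text.splitlines(keepends=True)
    # Compare whole lines (minus the line ending) so partial matches are ignored.
    hits = [i for i, line in enumerate(lines) if line.rstrip("\r\n") == old_line]
    if len(hits) != 1:
        raise ValueError(
            f"expected exactly one {old_line!r} line, found {len(hits)}"
        )
    i = hits[0]
    # Preserve the original line ending of the replaced line.
    ending = lines[i][len(lines[i].rstrip("\r\n")):]
    lines[i] = new_line + ending
    return "".join(lines)
```

For example, `bump_guide_version(guide_text, "1.2.3", "1.2.4")` rewrites only the version line and raises `ValueError` if the old version string is not found exactly once, which guards against silently skipping the file.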

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ category (`Methodology/Correctness`, `Performance`, or `Testing/Docs`):
 | `CONTRIBUTING.md` | Documentation requirements, test writing guidelines |
 | `.claude/commands/dev-checklists.md` | Checklists for params, methodology, warnings, reviews, bugs (run `/dev-checklists`) |
 | `.claude/memory.md` | Debugging patterns, tolerances, API conventions (git-tracked) |
-| `docs/llms-practitioner.txt` | Baker et al. (2025) 8-step practitioner workflow for AI agents |
+| `diff_diff/guides/llms-practitioner.txt` | Baker et al. (2025) 8-step practitioner workflow for AI agents (accessible at runtime via `diff_diff.get_llm_guide("practitioner")`) |
 | `docs/performance-plan.md` | Performance optimization details |
 | `docs/benchmarks.rst` | Validation results vs R |
 

README.md

Lines changed: 11 additions & 3 deletions
@@ -69,11 +69,19 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
 
 ## For AI Agents
 
-If you are an AI agent or LLM using this library, read [`docs/llms.txt`](docs/llms.txt) for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
+If you are an AI agent or LLM using this library, call `diff_diff.get_llm_guide()` for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness.
 
-After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
+```python
+from diff_diff import get_llm_guide
+
+get_llm_guide() # concise API reference
+get_llm_guide("practitioner") # 8-step workflow (Baker et al. 2025)
+get_llm_guide("full") # comprehensive documentation
+```
+
+The guides are bundled in the wheel, so they are accessible from a `pip install` with no network access required.
 
-Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
+After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps.
 
 ## For Data Scientists
 

diff_diff/__init__.py

Lines changed: 10 additions & 5 deletions
@@ -4,12 +4,14 @@
 This library provides sklearn-like estimators for causal inference
 using the difference-in-differences methodology.
 
-For rigorous analysis, follow the 8-step practitioner workflow in
-docs/llms-practitioner.txt (based on Baker et al. 2025). After
-estimation, call ``practitioner_next_steps(results)`` for context-aware
-guidance on remaining diagnostic steps.
+For rigorous analysis, follow the 8-step practitioner workflow based
+on Baker et al. (2025). After estimation, call
+``practitioner_next_steps(results)`` for context-aware guidance on
+remaining diagnostic steps.
 
-AI agent reference: docs/llms.txt
+AI agents: call ``diff_diff.get_llm_guide()`` for a complete API reference.
+Use ``get_llm_guide("practitioner")`` for the 8-step workflow or
+``get_llm_guide("full")`` for comprehensive documentation.
 """
 
 # Import backend detection from dedicated module (avoids circular imports)
@@ -200,6 +202,7 @@
     plot_synth_weights,
 )
 from diff_diff.practitioner import practitioner_next_steps
+from diff_diff._guides_api import get_llm_guide
 from diff_diff.datasets import (
     clear_cache,
     list_datasets,
@@ -402,4 +405,6 @@
     "clear_cache",
     # Practitioner guidance
     "practitioner_next_steps",
+    # LLM guide accessor
+    "get_llm_guide",
 ]

diff_diff/_guides_api.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+"""Runtime accessor for bundled LLM guide files."""
+from __future__ import annotations
+
+from importlib.resources import files
+
+_VARIANT_TO_FILE = {
+    "concise": "llms.txt",
+    "full": "llms-full.txt",
+    "practitioner": "llms-practitioner.txt",
+}
+
+
+def get_llm_guide(variant: str = "concise") -> str:
+    """Return the contents of a bundled LLM guide.
+
+    Parameters
+    ----------
+    variant : str, default "concise"
+        Which guide to load. Names are case-sensitive. One of:
+
+        - ``"concise"`` -- compact API reference (llms.txt)
+        - ``"full"`` -- complete API documentation (llms-full.txt)
+        - ``"practitioner"`` -- 8-step practitioner workflow (llms-practitioner.txt)
+
+    Returns
+    -------
+    str
+        The full text of the requested guide.
+
+    Raises
+    ------
+    ValueError
+        If ``variant`` is not one of the known guide names.
+
+    Examples
+    --------
+    >>> from diff_diff import get_llm_guide
+    >>> concise = get_llm_guide()
+    >>> workflow = get_llm_guide("practitioner")
+    """
+    try:
+        filename = _VARIANT_TO_FILE[variant]
+    except (KeyError, TypeError):
+        valid = ", ".join(repr(k) for k in _VARIANT_TO_FILE)
+        raise ValueError(
+            f"Unknown guide variant {variant!r}. Valid options: {valid}."
+        ) from None
+    return files("diff_diff.guides").joinpath(filename).read_text(encoding="utf-8")
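
The lookup-and-validate step in `get_llm_guide()` can be exercised without the package installed. The sketch below copies the `_VARIANT_TO_FILE` mapping from this commit but stops before the file read. `resolve_guide_filename` and `describe` are hypothetical names used only for illustration and are not part of diff-diff.

```python
# Mapping copied from diff_diff/_guides_api.py in this commit.
_VARIANT_TO_FILE = {
    "concise": "llms.txt",
    "full": "llms-full.txt",
    "practitioner": "llms-practitioner.txt",
}


def resolve_guide_filename(variant: str = "concise") -> str:
    """Hypothetical helper: map a variant name to its bundled filename."""
    try:
        return _VARIANT_TO_FILE[variant]
    except (KeyError, TypeError):
        # KeyError: unknown name (including wrong case).
        # TypeError: unhashable argument such as a list.
        valid = ", ".join(repr(k) for k in _VARIANT_TO_FILE)
        raise ValueError(
            f"Unknown guide variant {variant!r}. Valid options: {valid}."
        ) from None


def describe(variant) -> str:
    """Hypothetical caller that reports bad variants instead of crashing."""
    try:
        return resolve_guide_filename(variant)
    except ValueError as exc:
        return f"error: {exc}"
```

Catching `TypeError` alongside `KeyError` means an unhashable argument such as `["full"]` surfaces as the same `ValueError` as a misspelled name, so callers only need to handle one exception type.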

diff_diff/guides/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""LLM guide files bundled with diff-diff."""
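
The new `diff_diff/guides/__init__.py` presumably exists so that `diff_diff.guides` is an importable package whose data files `importlib.resources.files()` can address. The sketch below shows the same read pattern against a throwaway package created in a temporary directory. The package name `demo_guides`, the file contents, and both function names are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def read_bundled_text(package: str, filename: str) -> str:
    """Read a text file that ships inside an importable package."""
    return files(package).joinpath(filename).read_text(encoding="utf-8")


def demo() -> str:
    """Build a temporary package with one data file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_guides"  # hypothetical package name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(
            '"""Demo data package."""\n', encoding="utf-8"
        )
        (pkg_dir / "llms.txt").write_text(
            "hello from a bundled guide\n", encoding="utf-8"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return read_bundled_text("demo_guides", "llms.txt")
        finally:
            # Undo the import-system changes made for the demo.
            sys.path.remove(tmp)
            sys.modules.pop("demo_guides", None)
```

Reading through `files()` rather than a path relative to the repository is what lets the same call work from an installed wheel, where the old `docs/` directory is not present.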

docs/llms-full.txt renamed to diff_diff/guides/llms-full.txt

Lines changed: 7 additions & 1 deletion
@@ -33,7 +33,7 @@ print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")
 
 ## Practitioner Workflow (based on Baker et al. 2025)
 
-For rigorous DiD analysis, follow the 8-step framework in docs/llms-practitioner.txt.
+For rigorous DiD analysis, follow the 8-step framework (call `diff_diff.get_llm_guide("practitioner")`).
 After estimation, call:
 
 ```python
@@ -1029,6 +1029,12 @@ Returned by `SyntheticDiD.fit()`.
 
 **Methods:** `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_unit_weights_df()`, `get_time_weights_df()`
 
+**Validation diagnostics** (call after `fit()`):
+- `get_weight_concentration(top_k=5)` - effective N and top-k weight share; flags fragile synthetic controls dominated by a few donor units
+- `get_loo_effects_df()` - per-unit leave-one-out influence from the jackknife pass (DataFrame includes both control and treated rows). Requires `variance_method="jackknife"`; raises `ValueError` if LOO is unavailable (see the method docstring for the full set of conditions, e.g. single treated unit or only one control with nonzero effective weight)
+- `in_time_placebo()` - re-estimate on shifted fake treatment dates in the pre-period; near-zero placebo ATTs indicate a credible design
+- `sensitivity_to_zeta_omega()` - re-estimate across a grid of unit-weight regularization values; checks ATT robustness to the auto-selected zeta_omega
+
 ### TripleDifferenceResults
 
 Returned by `TripleDifference.fit()`.

docs/llms-practitioner.txt renamed to diff_diff/guides/llms-practitioner.txt

Lines changed: 17 additions & 2 deletions
@@ -3,7 +3,21 @@
 > An 8-step workflow for rigorous Difference-in-Differences analysis, based on
 > Baker et al. (2025) "Difference-in-Differences Designs: A Practitioner's
 > Guide" and adapted for the diff-diff library. Some steps are reorganized or
-> extended relative to the paper (see docs/methodology/REGISTRY.md for details).
+> extended relative to the paper:
+>
+> - **Numbering**: diff-diff uses 1-Define, 2-Assumptions, 3-Test PT,
+> 4-Choose estimator, 5-Estimate, 6-Sensitivity, 7-Heterogeneity,
+> 8-Robustness. The paper uses 1-Define, 2-Assumptions, 3-Estimation method,
+> 4-Uncertainty, 5-Estimate, 6-Sensitivity, 7-Heterogeneity, 8-Keep learning.
+> - **Parallel trends testing** is a separate Step 3 (the paper embeds it in
+> Step 2), to ensure AI agents execute it as a distinct action.
+> - **Sources of uncertainty** (paper's Step 4) are folded into Step 5
+> (Estimate) with an explicit cluster-count check directive: >= 50 clusters
+> for asymptotic SEs, otherwise wild bootstrap. The 50-cluster threshold is
+> a diff-diff convention.
+> - **Step 8** is "Robustness & Reporting" (compare estimators, report with
+> and without covariates). The paper's Step 8 is "Keep learning." The
+> mandatory with/without covariate comparison is a diff-diff convention.
 
 ## Instructions for AI Agents
 
@@ -282,7 +296,8 @@ to your estimator's API. Examples:
 - **StackedDiD**: vary `clean_control` definition
 - **EfficientDiD**: compare `control_group='never_treated'` vs `'last_cohort'`
 - **ImputationDiD/TwoStageDiD**: leave-one-cohort-out, cross-estimator comparison
-- **SyntheticDiD/TROP**: in-time or in-space placebo (fake treatment date, leave-one-unit-out)
+- **SyntheticDiD**: built-in diagnostics on the results object - `results.in_time_placebo()`, `results.get_loo_effects_df()` (requires `variance_method="jackknife"` at fit time), `results.sensitivity_to_zeta_omega()`, and `results.get_weight_concentration()`
+- **TROP**: in-time or in-space placebo (fake treatment date, leave-one-unit-out)
 
 ```python
 from diff_diff import run_all_placebo_tests
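
The cluster-count directive added to the practitioner guide header (at least 50 clusters for asymptotic SEs, otherwise wild bootstrap) amounts to a one-line decision rule. The sketch below is illustrative only: `choose_inference` and its return labels are hypothetical and are not diff-diff API names or parameter values.

```python
def choose_inference(n_clusters: int, min_clusters_for_asymptotic: int = 50) -> str:
    """Hypothetical decision rule mirroring the guide's cluster-count check.

    Returns a label only; wiring it to an estimator is left to the caller.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be a positive integer")
    if n_clusters >= min_clusters_for_asymptotic:
        return "asymptotic"
    return "wild_bootstrap"
```

Keeping the threshold as a parameter reflects the guide's note that 50 is a diff-diff convention rather than a number taken from the paper.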

docs/llms.txt renamed to diff_diff/guides/llms.txt

Lines changed: 2 additions & 2 deletions
@@ -27,13 +27,13 @@ diagnostic steps produces unreliable results.
 After estimation, call `practitioner_next_steps(results)` for context-aware
 guidance on remaining steps.
 
-Full practitioner guide: docs/llms-practitioner.txt
+Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 
 ## Documentation
 
 ### Getting Started
 
-- [Practitioner Guide](docs/llms-practitioner.txt): 8-step workflow for rigorous DiD analysis (Baker et al. 2025) — **start here**
+- **Practitioner Guide** (call `diff_diff.get_llm_guide("practitioner")`): 8-step workflow for rigorous DiD analysis (Baker et al. 2025) — **start here**
 - [Quickstart](https://diff-diff.readthedocs.io/en/stable/quickstart.html): Installation, basic 2x2 DiD — column-name and formula interfaces, covariates, fixed effects, cluster-robust SEs
 - [Choosing an Estimator](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html): Decision flowchart for selecting the right estimator for your research design
 - [Troubleshooting](https://diff-diff.readthedocs.io/en/stable/troubleshooting.html): Common issues and solutions

docs/conf.py

Lines changed: 6 additions & 2 deletions
@@ -33,7 +33,7 @@
 ]
 
 templates_path = ["_templates"]
-exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "llms.txt", "llms-full.txt"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
 
 # -- Options for autodoc -----------------------------------------------------
 autodoc_default_options = {
@@ -71,7 +71,11 @@
     "https://diff-diff.readthedocs.io/en/stable/",
 )
 html_baseurl = _canonical_url
-html_extra_path = ["llms.txt", "llms-full.txt"]
+html_extra_path = [
+    "../diff_diff/guides/llms.txt",
+    "../diff_diff/guides/llms-full.txt",
+    "../diff_diff/guides/llms-practitioner.txt",
+]
 sitemap_url_scheme = "{link}"
 
 html_theme_options = {
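
Sphinx resolves `html_extra_path` entries relative to the directory containing `conf.py`, so the new `../diff_diff/guides/...` entries point back into the package source tree. The sketch below shows how those entries normalize. The `repo/docs` location is a placeholder and `resolve_from_conf_dir` is a hypothetical helper, not part of the project.

```python
import posixpath

# Entries copied from the updated docs/conf.py in this commit.
HTML_EXTRA_PATH = [
    "../diff_diff/guides/llms.txt",
    "../diff_diff/guides/llms-full.txt",
    "../diff_diff/guides/llms-practitioner.txt",
]


def resolve_from_conf_dir(conf_dir: str, entries: list[str]) -> list[str]:
    """Hypothetical helper: normalize entries relative to the conf.py directory."""
    return [posixpath.normpath(posixpath.join(conf_dir, e)) for e in entries]


# "repo/docs" is a placeholder for wherever docs/conf.py lives.
resolved = resolve_from_conf_dir("repo/docs", HTML_EXTRA_PATH)
```

With the placeholder above, the first entry normalizes to `repo/diff_diff/guides/llms.txt`, the same file that `get_llm_guide()` reads at runtime. That suggests the docs build and the wheel now share a single copy of each guide.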
