Commit 8ecbaf7

igerber and claude committed
Fix latent doc-snippet bugs from PR igerber#389 (HAD ecosystem)
PR igerber#389 added HAD code snippets to choosing_estimator.rst, troubleshooting.rst, and r_comparison.rst. None of those edits triggered rust-test.yml (which only runs on rust/, diff_diff/, tests/, pyproject.toml, and the workflow file), so tests/test_doc_snippets.py never executed and the snippets shipped with five latent bugs that now surface on every code PR via the Pure Python Fallback job.

Bugs addressed:

- r_comparison:block6: bare HAD.fit(data, ...) with the generate_staggered_data fixture failed because the default aggregate='overall' requires exactly 2 periods and the namespace data has 10. Replaced with an inline HAD-shape panel construction (mirrors the upstream choosing_estimator:block7 fix in 55d7a27) plus aggregate='event_study'.
- troubleshooting:block20: the snippet demonstrates first_treat_col= auto-filtering on a staggered panel. The fixture's first_treat values disagree with the dose path (random per-row dose on never-treated units), tripping HAD's first_treat / dose-path consistency validator. Inlined a 120-unit / 10-period staggered HAD-shape panel (30 never + 30 cohort 5 + 60 cohort 8) so the validator passes and the boundary local-linear estimator has enough distinct dose values to fit.
- troubleshooting:block17 / block18 / r_comparison:block7: these are legitimately context-dependent snippets that reference est / results from prior text-flow context (inspection / output-format examples). Added them to _CONTEXT_DEPENDENT_SNIPPETS so the expected NameError is suppressed, matching the pattern already used for block8, the api_bacon blocks, and the existing r_comparison context-dependent set.

choosing_estimator:block7 was the sixth failing snippet but was already fixed upstream in 55d7a27 with the inline-construction pattern; this branch rebases onto that.

Verification: PYTHONPATH=. DIFF_DIFF_BACKEND=python pytest tests/test_doc_snippets.py reports 111 passed, 4 skipped, 0 failed on this branch (was 6 failed on origin/main before 55d7a27 and 5 failed after).

Follow-up (separate PR queued): carve test_doc_snippets.py out into a dedicated docs-tests.yml workflow triggered on docs/** + diff_diff/** + the test file itself, and exclude it from rust-test.yml's pytest invocations so doc bugs are caught on doc PRs (not silently inherited by code PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
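
The context-dependent suppression described in the commit message can be pictured roughly as follows. This is a minimal sketch, not the code in tests/test_doc_snippets.py: the function name run_snippet, the returned status strings, and the two-entry set are hypothetical and chosen only for illustration. It assumes a harness that executes each snippet in a fresh namespace and tolerates a NameError only for snippet ids listed as context-dependent.

```python
# Hypothetical stand-in for the real _CONTEXT_DEPENDENT_SNIPPETS set.
CONTEXT_DEPENDENT = {"troubleshooting:block17", "troubleshooting:block18"}


def run_snippet(snippet_id, source, namespace=None):
    """Execute one doc snippet; tolerate NameError only for listed ids."""
    ns = dict(namespace or {})  # fresh copy so snippets cannot leak state
    try:
        exec(compile(source, snippet_id, "exec"), ns)
    except NameError:
        if snippet_id in CONTEXT_DEPENDENT:
            # The snippet relies on names (est, results) defined in earlier prose.
            return "name_error_suppressed"
        raise  # an unlisted snippet with a NameError is a real doc bug
    return "passed"


# A listed snippet that references an undefined `results` is tolerated.
status_known = run_snippet("troubleshooting:block17", "results.summary()")
# A self-contained snippet simply passes.
status_ok = run_snippet("demo:block1", "x = 1 + 1")
```

Under this assumed design, a snippet that is not in the set still fails loudly on an undefined name, which is why only the three inspection / output-format blocks were added to the set rather than the two snippets that had genuine data problems.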
1 parent b560c80 commit 8ecbaf7

3 files changed

Lines changed: 46 additions & 2 deletions

docs/r_comparison.rst

Lines changed: 19 additions & 2 deletions
@@ -237,11 +237,28 @@ identification assumptions (the design path is auto-detected separately by
 
 .. code-block:: python
 
+   import numpy as np
+   import pandas as pd
    from diff_diff import HeterogeneousAdoptionDiD
 
+   # Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
+   rng = np.random.default_rng(42)
+   G, F, T = 200, 4, 5
+   doses = rng.beta(0.5, 1.0, size=G)
+   rows = []
+   for g in range(G):
+       for t in range(1, T + 1):
+           y = (rng.normal()
+                + (doses[g] + doses[g] ** 2) * (t >= F)
+                + rng.normal(0, 0.5))
+           d = doses[g] if t >= F else 0.0
+           rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
+   had_data = pd.DataFrame(rows)
+
    est = HeterogeneousAdoptionDiD()
-   results = est.fit(data, outcome_col='y', unit_col='unit',
-                     time_col='period', dose_col='dose')
+   results = est.fit(had_data, outcome_col='y', unit_col='unit',
+                     time_col='period', dose_col='dose',
+                     aggregate='event_study')
 
 Key Differences
 ---------------
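
The period-count constraint behind this hunk can be illustrated with a small sketch. Per the commit message, the default aggregate='overall' requires exactly 2 periods, which is why the 10-period fixture failed. The helper below is hypothetical and not part of diff_diff; falling back to 'event_study' for longer panels is an assumption made only for illustration.

```python
def pick_aggregate(periods):
    """Choose an aggregate mode from the observed period labels.

    Hypothetical helper: 'overall' for exactly 2 distinct periods,
    'event_study' for more, and an error for fewer than 2.
    """
    n_periods = len(set(periods))
    if n_periods < 2:
        raise ValueError("need at least 2 distinct periods")
    return "overall" if n_periods == 2 else "event_study"


# The inline panel in the hunk above uses periods 1..5 for each unit.
mode_five_periods = pick_aggregate([1, 2, 3, 4, 5] * 3)
mode_two_periods = pick_aggregate([0, 1, 0, 1])
```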

docs/troubleshooting.rst

Lines changed: 24 additions & 0 deletions
@@ -593,6 +593,30 @@ a ``UserWarning``). The fit raises only when the panel is staggered
 
 .. code-block:: python
 
+   import numpy as np
+   import pandas as pd
+
+   # Build a staggered HAD panel for this example: 120 units, three
+   # cohorts (30 never-treated + 30 treated at period 5 + 60 treated at
+   # period 8). Dose is zero pre-treatment per unit and a constant
+   # positive value post-treatment, so the first_treat / dose-path
+   # consistency validator passes. The 60-unit last cohort gives the
+   # boundary local-linear estimator enough distinct dose values to fit.
+   np.random.seed(42)
+   n_units, n_periods = 120, 10
+   first_treat_per_unit = np.array([0] * 30 + [5] * 30 + [8] * 60)
+   dose_per_unit = np.where(
+       first_treat_per_unit > 0, np.random.uniform(0.5, 2.0, n_units), 0.0
+   )
+   rows = []
+   for u in range(n_units):
+       ft = first_treat_per_unit[u]
+       for t in range(n_periods):
+           d_ut = dose_per_unit[u] if (ft > 0 and t >= ft) else 0.0
+           y_ut = (d_ut > 0) * dose_per_unit[u] * 0.5 + np.random.normal()
+           rows.append((u, t, d_ut, ft, y_ut))
+   data = pd.DataFrame(rows, columns=["unit", "period", "dose", "first_treat", "y"])
+
    # Primary remedy: pass `first_treat_col` so the estimator auto-filters
    # to the last-treatment cohort + never-treated and emits a UserWarning.
    est = HeterogeneousAdoptionDiD()
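
The first_treat / dose-path consistency property that the inlined panel is built to satisfy can be sketched as a standalone check. This is an assumed reading of the rule based on the comment in the hunk above (dose is zero before a unit's first treatment and positive from then on, with first_treat == 0 meaning never-treated); it is not the library's validator, and dose_path_is_consistent is a hypothetical name.

```python
def dose_path_is_consistent(rows):
    """rows: iterable of (unit, period, dose, first_treat) tuples.

    Assumed rule, not the actual HAD validator: a row is treated when
    first_treat > 0 and period >= first_treat. Treated rows must carry a
    positive dose; all other rows must carry a dose of exactly zero.
    """
    for _unit, period, dose, first_treat in rows:
        treated_now = first_treat > 0 and period >= first_treat
        if treated_now and dose <= 0:
            return False
        if not treated_now and dose != 0:
            return False
    return True


# One never-treated unit and one unit treated from period 5 onward.
good = [(0, t, 0.0, 0) for t in range(10)]
good += [(1, t, 1.5 if t >= 5 else 0.0, 5) for t in range(10)]

# A never-treated unit with nonzero dose, the mismatch the commit message
# attributes to the original fixture.
bad = [(0, t, 0.3, 0) for t in range(10)]

good_ok = dose_path_is_consistent(good)
bad_ok = dose_path_is_consistent(bad)
```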

tests/test_doc_snippets.py

Lines changed: 3 additions & 0 deletions
@@ -366,7 +366,10 @@ def _restore_datasets_module():
     "r_comparison:block3",
     "r_comparison:block4",
     "r_comparison:block6",
+    "r_comparison:block7",
     "troubleshooting:block8",
+    "troubleshooting:block17",
+    "troubleshooting:block18",
 }
 
 