Address PR #363 R8 review (1 P3)

igerber · claude · igerber · commit 1b41a547756d · 2026-04-24T19:16:27.000-04:00
R8 P3 (doc-consistency drift): prose in _fit_event_study docstring,
REGISTRY sup-t bullet, and CHANGELOG entry still described the
pre-R7-fix weighted-path behavior (Binder-TSL for all weighted fits;
raw unit-level shortcut bootstrap). Updated all three to match the
actual implementation:

- _fit_event_study docstring: distinguishes survey= path (Binder-TSL)
  from weights= shortcut (analytical CCT-2014 / 2SLS pweight
  sandwich), documents trivial-survey bootstrap routing for cband.
- REGISTRY.md: new "weights= shortcut ↔ bootstrap routing" bullet
  under the sup-t section explaining the synthetic trivial
  ResolvedSurveyDesign construction and why (centered +
  sqrt(n/(n-1))-corrected branch targets V_HC1, raw unit-level would
  give ((n-1)/n) · V_HC1 under-scaling).
- CHANGELOG.md: per-horizon variance on weights= shortcut clarified
  as analytical (not Binder-TSL); sup-t routing through trivial
  resolved documented.

Co-Authored-By: Claude Opus 4.7 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
-- **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study path threads `weights_unit_full` / `resolved_survey_unit_full` through the per-horizon loop, composing Binder-TSL variance per horizon via `compute_survey_if_variance` (continuous + mass-point) and populating `survey_metadata` / `variance_formula` / `effective_dose_mean` (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` (PSU-level draws with stratum centering, FPC scaling, lonely-PSU handling) / `generate_bootstrap_weights_batch` (unit-level on the `weights=` shortcut), composes `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalizes by per-horizon analytical SE, and takes the `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
+- **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
 - **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
 - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
 - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
diff --git a/diff_diff/had.py b/diff_diff/had.py
@@ -3725,12 +3725,30 @@ def _fit_event_study(
         on the period-F dose distribution, and then fits the chosen design
         path independently on each event-time horizon's first differences.
 
-        On the weighted path (``survey=`` or ``weights=``), per-horizon
-        variance is Binder-TSL via :func:`compute_survey_if_variance` and
-        the simultaneous confidence band (when ``cband=True``) is
-        constructed by a shared-PSU multiplier bootstrap over the stacked
-        per-horizon influence-function matrix (see
-        :func:`_sup_t_multiplier_bootstrap`).
+        Per-horizon variance regimes (matches the static-path contract):
+
+        - **``survey=``**: Binder (1983) Taylor-series linearization
+          via :func:`compute_survey_if_variance` on the per-unit
+          β̂-scale influence function (continuous + mass-point both
+          route through the same helper). Inference is
+          t-distribution with ``df_survey``.
+        - **``weights=`` shortcut**: analytical SE — CCT-2014
+          weighted-robust for continuous paths (``bc_fit.se_robust /
+          |den|``) and weighted 2SLS pweight sandwich for mass-point
+          (``_fit_mass_point_2sls`` HC1 / classical / CR1). Inference
+          is Normal (``df=None``).
+
+        The simultaneous confidence band on the weighted path (when
+        ``cband=True``) is constructed by a shared-PSU multiplier
+        bootstrap over the stacked per-horizon β̂-scale IF matrix via
+        :func:`_sup_t_multiplier_bootstrap`. On the ``weights=``
+        shortcut, sup-t calibration is routed through a synthetic
+        trivial ``ResolvedSurveyDesign`` so the centered +
+        sqrt(n/(n-1))-corrected survey-aware branch fires uniformly —
+        matches the analytical HC1 variance family at the
+        compute_survey_if_variance(IF, trivial) ≈ V_HC1 invariant.
+        Unweighted event-study skips the bootstrap (pre-Phase 4.5 B
+        numerical output preserved).
         """
         # ---- Resolve effective fit-time state (local vars only,
         # feedback_fit_does_not_mutate_config). ----
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2314,6 +2314,8 @@ Under `survey=SurveyDesign(weights, strata, psu, fpc)`, the variance composes vi
 
 **Scope**: sup-t bootstrap runs only when `aggregate="event_study"` AND `weights=` or `survey=` is supplied AND `cband=True` (default). Unweighted event-study skips the bootstrap entirely — pre-Phase 4.5 B numerical output bit-exactly preserved. Setting `cband=False` on the weighted path disables the bootstrap (useful for smoke-test bit-parity assertions against the unweighted path at uniform weights).
 
+**`weights=` shortcut ↔ bootstrap routing**: on the `weights=` shortcut (no user-supplied `SurveyDesign`), per-horizon SE stays analytical (CCT-2014 robust for continuous, HC1/classical/CR1 sandwich for mass-point — NOT Binder-TSL), but sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` (pweight, no strata / PSU / FPC, `lonely_psu="remove"`) so the centered + sqrt(n/(n-1))-corrected bootstrap branch fires uniformly. This matches the analytical HC1 variance family — the `compute_survey_if_variance(IF, trivial) ≈ V_HC1` invariant from the IF scale convention — so the bootstrap variance target agrees with the per-horizon SE normalizer. Without this routing, the unit-level bootstrap branch would normalize against raw `sum(ψ²) = ((n-1)/n) · V_HC1` on the HC1-scaled IF and produce silently too-narrow simultaneous bands. Regression-locked by the `weights=`-shortcut / `survey=`-trivial equivalence test in `TestEventStudySurveyCband`.
+
 - **Deviation from shared survey-bootstrap contract:** `_sup_t_multiplier_bootstrap` raises `NotImplementedError` on `SurveyDesign(lonely_psu="adjust")` with singleton strata. The shared `generate_survey_multiplier_weights_batch` helper pools singleton PSUs into a pseudo-stratum with NONZERO multipliers, but `compute_survey_if_variance` centers singleton PSU scores at the GLOBAL mean of PSU scores (rather than the pseudo-stratum mean). Matching the two would require a pooled-singleton pseudo-stratum centering transform in the HAD sup-t path that has not been derived. The HAD-specific limitation is scoped to: weighted event-study + `cband=True` + `lonely_psu="adjust"` + at least one singleton stratum. Practitioners can use `lonely_psu="remove"` or `"certainty"` (matches the analytical target bit-exactly on the HAD sup-t path), or pass `cband=False` to skip the simultaneous band. All other survey-bootstrap consumers (CallawaySantAnna, dCDH, SDID) retain full `lonely_psu="adjust"` support through the shared helper.
 
 - **Deviation: weighted mass-point `vcov_type="classical"` on survey/sup-t paths:** `vcov_type="classical"` raises `NotImplementedError` whenever the mass-point IF matrix is consumed downstream — specifically on `design="mass_point"` + `survey=` (static + event-study) and `design="mass_point"` + `weights=` + `aggregate="event_study"` + `cband=True`. The per-unit 2SLS IF returned by `_fit_mass_point_2sls` is scaled (`sqrt((n-1)/(n-k))`) to match V_HC1 via `compute_survey_if_variance`; mixing it with a classical analytical SE would silently return a V_HC1-targeted variance under a classical label. A classical-aligned IF derivation is queued for a follow-up PR. The allowed weighted-mass-point combinations are: `vcov_type="hc1"` on every path; `vcov_type="classical"` on `weights=` + `aggregate="overall"`, and `weights=` + `aggregate="event_study"` + `cband=False` (no IF consumption).