Address PR #370 R8 review (1 P3)

igerber · claude · igerber · commit 194668274d1c · 2026-04-25T11:16:06.000-04:00
R8 P3 -- the survey all_pass docstring and REGISTRY note described the
contract too generically ("at least one linearity test conclusive").
That wording matches the overall path but is looser than the
implemented event-study path, which requires BOTH pretrends_joint AND
homogeneity_joint to be conclusive and non-rejecting.

Fix: split the survey all_pass description by aggregate in both
HADPretestReport docstring and REGISTRY note:
- overall: "at least one of Stute/Yatchew conclusive + no rejection"
  (mirrors paper Section 4 step-3 'Stute OR Yatchew' wording).
- event_study: "both joint variants conclusive + neither rejects"
  (same step-2 + step-3 closure as the unweighted aggregate, minus the
  QUG step).

No code changes; documentation only. 187 pretest tests pass.
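
For reviewers, the split rule described above can be sketched roughly as
follows. This is an illustrative sketch only, not the code in
had_pretests.py: the Result dataclass, its reject field, and the
_is_conclusive / survey_all_pass names are hypothetical stand-ins, and
math.isfinite is used in place of np.isfinite to keep the snippet
stdlib-only.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical stand-in for a pretest result dataclass."""
    p_value: float
    reject: bool


def _is_conclusive(r: Optional[Result]) -> bool:
    # Conclusive means a finite p-value; a missing result (None) is not.
    return r is not None and math.isfinite(r.p_value)


def survey_all_pass(aggregate: str, *,
                    stute: Optional[Result] = None,
                    yatchew: Optional[Result] = None,
                    pretrends_joint: Optional[Result] = None,
                    homogeneity_joint: Optional[Result] = None) -> bool:
    """Sketch of the survey-path all_pass rule (QUG step skipped)."""
    if aggregate == "overall":
        # At least one of Stute/Yatchew conclusive, no conclusive rejection.
        done = [r for r in (stute, yatchew) if _is_conclusive(r)]
        return bool(done) and not any(r.reject for r in done)
    if aggregate == "event_study":
        # Both joint variants must be conclusive and neither may reject.
        joint = [pretrends_joint, homogeneity_joint]
        return (all(_is_conclusive(r) for r in joint)
                and not any(r.reject for r in joint))
    raise ValueError(f"unknown aggregate: {aggregate!r}")
```

Under this sketch, one conclusive non-rejecting test is enough on the
overall path, while the same situation on the event-study path (one
joint test inconclusive) yields False.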

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/had_pretests.py b/diff_diff/had_pretests.py
@@ -614,10 +614,19 @@ class HADPretestReport:
         ``pretrends_joint is not None and
         np.isfinite(pretrends_joint.p_value)``,
         ``np.isfinite(homogeneity_joint.p_value)``, AND none of the
-        three rejects. On the **survey/weights path** (Phase 4.5 C): the
-        QUG-conclusiveness gate is dropped (qug=None per C0 deferral);
-        ``True`` iff at least one linearity test is conclusive AND no
-        conclusive test rejects (linearity-conditional admissibility).
+        three rejects. On the **survey/weights path** (Phase 4.5 C) the
+        QUG-conclusiveness gate is dropped (``qug=None`` per C0
+        deferral); the linearity-conditional rule splits by aggregate:
+
+        - ``aggregate="overall"`` survey: True iff at least one of
+          Stute/Yatchew is conclusive AND no conclusive test rejects.
+        - ``aggregate="event_study"`` survey: True iff
+          ``pretrends_joint`` is non-None and conclusive,
+          ``homogeneity_joint`` is conclusive, AND neither rejects.
+          (Both joint variants must be conclusive on the event-study
+          path - same step-2 + step-3 closure as the unweighted
+          aggregate, just without the QUG step.)
+
         Mirrors Phase 3's ``bool(np.isfinite(p_value))`` convention - no
         ``.conclusive()`` helper on any result dataclass.
     verdict : str
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2439,7 +2439,9 @@ Tuning-parameter-free test of `H_0: d̲ = 0` versus `H_1: d̲ > 0`. Shipped in `
     - `sigma4_W = sum(w_avg * eps_g^2 * eps_{g-1}^2) / sum(w_avg)` reduces to `(1/(G-1)) * sum(prod)` at `w=1`.
     - `T_hr = sqrt(sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W` (effective-sample-size convention; reduces to `sqrt(G)` at `w=1`).
     Strictly positive weights required (the adjacent-difference variance is undefined under contiguous-zero blocks). PSU clustering is NOT propagated through the variance-ratio statistic (would require a survey-aware variance-of-variance estimator, out of scope). Pair-weight convention follows Krieger-Pfeffermann (1997, §3) for design-consistent inference on smooth functionals.
-  - **Workflow** (`did_had_pretest_workflow`) under `survey=` / `weights=`: skips the QUG step with a `UserWarning` (per Phase 4.5 C0 deferral), sets `qug=None` on the report, and dispatches the linearity family with the survey-aware mechanism. Verdict carries a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix. `all_pass` drops the QUG-conclusiveness condition: `True` iff at least one linearity test is conclusive AND no conclusive test rejects.
+  - **Workflow** (`did_had_pretest_workflow`) under `survey=` / `weights=`: skips the QUG step with a `UserWarning` (per Phase 4.5 C0 deferral), sets `qug=None` on the report, and dispatches the linearity family with the survey-aware mechanism. Verdict carries a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix. `all_pass` drops the QUG-conclusiveness condition; the linearity-conditional rule splits by aggregate:
+    - `aggregate="overall"`: `True` iff at least one of `stute`/`yatchew` is conclusive AND no conclusive test rejects (paper Section 4 step-3 "Stute OR Yatchew" wording carries through).
+    - `aggregate="event_study"`: `True` iff `pretrends_joint` is non-None and conclusive, `homogeneity_joint` is conclusive, AND neither rejects. Both joint variants must be conclusive on the event-study path (same step-2 + step-3 closure as the unweighted aggregate, just without the QUG step).
   - **Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) deferred** to a parallel follow-up. Each helper raises `NotImplementedError` on `survey.replicate_weights is not None` (defense in depth: workflow + every direct-helper entry rejects, mirroring the reciprocal-guard discipline from PR #346). The per-replicate weight-ratio rescaling for the OLS-on-residuals refit step is not covered by the multiplier-bootstrap composition above.
   - **`lonely_psu='adjust'` with singleton strata is rejected** with `NotImplementedError` on the Stute family (mirrors HAD sup-t bootstrap at `had.py:2081-2118`). The bootstrap multiplier helper pools singleton strata into a pseudo-stratum with nonzero multipliers, but the analytical variance target requires a pseudo-stratum centering transform that has not been derived for the Stute CvM. Use `lonely_psu='remove'` (drops singleton contributions) or `'certainty'` (zero-variance singletons); both produce all-zero singleton multipliers that match a well-defined analytical target. Variance-unidentified designs (`df_survey <= 0` after the adjust+singleton case is handled) return `NaN` with a `UserWarning` (single-PSU unstratified or one-PSU-per-stratum under remove/certainty).
   - **Constant-within-unit invariant**: per-row `weights=` / `survey=col` are aggregated to per-unit `(G,)` arrays via the existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (had.py:1604, :1671); these enforce constant-within-unit invariant on weights and on every survey design column (strata, psu, fpc) and raise on violation. Direct callers passing already-resolved `ResolvedSurveyDesign` (or per-unit `weights` array) bypass this aggregation; the invariant is the caller's responsibility on that path.