Address PR #346 CI review round 6: P1 panel-only note + P3 theorem refs

igerber · claude · igerber · commit afa0f6c68586 · 2026-04-20T19:54:29.000-04:00
**P1 (Methodology): Phase 2a panel-only not documented as deviation**

Paper Section 2 defines HAD on panel OR repeated cross-section data,
but Phase 2a ships a panel-only implementation. The existing
balanced-panel validator rejects RCS inputs (disjoint unit IDs
between periods) with a generic "unit(s) do not appear in both
periods" error, but the scope restriction was not documented in
REGISTRY.md as a `**Note:**` deviation from the paper's documented
surface.

Fix:
- REGISTRY.md: new `**Note (Phase 2a panel-only):**` block under the
  HAD Phase 2a entry, explaining the restriction and pointing at the
  follow-up RCS PR.
- `docs/methodology/papers/dechaisemartin-2026-review.md`: mirrored
  implementation note on the panel-or-RCS scope line.
- `HeterogeneousAdoptionDiD.fit()` docstring: new preamble paragraph
  stating the panel-only restriction and why (unit-level first
  differences required).
- `TODO.md`: new row queuing RCS identification-path work.
- `tests/test_had.py`: two regression tests
  (`test_repeated_cross_section_raises` and
  `test_repeated_cross_section_fit_raises`) that construct RCS-shaped
  input and assert the validator + fit() raise with the expected
  error.

**P3 (Docs/Tests): theorem/equation references were off**

Module docstring, result-class `att` docstring, and
`_fit_continuous()` docstring cited the paper's theorems/equations
inconsistently with the registry's source map:
- Design 1' identification: said "Theorem 3" -&gt; should be Theorem 1
  with Equation 3; Equation 7 is the sample estimator.
- Design 1 continuous-near-d_lower: said "Proposition 3 / Theorem 4"
  -&gt; should be Theorem 3 / Equation 11 (Theorem 4 is the QUG null
  test, not this estimand).

Fix: updated all three docstring sites to cite Theorem 1 / Equation 3
(Design 1' identification) + Equation 7 (sample estimator), and
Theorem 3 / Equation 11 (Design 1 continuous-near-d_lower / WAS_d_lower
under Assumption 6).

Targeted regression: 141 HAD tests + 520 total across Phase 1 and
adjacent surfaces, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/TODO.md b/TODO.md
@@ -100,6 +100,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low |
 | `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint). | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
 | `HeterogeneousAdoptionDiD` staggered-timing reduction: Phase 2a requires exactly 2 time periods and raises on `>2` periods with or without `first_treat_col`. A "last-cohort subgroup" reduction scheme (slice to max-cohort's 2-period window) could lift this in a targeted follow-up PR before full Phase 2b multi-period aggregation. | `diff_diff/had.py::_validate_had_panel` | Phase 2a | Low |
+| `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
 
 #### Performance
 
diff --git a/diff_diff/had.py b/diff_diff/had.py
@@ -9,7 +9,8 @@
 :func:`diff_diff.utils.safe_inference`.
 
 1. Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density
-   continuous at zero, Assumption 3. Equation 7 / Theorem 3:
+   continuous at zero, Assumption 3. Theorem 1 / Equation 3
+   (identification); Equation 7 (sample estimator):
 
        beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2]
 
@@ -19,7 +20,8 @@
 
 2. Design 1 continuous-near-d_lower (``continuous_near_d_lower``):
    ``d_lower > 0``, continuous boundary density, Assumption 5 or 6.
-   Proposition 3 / Theorem 4 (``WAS_{d_lower}`` under Assumption 6):
+   Theorem 3 / Equation 11 (``WAS_{d_lower}`` under Assumption 6;
+   Theorem 4 is the QUG null test, not this estimand):
 
        beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d])
               / E[D_2 - d_lower]
@@ -135,13 +137,14 @@ class HeterogeneousAdoptionDiDResults:
     att : float
         Point estimate of the WAS parameter on the beta-scale.
 
-        - Design 1' (paper Equation 7 / Theorem 3):
+        - Design 1' (paper Theorem 1 / Equation 3 identification;
+          Equation 7 sample estimator):
           ``att = (mean(ΔY) - tau_bc) / D_bar``
           where ``tau_bc`` is the bias-corrected local-linear estimate
           of ``lim_{d v 0} E[ΔY | D_2 <= d]`` and
           ``D_bar = (1/G) * sum(D_{g,2})``.
-        - Design 1 continuous-near-d_lower (paper Theorem 4,
-          ``WAS_{d_lower}`` under Assumption 6):
+        - Design 1 continuous-near-d_lower (paper Theorem 3 /
+          Equation 11, ``WAS_{d_lower}`` under Assumption 6):
           ``att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower)``
           where ``tau_bc`` is the bias-corrected local-linear estimate
           of ``lim_{d v d_lower} E[ΔY | D_2 <= d]``.
@@ -1006,6 +1009,17 @@ def fit(
     ) -> HeterogeneousAdoptionDiDResults:
         """Fit the HAD estimator on a two-period panel.
 
+        Phase 2a is **panel-only**: the paper (Section 2) defines HAD on
+        panel or repeated-cross-section data, but this implementation
+        requires a balanced two-period panel with a unit identifier so
+        that unit-level first differences ``ΔY_g = Y_{g,2} - Y_{g,1}``
+        can be formed. Repeated-cross-section inputs (disjoint unit IDs
+        between periods) are rejected by the balanced-panel validator.
+        Repeated-cross-section support is queued for a follow-up PR
+        (tracked in ``TODO.md``); it requires a separate identification
+        path based on pre/post cell means rather than unit-level
+        differences.
+
         Parameters
         ----------
         data : pd.DataFrame
@@ -1337,16 +1351,17 @@ def _fit_continuous(
         Implements de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau
         (2026) continuous-design estimators:
 
-        - Design 1' (``continuous_at_zero``), paper Equation 7 /
-          Theorem 3:
+        - Design 1' (``continuous_at_zero``), paper Theorem 1 /
+          Equation 3 (identification); Equation 7 (sample estimator):
 
               beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2]
 
           Regressor passed to the local-linear boundary fit is
           ``d_arr``; the boundary is ``0``.
 
-        - Design 1 (``continuous_near_d_lower``), paper Proposition 3 /
-          Theorem 4 (``WAS_{d_lower}`` under Assumption 6):
+        - Design 1 (``continuous_near_d_lower``), paper Theorem 3 /
+          Equation 11 (``WAS_{d_lower}`` under Assumption 6; note
+          Theorem 4 is the QUG null test, not this estimand):
 
               beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d])
                      / E[D_2 - d_lower]
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2324,6 +2324,7 @@ Shipped as `did_had_pretest_workflow()` and surfaced via `practitioner_next_step
     - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning.
     - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate.
     - **Note (Phase 2a scope):** Ships single-period only. `aggregate="event_study"` (Appendix B.2 multi-period extension) raises `NotImplementedError` pointing to Phase 2b. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR.
+    - **Note (Phase 2a panel-only):** The paper (Section 2) defines HAD on *panel or repeated cross-section* data, but Phase 2a ships a panel-only implementation: `HeterogeneousAdoptionDiD.fit()` requires a balanced two-period panel with a unit identifier so that unit-level first differences ΔY_g = Y_{g,2} - Y_{g,1} can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the existing balanced-panel validator with the "unit(s) do not appear in both periods" error. Repeated-cross-section support is queued for a follow-up PR (tracked in `TODO.md`); it will need a separate identification path based on pre/post cell means rather than unit-level differences.
 - [ ] Phase 2b: Multi-period event-study extension (Appendix B.2). Will lift the `aggregate="event_study"` scaffolding in `HeterogeneousAdoptionDiD.fit()` and aggregate per-cohort first-differences into an event-study result.
 - [ ] Phase 3: `qug_test()` (`T = D_{2,(1)} / (D_{2,(2)} - D_{2,(1)})`, rejection `{T > 1/α - 1}`).
 - [ ] Phase 3: `stute_test()` Cramér-von Mises with Mammen wild bootstrap.
diff --git a/docs/methodology/papers/dechaisemartin-2026-review.md b/docs/methodology/papers/dechaisemartin-2026-review.md
@@ -20,7 +20,7 @@
 **Key implementation requirements:**
 
 *Assumption checks / warnings:*
-- Data must be panel (or repeated cross-section) with `D_{g,1} = 0` for all `g` (nobody treated in period one).
+- Data must be panel (or repeated cross-section) with `D_{g,1} = 0` for all `g` (nobody treated in period one). **Phase 2a implementation note:** `diff_diff.HeterogeneousAdoptionDiD` ships panel-only; repeated-cross-section inputs are rejected by the balanced-panel validator. RCS support is queued for a follow-up PR.
 - Treatment dose `D_{g,2} >= 0`. For Design 1' (the QUG case) the support infimum `d̲ := inf Supp(D_{g,2})` must equal 0; for Design 1 (no QUG) `d̲ > 0` and Assumption 5 or 6 must be invoked.
 - Assumption 1 (i.i.d. sample): `(Y_{g,1}, Y_{g,2}, D_{g,1}, D_{g,2})_{g=1,...,G}` i.i.d.
 - Assumption 2 (parallel trends for the least-treated): `lim_{d ↓ d̲} E[ΔY(0) | D_2 ≤ d] = E[ΔY(0)]`. Testable with pre-trends when a pre-treatment period `t=0` exists. Reduces to standard parallel trends when treatment is binary.
diff --git a/tests/test_had.py b/tests/test_had.py
@@ -1546,6 +1546,58 @@ def test_both_all_zero_periods_raises(self):
         with pytest.raises(ValueError, match="variation"):
             _validate_had_panel(panel, "outcome", "dose", "period", "unit", None)
 
+    def test_repeated_cross_section_raises(self):
+        """Review P1 round 6: Phase 2a is panel-only. An RCS input (disjoint
+        unit IDs across periods) must be rejected by the balanced-panel
+        validator with the "unit(s) do not appear in both periods" error.
+        """
+        rng = np.random.default_rng(0)
+        G = 100
+        pre = pd.DataFrame(
+            {
+                "unit": np.arange(G),
+                "period": 1,
+                "dose": np.zeros(G),
+                "outcome": rng.standard_normal(G),
+            }
+        )
+        post = pd.DataFrame(
+            {
+                "unit": np.arange(G, 2 * G),
+                "period": 2,
+                "dose": rng.uniform(0, 1, G),
+                "outcome": rng.standard_normal(G),
+            }
+        )
+        rcs = pd.concat([pre, post], ignore_index=True)
+        with pytest.raises(ValueError, match=r"both periods|[Uu]nbalanced"):
+            _validate_had_panel(rcs, "outcome", "dose", "period", "unit", None)
+
+    def test_repeated_cross_section_fit_raises(self):
+        """End-to-end: fit() on an RCS panel raises ValueError."""
+        rng = np.random.default_rng(0)
+        G = 100
+        pre = pd.DataFrame(
+            {
+                "unit": np.arange(G),
+                "period": 1,
+                "dose": np.zeros(G),
+                "outcome": rng.standard_normal(G),
+            }
+        )
+        post = pd.DataFrame(
+            {
+                "unit": np.arange(G, 2 * G),
+                "period": 2,
+                "dose": rng.uniform(0, 1, G),
+                "outcome": rng.standard_normal(G),
+            }
+        )
+        rcs = pd.concat([pre, post], ignore_index=True)
+        est = HeterogeneousAdoptionDiD()
+        with pytest.raises(ValueError, match=r"both periods|[Uu]nbalanced"):
+            est.fit(rcs, "outcome", "dose", "period", "unit")
+
 
 # =============================================================================
 # Review P1: continuous_near_d_lower on a true mass-point sample rejects