Commit efdcb2d

igerber and claude committed
Consolidate HAD survey-design API to single survey_design= kwarg
Adds survey_design= as the canonical kwarg on all 8 HAD surfaces (HAD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test) to match the rest of the library (ContinuousDiD/EfficientDiD/dCDH already use survey_design=). The existing survey= and weights= kwargs become deprecated aliases (DeprecationWarning, removal next minor); internal back-end paths unchanged so numerical results are bit-exact pre-PR.

Promotes survey._make_trivial_resolved → public make_pweight_design helper for the pweight-only convenience on array-in pretest helpers (which take ResolvedSurveyDesign, not column-referencing SurveyDesign). Underscore name kept as permanent private alias for back-compat.

Three-way mutex (survey_design + survey + weights) extends the prior 2-way; two distinct error messages per surface group point users to the right migration target (SurveyDesign(weights='col') for data-in surfaces vs make_pweight_design(arr) for array-in helpers).

535 tests pass (489 pre-PR + 46 new in tests/test_had_dual_knob_deprecation.py covering 8 surfaces × {survey_design= smoke, weights= warn, survey= warn, parity, mutex} plus surface-spanning tests for type guards, normalization-order invariant, and public-helper export). Bit-exact regression locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 435dfc2 commit efdcb2d

10 files changed

Lines changed: 1186 additions & 173 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -7,6 +7,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

+### Changed
+- **HAD survey-design API consolidated to single `survey_design=` kwarg** across all 8 HAD surfaces: `HeterogeneousAdoptionDiD.fit`, `did_had_pretest_workflow`, `qug_test`, `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`. Matches the rest of the library (`ContinuousDiD`, `EfficientDiD`, `ChaisemartinDHaultfoeuille` already used `survey_design=`). On data-in surfaces (HAD.fit, workflow, joint data-in wrappers) `survey_design=` accepts a `SurveyDesign` instance (column references resolved against `data` at fit time, same convention as the rest of the library). On array-in surfaces (`stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `qug_test`) `survey_design=` accepts a pre-resolved `ResolvedSurveyDesign`; passing a `SurveyDesign` raises `TypeError` with migration guidance (no `data` to resolve column names against). New public helper `make_pweight_design(weights: np.ndarray) -> ResolvedSurveyDesign` exported from the `diff_diff` top level for the pweight-only convenience on array-in helpers (formerly the private `survey._make_trivial_resolved`, kept as a permanent private alias). Three-way mutex (`survey_design + survey + weights`) extends the prior 2-way (`survey + weights`): at most one may be non-None per call; two distinct error messages per surface group point users to the right migration target. Patch-level addition (additive new kwarg + permanent alias for the helper; no breaking changes this release).
+
+### Deprecated
+- **`HeterogeneousAdoptionDiD.fit(survey=, weights=)`, `did_had_pretest_workflow(survey=, weights=)`, and the 6 HAD pretest helpers' `survey=` / `weights=` kwargs are deprecated** in favor of the canonical `survey_design=`. Emits `DeprecationWarning` with migration guidance; the deprecated kwargs continue to route through the unchanged legacy back-end paths so numerical results are identical to pre-PR (bit-exact regression locked by parity tests in `tests/test_had_dual_knob_deprecation.py`). Both `survey=` and `weights=` will be removed in the next minor release.
+
### Added
- **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. 
Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
- **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
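For readers checking the Yatchew formulas quoted in the Phase 4.5 C entry, the following is a minimal pure-Python sketch of the three weighted variance components. It is not the library's implementation. The function name `yatchew_weighted_components`, the reading of `diff` as adjacent differences of the dose-sorted outcome, the reading of `prod` as the product of adjacent squared residuals, and `sigma2_W = sqrt(sigma4_W)` are all assumptions made for illustration.

```python
import math


def yatchew_weighted_components(eps, dy, w):
    """Weighted Yatchew components, following the changelog formulas.

    eps: residuals from the (weighted) linear fit, sorted by dose  [assumed input]
    dy:  outcome values in the same dose-sorted order              [assumed input]
    w:   strictly positive per-unit weights
    """
    if not (len(eps) == len(dy) == len(w)):
        raise ValueError("eps, dy, and w must have the same length")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be strictly positive")

    G = len(w)
    sum_w = sum(w)
    # Arithmetic-mean pair weights: w_avg_g = (w_g + w_{g-1}) / 2.
    w_avg = [(w[g] + w[g - 1]) / 2 for g in range(1, G)]
    # Assumption: `diff` is the adjacent difference of the sorted outcome.
    diff = [dy[g] - dy[g - 1] for g in range(1, G)]
    # Assumption: `prod` is the product of adjacent squared residuals.
    prod = [eps[g] ** 2 * eps[g - 1] ** 2 for g in range(1, G)]

    sigma2_lin = sum(wi * e * e for wi, e in zip(w, eps)) / sum_w
    sigma2_diff = sum(a * d * d for a, d in zip(w_avg, diff)) / (2 * sum_w)
    sigma4_w = sum(a * p for a, p in zip(w_avg, prod)) / sum(w_avg)
    # Assumption: sigma2_W in the T_hr formula is sqrt(sigma4_W).
    t_hr = math.sqrt(sum_w) * (sigma2_lin - sigma2_diff) / math.sqrt(sigma4_w)
    return sigma2_lin, sigma2_diff, sigma4_w, t_hr
```

With `w` set to all ones, `sum(w)` is `G` and `sum(w_avg)` is `G - 1`, so the components collapse to `sum(eps²)/G`, `sum(diff²)/(2G)`, and `sum(prod)/(G-1)`. That is the reduction the entry describes as locked by a direct helper test.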

TODO.md

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ Deferred items from PR reviews that were not addressed before merge.
| `HeterogeneousAdoptionDiD` Phase 4.5: weight-aware auto-bandwidth MSE-DPI selector. Phase 4.5 A ships weighted `lprobust` with an unweighted DPI selector; users who want a weight-aware bandwidth must pass `h`/`b` explicitly. Extending `lpbwselect_mse_dpi` to propagate weights through density, second-derivative, and variance stages is ~300 LoC of methodology and was out of scope. | `diff_diff/_nprobust_port.py::lpbwselect_mse_dpi` | Phase 4.5 | Low |
| `HeterogeneousAdoptionDiD` Phase 4.5 C: replicate-weight SurveyDesigns (BRR / Fay / JK1 / JKn / SDR) on the continuous-dose paths. Phase 4.5 A raises `NotImplementedError` on replicate designs in `_aggregate_unit_resolved_survey`. Rao-Wu-style replicate bootstrap for HAD paths requires deriving the per-replicate weight-ratio rescaling for the local-linear intercept IF. | `diff_diff/had.py::_aggregate_unit_resolved_survey` | Phase 4.5 C | Low |
| `HeterogeneousAdoptionDiD` mass-point: `vcov_type in {"hc2", "hc2_bm"}` raises `NotImplementedError` pending a 2SLS-specific leverage derivation. The OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction uses `x_i' (Z'X)^{-1} (...) (X'Z)^{-1} x_i`. Needs derivation plus an R / Stata (`ivreg2 small robust`) parity anchor. | `diff_diff/had.py::_fit_mass_point_2sls` | Phase 2a | Medium |
+| `HeterogeneousAdoptionDiD` survey-design API consolidation, **next minor bump**: drop the deprecated `survey=` and `weights=` kwargs on all 8 HAD surfaces (`HeterogeneousAdoptionDiD.fit`, `did_had_pretest_workflow`, `qug_test`, `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`); only `survey_design=` remains. Also fold the legacy back-end `weights=` paths (e.g. `_aggregate_unit_weights` ad-hoc routing) into the unified `_resolve_survey_for_fit`-driven path. The `_make_trivial_resolved` underscore alias on `survey.py` stays (one-line, harmless). DeprecationWarning ships in this PR; the removal PR is ~50 LoC of cleanup. | `diff_diff/had.py`, `diff_diff/had_pretests.py` | next minor bump | Medium |
| `HeterogeneousAdoptionDiD` continuous paths: thread `cluster=` through `bias_corrected_local_linear` (Phase 1c's wrapper already supports cluster; Phase 2a ignores it with a `UserWarning` on the continuous path to keep scope tight). | `diff_diff/had.py`, `diff_diff/local_linear.py` | Phase 2a | Low |
| `HeterogeneousAdoptionDiD` Eq 18 linear-trend detrending (Pierce-Schott style): the joint-Stute infrastructure shipped in the Phase 3 follow-up supports pre-trends (mean-indep) and post-homogeneity (linearity) nulls. The Pierce-Schott application (paper Section 5.2) uses a LINEAR-TREND detrending of pre-period outcomes before the joint CvM — `Y_{g,t} - Y_{g,t_anchor} - (t - t_anchor)*(Y_{g,t_anchor} - Y_{g,t_anchor-1})` — reaching p=0.51 on US-China tariff data. Extends `joint_pretrends_test` with a detrending mode or a separate Eq 18-specific helper. Deferred to Phase 4 replication harness (where the published p=0.51 serves as the parity anchor). | `diff_diff/had_pretests.py::joint_pretrends_test` | Phase 4 | Medium |
| `HeterogeneousAdoptionDiD` Phase 3 Stute performance: Appendix D vectorized matrix form replaces the per-iteration OLS refit with a single precomputed `M = I - X(X'X)^{-1}X'` applied to `eps * eta`. Functionally identical, ~2x faster. Shipped literal-refit form in Phase 3 to match paper text and keep reviewer surface small. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low |
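The Eq 18 detrending formula quoted in the TODO table can be read as subtracting, from each outcome change relative to the anchor period, a linear extrapolation of the last pre-anchor change. A small sketch under that reading follows. `detrend_linear` and the dict-based layout (`period -> outcome` for a single unit) are hypothetical and not part of `diff_diff`.

```python
def detrend_linear(y, t, t_anchor):
    """Linear-trend detrended outcome for one unit.

    Implements Y_t - Y_anchor - (t - t_anchor) * (Y_anchor - Y_{anchor-1}),
    where `y` maps integer periods to outcomes (hypothetical layout).
    """
    slope = y[t_anchor] - y[t_anchor - 1]  # last observed one-period change
    return y[t] - y[t_anchor] - (t - t_anchor) * slope


# A unit on an exactly linear path is detrended to zero in every period.
linear_unit = {0: 2.0, 1: 5.0, 2: 8.0, 3: 11.0}
residuals = [detrend_linear(linear_unit, t, t_anchor=2) for t in (0, 1, 3)]
```

A unit whose pre-period path departs from the anchor-period slope keeps a nonzero detrended value, which is the variation the joint CvM test would then pick up.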

diff_diff/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -151,6 +151,7 @@
     SurveyDesign,
     SurveyMetadata,
     compute_deff_diagnostics,
+    make_pweight_design,
 )
 from diff_diff.staggered import (
     CallawaySantAnna,
@@ -445,6 +446,7 @@
     "SurveyMetadata",
     "DEFFDiagnostics",
     "compute_deff_diagnostics",
+    "make_pweight_design",
     # Rust backend
     "HAS_RUST_BACKEND",
     # Linear algebra helpers
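The changelog describes a split between data-in surfaces (which take a column-referencing `SurveyDesign`) and array-in surfaces (which take a pre-resolved design and reject `SurveyDesign` with `TypeError`). The sketch below illustrates that type guard with stand-in classes only. `SurveyDesignSketch`, `ResolvedSurveyDesignSketch`, `make_pweight_design_sketch`, and `array_in_helper_sketch` are hypothetical names, and their fields do not reflect the real `diff_diff` classes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SurveyDesignSketch:
    """Stand-in for a column-referencing design (needs a DataFrame to resolve)."""
    weights: str  # column name, not values


@dataclass(frozen=True)
class ResolvedSurveyDesignSketch:
    """Stand-in for a pre-resolved design carrying the weight values themselves."""
    weights: Tuple[float, ...]


def make_pweight_design_sketch(weights: Sequence[float]) -> ResolvedSurveyDesignSketch:
    """Wrap a raw weight array in a trivial resolved design (pweight-only)."""
    return ResolvedSurveyDesignSketch(weights=tuple(float(x) for x in weights))


def array_in_helper_sketch(
    values: Sequence[float],
    *,
    survey_design: Optional[ResolvedSurveyDesignSketch] = None,
) -> float:
    """Toy array-in surface: returns the (weighted) mean of `values`."""
    if isinstance(survey_design, SurveyDesignSketch):
        # No `data` is available here, so a column name cannot be resolved.
        raise TypeError(
            "array-in helpers need a resolved design; "
            "wrap the weight array with make_pweight_design_sketch(arr)"
        )
    if survey_design is None:
        weights = (1.0,) * len(values)
    elif isinstance(survey_design, ResolvedSurveyDesignSketch):
        weights = survey_design.weights
    else:
        raise TypeError("survey_design must be a resolved design or None")
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

The point of the guard is that the error message can name the right migration target for the surface group, rather than failing later with an opaque attribute error.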

diff_diff/had.py

Lines changed: 56 additions & 29 deletions
@@ -76,7 +76,13 @@
     BiasCorrectedFit,
     bias_corrected_local_linear,
 )
-from diff_diff.survey import SurveyMetadata, compute_survey_metadata
+from diff_diff.survey import (
+    HAD_DEPRECATION_MSG_SURVEY_KWARG,
+    HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN,
+    HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN,
+    SurveyMetadata,
+    compute_survey_metadata,
+)
 from diff_diff.utils import safe_inference

 __all__ = [
@@ -2783,6 +2789,8 @@ def fit(
         unit_col: str,
         first_treat_col: Optional[str] = None,
         aggregate: str = "overall",
+        *,
+        survey_design: Any = None,
         survey: Any = None,
         weights: Optional[np.ndarray] = None,
         cband: bool = True,
@@ -2835,7 +2843,7 @@ def fit(
             CIs per horizon; joint cross-horizon covariance is deferred
             to a follow-up PR. Staggered-timing panels are auto-filtered
             to the last-treatment cohort with a ``UserWarning``.
-        survey : SurveyDesign or None
+        survey_design : SurveyDesign or None, keyword-only
             Survey design (sampling weights + optional strata / PSU / FPC)
             for design-based inference on the two continuous-dose paths
             (``continuous_at_zero``, ``continuous_near_d_lower``). Passes
@@ -2847,25 +2855,20 @@ def fit(
             FPC) must be constant within unit (sampling-unit-level
             assignment); within-unit variance raises ``ValueError``.
             Replicate-weight designs raise ``NotImplementedError``
-            (Phase 4.5 C). Phase 4.5 B support matrix: survey / weights
-            are now accepted on ALL design × aggregate combinations
-            (continuous × {overall, event-study}, mass-point × {overall,
-            event-study}); HAD pretests (``qug_test``, ``stute_test``,
-            ``yatchew_hr_test``, joint variants,
-            ``did_had_pretest_workflow``) still don't accept
-            survey/weights; deferred to Phase 4.5 C / C0.
-        weights : np.ndarray or None
-            Per-row sampling weights as a lightweight shortcut equivalent
-            to ``survey=SurveyDesign(weights=<col>)``. Produces the same
-            ATT; the SE uses the analytical weighted HC1 sandwich
-            (continuous: CCT-2014 weighted-robust; mass-point: pweight
-            2SLS sandwich) rather than Binder-TSL. Must be constant
-            within each unit; row-order aligned with ``data`` (index
-            labels are resolved to positional offsets via
-            ``data.index.get_indexer``, so custom non-RangeIndex inputs
-            work as long as ``data.index`` is unique). Mutually
-            exclusive with ``survey=``; passing both raises
-            ``ValueError``.
+            (Phase 4.5 C). Mutually exclusive with the deprecated
+            ``survey=`` and ``weights=`` aliases.
+        survey : SurveyDesign or None, keyword-only
+            DEPRECATED alias of ``survey_design=``. Will be removed in
+            the next minor release; prefer ``survey_design=``.
+        weights : np.ndarray or None, keyword-only
+            DEPRECATED alias for the per-row pweight shortcut. Prefer
+            adding the weights as a column on ``data`` and passing
+            ``survey_design=SurveyDesign(weights='col_name')`` instead.
+            Will be removed in the next minor release. Currently
+            preserved as the analytical-HC1-sandwich shortcut (continuous:
+            CCT-2014 weighted-robust; mass-point: pweight 2SLS sandwich)
+            with the per-row → per-unit aggregation invariant intact.
+            Mutually exclusive with ``survey_design=`` and ``survey=``.
         cband : bool, default True
             Phase 4.5 B: controls the multiplier-bootstrap simultaneous
             confidence band on the weighted event-study path. When
@@ -2882,19 +2885,43 @@ def fit(
         -------
         HeterogeneousAdoptionDiDResults
         """
-        # ---- aggregate / survey / weights validation ----
+        # ---- aggregate / survey_design / survey / weights validation ----
         if aggregate not in _VALID_AGGREGATES:
             raise ValueError(
                 f"Invalid aggregate={aggregate!r}. Must be one of " f"{_VALID_AGGREGATES}."
             )
-        if survey is not None and weights is not None:
-            raise ValueError(
-                "Pass survey=<SurveyDesign> OR weights=<array>, not both. "
-                "For SurveyDesign-composed inference (PSU, strata, FPC, "
-                "replicate weights), use survey=. For a simple pweight-only "
-                "shortcut, use weights=; it is internally equivalent to "
-                "survey=SurveyDesign(weights=w)."
+        # Three-way mutex on survey_design / survey / weights (data-in pattern).
+        n_set = sum(x is not None for x in (survey_design, survey, weights))
+        if n_set > 1:
+            raise ValueError(HAD_DUAL_KNOB_MUTEX_MSG_DATA_IN)
+
+        # Soft deprecation: route legacy survey=/weights= aliases to
+        # survey_design=. The internal back-end paths (legacy weights= and
+        # survey= routing below) are unchanged; only the entry signature
+        # wraps them. The bit-exact back-compat invariant is preserved
+        # because we only rebind names, not values, and the legacy `survey`
+        # / `weights` variables are re-derived from `survey_design` for
+        # downstream consumption.
+        if survey is not None:
+            warnings.warn(HAD_DEPRECATION_MSG_SURVEY_KWARG, DeprecationWarning, stacklevel=2)
+            survey_design = survey
+        elif weights is not None:
+            warnings.warn(
+                HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN,
+                DeprecationWarning,
+                stacklevel=2,
             )
+            # weights= shortcut preserved as-is on the back end (the
+            # downstream `if weights is not None:` branch consumes the
+            # raw array directly via _aggregate_unit_weights). Don't
+            # rebind survey_design here: the array is not a
+            # SurveyDesign and survey_design= cannot accept arrays.
+        else:
+            # Canonical path: survey_design= may be None or a SurveyDesign
+            # instance. Map back to the internal `survey` variable name
+            # so downstream code (legacy `if survey is not None:` branch)
+            # consumes the input transparently.
+            survey = survey_design
         # Dispatch the event-study path to a dedicated method so the
         # single-period path stays unchanged (Phase 2a contract preserved).
         # Note: event_study returns HeterogeneousAdoptionDiDEventStudyResults