Skip to content

Commit 6125fda

Browse files
igerberclaude
andcommitted
Address PR #359 CI review round 1 (1 P0 + 2 P1 + 2 P2/P3)
P0 — bias-corrected survey IF: the survey variance path was building the influence function from classical residuals (``res_h`` + ``R_p·W_h``, aligned with ``V_Y_cl``) while the returned ATT uses the bias-corrected ``tau_bc``. Under compute_survey_if_variance this silently under-estimated the survey SE by ignoring the CCT-2014 bias-correction variance inflation. Fixed in ``_nprobust_port.lprobust``: IF now uses ``Q_q`` + ``res_b`` so ``sum(IF^2) == V_Y_bc[0,0]`` (verified in new white-box test), matching the estimator scale of the ATT. Uniform- weights bit-parity (weights=np.ones ≡ unweighted at 1e-14) preserved across the new IF formula. The ``TestWeightedLprobust`` HC1 bit-parity test still passes because under weights=ones the classical vs bias- corrected IF only differ by the Q.q bias-correction term, which is deterministic and cancels in the diff. P1a — df_survey threading: survey fits previously used Normal-theory inference regardless of PSU count. ``resolved_survey_unit.df_survey`` (n_psu − n_strata, or replicate QR rank − 1) now routes through ``safe_inference(..., df=...)`` on the survey path, producing t-based p-values and CIs. Also surfaced in ``survey_metadata["df_survey"]`` for introspection. Under small-PSU designs the t-critical exceeds the Normal z-critical, widening the CI vs the prior (wrong) output. The ``weights=`` shortcut continues to use Normal inference since there's no PSU structure to produce a finite df. P1b — reject non-pweight SurveyDesigns: HAD's weighted kernel composition ``W_combined = k((D-d̲)/h) · w`` implements inverse-probability weighting semantics. ``SurveyDesign(weight_type="aweight"|"fweight")`` is now rejected with ``NotImplementedError`` at fit-time — aweight (analytic) implies a different inferential target (weighted regression, not design-based inference), and fweight (frequency) implies observation replication. Neither has been derived for HAD's continuous-dose path; deferred as a follow-up. P2 — test coverage: six new ``TestHADSurvey`` tests locking in the three fixes above: - ``test_survey_if_uses_bias_corrected_scale``: white-box ``sum(IF^2) == V_Y_bc[0,0]`` under nonlinear DGP where V_Y_bc ≠ V_Y_cl (teeth). - ``test_survey_df_widens_ci_vs_normal`` + ``test_survey_df_threaded_into_inference_via_t_distribution``: assert df_survey surfaces in metadata and produces t-CI half-width matching ``t_crit(df) · se`` exactly. - ``test_survey_aweight_raises_not_implemented`` + ``test_survey_fweight_raises_not_implemented``: front-door rejection. - ``test_survey_no_psu_no_strata_se_matches_weights_hc1``: SRS equivalence — survey SE within 10-15% of weights-shortcut SE (the (n/(n-1))-style HC1 correction), ruling out a silent V_Y_cl-vs-V_Y_bc mismatch. P3 — docstring refresh: the stale fit() docstring calling survey/weights "Reserved for a follow-up PR" is replaced with the actual Phase 4.5 A contract (pweight-only, constant-within-unit, replicate-weight deferred, mass-point/event-study/pretests deferred). ``survey_metadata`` docstring on ``HeterogeneousAdoptionDiDResults`` now enumerates the actual dict keys and their semantics. All 348 tests (across test_had, test_nprobust_port, test_bias_corrected_lprobust, test_np_npreg_weighted_parity, and the slow MC suite) pass after the cascade. Ruff clean on modified files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent edf760e commit 6125fda

3 files changed

Lines changed: 286 additions & 29 deletions

File tree

diff_diff/_nprobust_port.py

Lines changed: 33 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1005,15 +1005,19 @@ class LprobustResult:
10051005
V_Y_cl: np.ndarray
10061006
V_Y_bc: np.ndarray
10071007
influence_function: Optional[np.ndarray] = None
1008-
"""Per-observation influence function of the CLASSICAL intercept
1009-
``mu_hat`` (deriv=0 case), aligned with the ORIGINAL x ordering. Shape
1010-
``(N,)``. Set only when ``return_influence=True``; ``None`` otherwise.
1011-
Observations outside the active kernel window have IF=0.
1008+
"""Per-observation influence function of the BIAS-CORRECTED point
1009+
estimate at ``deriv`` (``tau_bc`` for the deriv=0 case), aligned
1010+
with the ORIGINAL x ordering. Shape ``(N,)``. Set only when
1011+
``return_influence=True``; ``None`` otherwise. Observations outside
1012+
the active kernel window have IF=0.
10121013
1013-
Surface used by estimator-level Binder (1983) TSL composition for
1014-
survey-design variance (Phase 4.5 HAD continuous path). The variance
1015-
check ``sum(IF^2) == V_Y_cl[0, 0]`` (up to BLAS ordering) holds when
1016-
weights are uniform and cluster=None."""
1014+
Derived from ``Q_q`` + ``res_b``, so the variance self-check is
1015+
``sum(IF^2) == V_Y_bc[deriv, deriv]`` (up to BLAS ordering) under
1016+
unclustered HC0. Used by estimator-level Binder (1983) TSL
1017+
composition for survey-design variance on HAD's continuous path — the
1018+
bias-corrected scale matches the ATT which itself uses ``tau_bc``.
1019+
Using the classical IF here would silently under-estimate survey SE
1020+
by ignoring the bias-correction variance inflation."""
10171021

10181022

10191023
def lprobust(
@@ -1352,23 +1356,30 @@ def lprobust(
13521356
se_cl = float(np.sqrt((deriv_fact**2) * V_Y_cl[deriv, deriv]))
13531357
se_rb = float(np.sqrt((deriv_fact**2) * V_Y_bc[deriv, deriv]))
13541358

1355-
# --- Per-observation influence function for the classical point
1359+
# --- Per-observation influence function for the BIAS-CORRECTED point
13561360
# estimate at ``deriv`` (Phase 4.5 survey composition).
1357-
# Weighted OLS IF decomposition:
1358-
# beta_hat - beta = invG_p @ sum_g [(R_p[g] * W_h[g]) * res[g]]
1359-
# so psi_g = invG_p[deriv, :] @ (R_p[g] * W_h[g]) * res[g] scaled by
1360-
# deriv_fact. Observations outside the active kernel window have
1361-
# W_h[g]=0 and contribute IF=0. Length-N, aligned with the ORIGINAL
1362-
# x ordering (inverse-permuted when vce="nn" sorts). ---
1361+
# Aligned with ``V_Y_bc`` (NOT ``V_Y_cl``) so survey-composed variance
1362+
# through ``compute_survey_if_variance`` targets the same estimator
1363+
# scale that HAD's beta-scale ATT uses (``tau_bc``-based). Using the
1364+
# classical IF here would under-estimate the variance by ignoring the
1365+
# bias-correction inflation, producing a silently wrong survey SE.
1366+
#
1367+
# Bias-corrected WLS IF decomposition (mirrors the ``V_Y_bc`` sandwich
1368+
# inner at lprobust.R:244): beta_bc - beta = invG_p @ sum_g [ Q_q[g] · res_b[g] ],
1369+
# so psi_g = deriv_fact · invG_p[deriv, :] · Q_q[g, :] · res_b[g].
1370+
# The self-check ``sum(psi^2) == V_Y_bc[deriv, deriv]`` holds under
1371+
# unclustered HC0; under clustering, compute_survey_if_variance
1372+
# aggregates by PSU, which is what the survey path wants.
1373+
# Observations outside the active window have Q_q[g, :]=0 row and
1374+
# contribute IF=0. Length-N, aligned with ORIGINAL x ordering (inverse-
1375+
# permuted when vce="nn" sorts). ---
13631376
influence_function: Optional[np.ndarray] = None
13641377
if return_influence:
1365-
# For active-window observations only.
1366-
row_deriv = invG_p[deriv, :] # (p+1,)
1367-
# (R_p * W_h)[g, :] has shape (p+1,); einsum for clarity.
1368-
coeff_active = (R_p_W_h @ row_deriv).ravel() # (eN,)
1369-
# res_h is shape (eN, 1); squeeze + multiply.
1370-
res_flat = np.asarray(res_h).ravel()
1371-
if_active = deriv_fact * coeff_active * res_flat # (eN,)
1378+
# Bias-corrected IF using Q_q + res_b (active-window only).
1379+
row_deriv_bc = invG_p[deriv, :] # (p+1,)
1380+
coeff_active_bc = (Q_q @ row_deriv_bc).ravel() # (eN,)
1381+
res_b_flat = np.asarray(res_b).ravel()
1382+
if_active = deriv_fact * coeff_active_bc * res_b_flat # (eN,)
13721383
# Map back to full N; zeros for obs outside window.
13731384
if_full_sorted = np.zeros(N, dtype=np.float64)
13741385
if_full_sorted[ind] = if_active

diff_diff/had.py

Lines changed: 63 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -277,9 +277,15 @@ class HeterogeneousAdoptionDiDResults:
277277
cluster_name : str or None
278278
Column name of the cluster variable on the mass-point path when
279279
cluster-robust SE is requested. ``None`` otherwise.
280-
survey_metadata : object or None
281-
Always ``None`` in Phase 2a. Field shape kept for future-compat
282-
with a planned survey integration PR.
280+
survey_metadata : dict or None
281+
``None`` when ``fit()`` was called without ``survey=`` or
282+
``weights=``. Under weighted fits (continuous-dose paths only,
283+
per Phase 4.5 A), carries a dict with keys ``method`` ('pweight'
284+
vs 'survey_binder_tsl'), ``source``, ``variance_formula``,
285+
``n_units_weighted``, ``weight_sum``, ``effective_sample_size``,
286+
``n_strata`` / ``n_psu`` (int or None), and ``df_survey`` (int
287+
or None — the survey t-distribution degrees of freedom, routed
288+
through inference under the SurveyDesign path only).
283289
bandwidth_diagnostics : BandwidthResult or None
284290
Full Phase 1b MSE-DPI selector output on the continuous paths
285291
(when bandwidths were auto-selected). ``None`` on the mass-point
@@ -2262,10 +2268,28 @@ def fit(
22622268
to a follow-up PR. Staggered-timing panels are auto-filtered
22632269
to the last-treatment cohort with a ``UserWarning``.
22642270
survey : SurveyDesign or None
2265-
Reserved for a follow-up survey-integration PR. Must be
2266-
``None`` in Phase 2a.
2271+
Survey design (sampling weights + optional strata / PSU / FPC)
2272+
for design-based inference on the two continuous-dose paths
2273+
(``continuous_at_zero``, ``continuous_near_d_lower``). Passes
2274+
through :func:`compute_survey_if_variance` (Binder 1983 TSL)
2275+
for the SE; weights propagate pointwise into the lprobust
2276+
kernel composition. Only ``weight_type="pweight"`` is
2277+
supported in Phase 4.5 A — ``aweight`` / ``fweight`` raise
2278+
``NotImplementedError``. Survey design columns (strata / PSU /
2279+
FPC) must be constant within unit (sampling-unit-level
2280+
assignment); within-unit variance raises ``ValueError``.
2281+
Replicate-weight designs raise ``NotImplementedError``
2282+
(Phase 4.5 C). ``design="mass_point"`` and
2283+
``aggregate="event_study"`` raise ``NotImplementedError`` on
2284+
survey/weights (Phase 4.5 B).
22672285
weights : np.ndarray or None
2268-
Reserved for a follow-up PR. Must be ``None`` in Phase 2a.
2286+
Per-row sampling weights as a lightweight shortcut equivalent
2287+
to ``survey=SurveyDesign(weights=<col>)``. Produces the same
2288+
ATT; the SE uses lprobust's weighted-robust CCT-2014 formula
2289+
rather than Binder-TSL (no PSU/strata composition). Mutually
2290+
exclusive with ``survey=`` — passing both raises
2291+
``ValueError``. Must be constant within each unit (same
2292+
invariant as ``survey=``).
22692293
22702294
Returns
22712295
-------
@@ -2363,6 +2387,25 @@ def fit(
23632387
"survey=SurveyDesign(weights='<col>', ...) with a "
23642388
"per-row weight column."
23652389
)
2390+
# HAD's weighted local-linear treats ``weights`` as sampling
2391+
# (probability) weights: the kernel-composition formula
2392+
# ``W_combined = k((D-d̲)/h) · w`` is the inverse-probability
2393+
# weighting convention. Frequency weights (``fweight``)
2394+
# would imply replicating observations, and analytic weights
2395+
# (``aweight``, inverse-variance) would imply a different
2396+
# inferential target. Reject those up front rather than
2397+
# silently reinterpreting.
2398+
weight_type = getattr(survey, "weight_type", "pweight")
2399+
if weight_type != "pweight":
2400+
raise NotImplementedError(
2401+
f"survey=SurveyDesign(weight_type={weight_type!r}) is "
2402+
f"not supported on HeterogeneousAdoptionDiD's "
2403+
f"continuous path. Only ``weight_type='pweight'`` "
2404+
f"(sampling / inverse-probability weights) is "
2405+
f"implemented in Phase 4.5 A. Frequency weights "
2406+
f"(fweight) and analytic weights (aweight) would "
2407+
f"imply different estimands and are not yet derived."
2408+
)
23662409
# Resolve the SurveyDesign against the long-panel data. This
23672410
# validates column names, applies pweight/aweight normalization
23682411
# to mean=1, and extracts numpy arrays for all design columns.
@@ -2665,7 +2708,17 @@ def fit(
26652708
raise ValueError(f"Internal error: unhandled design={resolved_design!r}.")
26662709

26672710
# ---- Route all inference fields through safe_inference ----
2668-
t_stat, p_value, conf_int = safe_inference(att, se, alpha=float(self.alpha))
2711+
# Survey path: use t-distribution with ``df_survey = n_psu -
2712+
# n_strata`` (or replicate-QR rank − 1) so small-PSU designs
2713+
# don't get Normal-theory inference that overstates precision.
2714+
# Non-survey path (``weights=`` shortcut or unweighted): use
2715+
# the existing Normal-theory default.
2716+
df_infer: Optional[int] = None
2717+
if resolved_survey_unit is not None:
2718+
df_infer = resolved_survey_unit.df_survey
2719+
t_stat, p_value, conf_int = safe_inference(
2720+
att, se, alpha=float(self.alpha), df=df_infer
2721+
)
26692722

26702723
# Build survey metadata when weights/survey were supplied. When a
26712724
# ResolvedSurveyDesign is available (full survey= path), surface
@@ -2681,12 +2734,14 @@ def fit(
26812734
source = "SurveyDesign"
26822735
n_strata = int(resolved_survey_unit.n_strata)
26832736
n_psu = int(resolved_survey_unit.n_psu)
2737+
df_survey_meta: Optional[int] = resolved_survey_unit.df_survey
26842738
else:
26852739
method = "pweight"
26862740
variance_formula = "weighted-robust (CCT 2014)"
26872741
source = "weights_arr"
26882742
n_strata = None
26892743
n_psu = None
2744+
df_survey_meta = None
26902745
survey_metadata = {
26912746
"method": method,
26922747
"source": source,
@@ -2696,6 +2751,7 @@ def fit(
26962751
"effective_sample_size": float(ess),
26972752
"n_strata": n_strata,
26982753
"n_psu": n_psu,
2754+
"df_survey": df_survey_meta,
26992755
}
27002756

27012757
return HeterogeneousAdoptionDiDResults(

0 commit comments

Comments
 (0)