Skip to content

Commit d860840

Browse files
igerberclaude
andcommitted
Address PR #359 CI review round 2 (1 P1 + 1 P2 + 1 P3)
P1 — survey_metadata contract drift: the previous HAD survey fits returned ``survey_metadata`` as a plain dict, but the repo-standard is the :class:`diff_diff.survey.SurveyMetadata` dataclass read via attribute access by BusinessReport, DiagnosticReport, and the shared ``results.py`` plumbing. Passing a HAD survey result through those consumers would silently drop ``df_survey`` / ``effective_n`` / ``n_strata`` / ``n_psu`` under the dict shape. Fixed by routing both entry paths (``survey=SurveyDesign(...)`` and ``weights=<array>``) through :func:`compute_survey_metadata`; ``weights=`` constructs a minimal ``ResolvedSurveyDesign`` (weights-only, no strata/PSU/FPC) uniformly. HAD-specific extras (``variance_formula`` label, the pweight vs Binder-TSL method tag) are moved to a new top-level field ``HeterogeneousAdoptionDiDResults.variance_formula`` (Optional[str]) rather than polluting the shared SurveyMetadata schema. New regression test ``test_survey_metadata_is_surveymetadata_instance`` exercises attribute access on both entry paths. P2 — weighted denominator contract: the continuous paths use weighted population moments internally but the public result still stored only the raw ``dose_mean`` — a user reconstructing β-scale by hand would have been off by the (weighted − raw) gap. Added ``HeterogeneousAdoptionDiDResults.effective_dose_mean`` (Optional[float]) that exposes the actual weighted denominator used by the beta-scale rescaling: ``sum(w·D)/sum(w)`` for ``continuous_at_zero``, ``sum(w·(D−d_lower))/sum(w)`` for ``continuous_near_d_lower``. ``None`` on unweighted fits (use ``dose_mean`` there; the two coincide under uniform weights and the duplicate field would be noise). Raw ``dose_mean`` preserved for backward compat; docstring clarifies the distinction. Four new regression tests covering both continuous designs + the uniform-weights coincidence + the unweighted ``None``-contract. P3 — stale docstrings (``_nprobust_port.py``, ``local_linear.py``, ``had.py``): three docstring blocks still described the pre-Phase-4.5 surface ("weights= not supported" / "IF of the classical estimate" / "CCT-2014 robust SE divided by |den|"). Updated to describe the shipped behavior: lprobust supports weights + return_influence on the BIAS-CORRECTED scale, bias_corrected_local_linear's ``influence_function`` field aligns with ``V_Y_bc``, and HAD's ``se`` documentation now enumerates the three SE paths (unweighted, pweight via weighted-robust, survey via Binder-TSL). All 353 tests pass after refactor (across test_had, test_nprobust_port, test_bias_corrected_lprobust, test_np_npreg_weighted_parity, and the slow MC suite). Ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6125fda commit d860840

4 files changed

Lines changed: 268 additions & 79 deletions

File tree

diff_diff/_nprobust_port.py

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -31,12 +31,17 @@
3131
3232
Deviations from nprobust (documented):
3333
34-
* ``weights=`` is not supported here or in the public wrapper
35-
(nprobust's ``lpbwselect`` has no weight argument, so Phase 1b has
36-
no parity anchor). Weighted-data support is queued for Phase 2+
37-
(survey-design adaptation). The public wrapper
38-
``mse_optimal_bandwidth`` raises ``NotImplementedError`` when a
39-
``weights`` array is passed.
34+
* ``weights=`` in ``lprobust`` (Phase 4.5 survey support): supported.
35+
User weights multiply into the kernel weights pointwise
36+
(``W_combined = k((x-c)/h) · w``) and propagate through design
37+
matrices, Q.q, and variance matrices. When ``weights=np.ones(N)`` the
38+
function is bit-identical to the unweighted path (regression-tested
39+
at atol=1e-14). ``return_influence=True`` surfaces the per-obs IF of
40+
the BIAS-CORRECTED point estimate (aligned with V_Y_bc) for survey-
41+
composed variance at the estimator level. The bandwidth selector
42+
``lpbwselect_mse_dpi`` and its public wrapper ``mse_optimal_bandwidth``
43+
remain unweighted in this release (no DPI-selector weight derivation
44+
shipped); pass user-specified ``h``/``b`` for weight-aware bandwidths.
4045
* ``vce="nn"`` is the default and is fully ported. ``vce in
4146
{"hc0", "hc1", "hc2", "hc3"}`` is implemented in ``lprobust_res`` /
4247
``lprobust_vce`` but has not been separately golden-tested; use at

diff_diff/had.py

Lines changed: 115 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,7 @@
7676
BiasCorrectedFit,
7777
bias_corrected_local_linear,
7878
)
79+
from diff_diff.survey import SurveyMetadata, compute_survey_metadata
7980
from diff_diff.utils import safe_inference
8081

8182
__all__ = [
@@ -225,11 +226,18 @@ class HeterogeneousAdoptionDiDResults:
225226
coefficient directly -
226227
``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``.
227228
se : float
228-
Standard error on the beta-scale. For continuous designs, the
229-
CCT-2014 robust SE from Phase 1c divided by ``|den|`` (the
230-
absolute denominator used in ``att``); the higher-order
231-
variance from ``mean(ΔY)`` is dominated by the nonparametric
232-
boundary estimate in large samples and is not included. For
229+
Standard error on the beta-scale. For continuous designs:
230+
- Unweighted or ``weights=<array>``: CCT-2014 weighted-robust SE
231+
from Phase 1c divided by ``|den|`` (``den`` = raw or weighted
232+
denominator depending on fit path).
233+
- ``survey=SurveyDesign(...)``: Binder (1983) Taylor-series
234+
linearization of the per-unit IF (bias-corrected scale,
235+
aligned with ``tau_bc``) routed through
236+
:func:`compute_survey_if_variance` for PSU-aggregated,
237+
FPC/strata-adjusted variance, divided by ``|den|``.
238+
In both cases the higher-order variance from ``mean(ΔY)`` is
239+
dominated by the nonparametric boundary estimate in large
240+
samples and is not included in the leading-order formula. For
233241
mass-point, the 2SLS structural-residual sandwich SE.
234242
t_stat, p_value, conf_int : inference fields
235243
Routed through ``safe_inference``; NaN when SE is non-finite.
@@ -277,15 +285,18 @@ class HeterogeneousAdoptionDiDResults:
277285
cluster_name : str or None
278286
Column name of the cluster variable on the mass-point path when
279287
cluster-robust SE is requested. ``None`` otherwise.
280-
survey_metadata : dict or None
281-
``None`` when ``fit()`` was called without ``survey=`` or
282-
``weights=``. Under weighted fits (continuous-dose paths only,
283-
per Phase 4.5 A), carries a dict with keys ``method`` ('pweight'
284-
vs 'survey_binder_tsl'), ``source``, ``variance_formula``,
285-
``n_units_weighted``, ``weight_sum``, ``effective_sample_size``,
286-
``n_strata`` / ``n_psu`` (int or None), and ``df_survey`` (int
287-
or None — the survey t-distribution degrees of freedom, routed
288-
through inference under the SurveyDesign path only).
288+
survey_metadata : SurveyMetadata or None
289+
Repo-standard survey metadata dataclass from
290+
:class:`diff_diff.survey.SurveyMetadata`. ``None`` when ``fit()``
291+
was called without ``survey=`` or ``weights=``; populated on the
292+
continuous-dose weighted paths via
293+
:func:`diff_diff.survey.compute_survey_metadata`. Exposes
294+
``weight_type``, ``effective_n``, ``design_effect``,
295+
``sum_weights``, ``n_strata``, ``n_psu``, ``weight_range``, and
296+
``df_survey`` for downstream reporting consumers (BusinessReport,
297+
DiagnosticReport) that read these fields via attribute access.
298+
HAD-specific inference-method info (pweight vs Binder-TSL) is
299+
carried on ``inference_method`` and ``variance_formula``.
289300
bandwidth_diagnostics : BandwidthResult or None
290301
Full Phase 1b MSE-DPI selector output on the continuous paths
291302
(when bandwidths were auto-selected). ``None`` on the mass-point
@@ -320,12 +331,34 @@ class HeterogeneousAdoptionDiDResults:
320331
inference_method: str
321332
vcov_type: Optional[str]
322333
cluster_name: Optional[str]
323-
survey_metadata: Optional[Any]
334+
survey_metadata: Optional[SurveyMetadata]
324335

325336
# Nonparametric-only diagnostics
326337
bandwidth_diagnostics: Optional[BandwidthResult]
327338
bias_corrected_fit: Optional[BiasCorrectedFit]
328339

340+
# Phase 4.5 weighted-path extras (optional so unweighted fits stay unchanged)
341+
variance_formula: Optional[str] = None
342+
"""HAD-specific label for the SE formula on the weighted continuous
343+
path: ``"pweight"`` (weighted-robust CCT 2014) under ``weights=``,
344+
``"survey_binder_tsl"`` (Binder 1983 TSL with PSU/strata/FPC) under
345+
``survey=SurveyDesign(...)``, ``None`` on unweighted or mass-point
346+
fits. Orthogonal to ``survey_metadata`` which is the repo-standard
347+
:class:`diff_diff.survey.SurveyMetadata` shared with downstream
348+
report/diagnostic consumers (no HAD-specific leakage)."""
349+
effective_dose_mean: Optional[float] = None
350+
"""Weighted denominator used by the beta-scale rescaling on the
351+
continuous path: ``sum(w_g · D_g) / sum(w_g)`` for
352+
``continuous_at_zero`` or ``sum(w_g · (D_g - d_lower)) / sum(w_g)``
353+
for ``continuous_near_d_lower``. Reduces bit-exactly to
354+
``dose_mean`` / ``mean(D - d_lower)`` when weights are uniform or
355+
absent. ``None`` when ``fit()`` was called without
356+
``survey=`` / ``weights=`` (use ``dose_mean`` there). Exists because
357+
``dose_mean`` is the raw sample mean of the dose column; under
358+
weighted fits the estimator's actual denominator is the weighted
359+
mean, and users reconstructing the β-scale value by hand need the
360+
weighted one."""
361+
329362
def __repr__(self) -> str:
330363
return (
331364
f"HeterogeneousAdoptionDiDResults("
@@ -373,10 +406,11 @@ def summary(self) -> str:
373406
lines.append(f"{'Obs in window (n_used):':<30} {bc.n_used:>20}")
374407
if self.survey_metadata is not None:
375408
sm = self.survey_metadata
376-
lines.append(f"{'Survey method:':<30} {sm.get('method', 'unknown'):>20}")
377-
if "effective_sample_size" in sm:
378-
ess = sm["effective_sample_size"]
379-
lines.append(f"{'Effective sample size:':<30} {ess:>20.6g}")
409+
vf_label = self.variance_formula or "unknown"
410+
lines.append(f"{'Variance formula:':<30} {vf_label:>20}")
411+
lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>20.6g}")
412+
if sm.df_survey is not None:
413+
lines.append(f"{'Survey df:':<30} {sm.df_survey:>20}")
380414
param_label = self.target_parameter
381415
lines.extend(
382416
[
@@ -563,7 +597,7 @@ class HeterogeneousAdoptionDiDEventStudyResults:
563597
inference_method: str
564598
vcov_type: Optional[str]
565599
cluster_name: Optional[str]
566-
survey_metadata: Optional[Any]
600+
survey_metadata: Optional[SurveyMetadata]
567601

568602
# Per-horizon diagnostics (lists, None on mass-point).
569603
# List entries may be None for horizons where the continuous-path fit
@@ -2720,39 +2754,68 @@ def fit(
27202754
att, se, alpha=float(self.alpha), df=df_infer
27212755
)
27222756

2723-
# Build survey metadata when weights/survey were supplied. When a
2724-
# ResolvedSurveyDesign is available (full survey= path), surface
2725-
# the PSU/strata/FPC composition used for the Binder-TSL variance.
2726-
survey_metadata: Optional[Dict[str, Any]] = None
2757+
# Build survey metadata (repo-standard SurveyMetadata from
2758+
# diff_diff.survey.compute_survey_metadata) when weights/survey
2759+
# were supplied, so downstream report/diagnostic consumers can
2760+
# read attributes uniformly. HAD-specific extras (variance-
2761+
# formula label, effective-denominator value) live on dedicated
2762+
# result fields rather than being folded into the survey dict.
2763+
survey_metadata: Optional[SurveyMetadata] = None
2764+
variance_formula_label: Optional[str] = None
2765+
effective_dose_mean_value: Optional[float] = None
27272766
if weights_unit is not None:
2728-
w_sum = float(weights_unit.sum())
2729-
w_sq_sum = float(np.dot(weights_unit, weights_unit))
2730-
ess = (w_sum * w_sum / w_sq_sum) if w_sq_sum > 0 else float("nan")
27312767
if resolved_survey_unit is not None:
2732-
method = "survey_binder_tsl"
2733-
variance_formula = "Binder 1983 TSL (PSU-aggregated, FPC/strata)"
2734-
source = "SurveyDesign"
2735-
n_strata = int(resolved_survey_unit.n_strata)
2736-
n_psu = int(resolved_survey_unit.n_psu)
2737-
df_survey_meta: Optional[int] = resolved_survey_unit.df_survey
2768+
# survey= path: build metadata from the ResolvedSurveyDesign
2769+
# already aggregated to unit-level by
2770+
# _aggregate_unit_resolved_survey. The resolved weights are
2771+
# post-normalization (mean=1 for pweight), which is the
2772+
# correct raw_weights input for compute_survey_metadata
2773+
# per diff_diff.survey conventions (effective_n and DEFF
2774+
# are scale-invariant on the weight axis).
2775+
survey_metadata = compute_survey_metadata(
2776+
resolved_survey_unit,
2777+
np.asarray(resolved_survey_unit.weights, dtype=np.float64),
2778+
)
2779+
variance_formula_label = "survey_binder_tsl"
27382780
else:
2739-
method = "pweight"
2740-
variance_formula = "weighted-robust (CCT 2014)"
2741-
source = "weights_arr"
2742-
n_strata = None
2743-
n_psu = None
2744-
df_survey_meta = None
2745-
survey_metadata = {
2746-
"method": method,
2747-
"source": source,
2748-
"variance_formula": variance_formula,
2749-
"n_units_weighted": int(weights_unit.shape[0]),
2750-
"weight_sum": w_sum,
2751-
"effective_sample_size": float(ess),
2752-
"n_strata": n_strata,
2753-
"n_psu": n_psu,
2754-
"df_survey": df_survey_meta,
2755-
}
2781+
# weights=<array> shortcut: construct a minimal resolved
2782+
# SurveyDesign with just the unit-level weights (no strata /
2783+
# PSU / FPC) so compute_survey_metadata returns a
2784+
# SurveyMetadata with the same schema as the survey= path.
2785+
# This keeps shared reporting consumers on a single code
2786+
# path — they read attributes regardless of entry point.
2787+
from diff_diff.survey import ResolvedSurveyDesign
2788+
2789+
minimal_resolved = ResolvedSurveyDesign(
2790+
weights=weights_unit,
2791+
weight_type="pweight",
2792+
strata=None,
2793+
psu=None,
2794+
fpc=None,
2795+
n_strata=1,
2796+
n_psu=int(weights_unit.shape[0]),
2797+
lonely_psu="remove",
2798+
combined_weights=True,
2799+
mse=False,
2800+
)
2801+
survey_metadata = compute_survey_metadata(
2802+
minimal_resolved, weights_unit
2803+
)
2804+
variance_formula_label = "pweight"
2805+
# Expose the effective weighted denominator used by the
2806+
# beta-scale rescaling (bc_fit carries it via its internal
2807+
# weighted means, but users inspecting the result directly
2808+
# need the value alongside the raw ``dose_mean``).
2809+
if resolved_design == "continuous_at_zero":
2810+
effective_dose_mean_value = float(
2811+
np.average(d_arr, weights=weights_unit)
2812+
)
2813+
elif resolved_design == "continuous_near_d_lower":
2814+
effective_dose_mean_value = float(
2815+
np.average(d_arr - d_lower_val, weights=weights_unit)
2816+
)
2817+
# else (mass_point): unreachable here because mass_point with
2818+
# weights raises NotImplementedError upstream.
27562819

27572820
return HeterogeneousAdoptionDiDResults(
27582821
att=float(att),
@@ -2776,6 +2839,8 @@ def fit(
27762839
survey_metadata=survey_metadata,
27772840
bandwidth_diagnostics=bw_diag,
27782841
bias_corrected_fit=bc_fit,
2842+
variance_formula=variance_formula_label,
2843+
effective_dose_mean=effective_dose_mean_value,
27792844
)
27802845

27812846
# ------------------------------------------------------------------

diff_diff/local_linear.py

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -934,11 +934,17 @@ class applies the beta-scale ``(1/G) * sum(D_{g,2})`` rescaling.
934934
kernel: str
935935
boundary: float
936936
influence_function: Optional[np.ndarray] = None
937-
"""Per-observation influence function of the CLASSICAL mu-scale
938-
point estimate ``tau.cl`` (Phase 4.5 survey composition). Aligned
939-
with the original caller-supplied ``d``/``y`` ordering; observations
937+
"""Per-observation influence function of the BIAS-CORRECTED point
938+
estimate ``tau.bc`` (Phase 4.5 survey composition). Aligned with
939+
the original caller-supplied ``d``/``y`` ordering; observations
940940
outside the active kernel window have IF=0. Populated only when
941-
``return_influence=True``; ``None`` otherwise."""
941+
``return_influence=True``; ``None`` otherwise.
942+
943+
Derived from ``Q_q`` + ``res_b`` so the variance self-check is
944+
``sum(IF^2) == V_Y_bc[0, 0]`` under unclustered HC0 — matching the
945+
bias-corrected scale of ``estimate_bias_corrected``. Using the
946+
classical IF here would silently under-estimate survey SE by
947+
ignoring the CCT-2014 bias-correction variance inflation."""
942948

943949

944950
def bias_corrected_local_linear(

0 commit comments

Comments
 (0)