Address PR #359 CI review round 2 (1 P1 + 1 P2 + 1 P3)

igerber · claude · igerber · commit d86084070c40 · 2026-04-24T10:14:29.000-04:00
P1 — survey_metadata contract drift: the previous HAD survey fits
returned ``survey_metadata`` as a plain dict, but the repo standard is
the :class:`diff_diff.survey.SurveyMetadata` dataclass read via
attribute access by BusinessReport, DiagnosticReport, and the shared
``results.py`` plumbing. Passing a HAD survey result through those
consumers would silently drop ``df_survey`` / ``effective_n`` /
``n_strata`` / ``n_psu`` under the dict shape. Fixed by routing both
entry paths (``survey=SurveyDesign(...)`` and ``weights=<array>``)
through :func:`compute_survey_metadata`; the ``weights=`` path constructs
a minimal ``ResolvedSurveyDesign`` (weights-only, no strata/PSU/FPC) so
both paths return the same schema. HAD-specific extras (``variance_formula`` label, the
pweight vs Binder-TSL method tag) are moved to a new top-level field
``HeterogeneousAdoptionDiDResults.variance_formula`` (Optional[str])
rather than polluting the shared SurveyMetadata schema. New regression
test ``test_survey_metadata_is_surveymetadata_instance`` exercises
attribute access on both entry paths.
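
A minimal sketch of the failure mode, assuming a consumer that reads
fields defensively with ``getattr``. ``SurveyMetadataSketch`` and
``read_effective_n`` are hypothetical stand-ins for illustration, not the
real :class:`diff_diff.survey.SurveyMetadata` or any report class:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyMetadataSketch:
    """Hypothetical stand-in for the repo-standard metadata dataclass."""

    effective_n: float
    n_strata: Optional[int] = None
    n_psu: Optional[int] = None
    df_survey: Optional[int] = None


def read_effective_n(survey_metadata):
    """Mimic a consumer that reads a field via attribute access with a default."""
    return getattr(survey_metadata, "effective_n", None)


# Old shape: a plain dict. Keys are not attributes, so the read falls through.
as_dict = {"effective_sample_size": 412.7, "n_strata": 4, "n_psu": 32}
# New shape: a dataclass instance with the expected attribute names.
as_dataclass = SurveyMetadataSketch(effective_n=412.7, n_strata=4, n_psu=32)

dict_value = read_effective_n(as_dict)
dataclass_value = read_effective_n(as_dataclass)
```

Under the dict shape the consumer gets ``None`` without raising, which is
the silent drop described above; the dataclass shape returns the value.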

P2 — weighted denominator contract: the continuous paths use weighted
population moments internally but the public result still stored only
the raw ``dose_mean`` — a user reconstructing β-scale by hand would
have been off by the (weighted − raw) gap. Added
``HeterogeneousAdoptionDiDResults.effective_dose_mean`` (Optional[float])
that exposes the actual weighted denominator used by the beta-scale
rescaling: ``sum(w·D)/sum(w)`` for ``continuous_at_zero``,
``sum(w·(D−d_lower))/sum(w)`` for ``continuous_near_d_lower``. ``None``
on unweighted fits (use ``dose_mean`` there; the two coincide under
uniform weights and the duplicate field would be noise). Raw
``dose_mean`` preserved for backward compat; docstring clarifies the
distinction. Four new regression tests covering both continuous
designs + the uniform-weights coincidence + the unweighted
``None``-contract.
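
The two denominators can be sketched in plain Python as follows. The
helper ``weighted_mean`` and the toy data are assumptions for
illustration; the diff itself computes the same quantities with
``np.average(..., weights=...)``:

```python
def weighted_mean(values, weights):
    """Return sum(w * v) / sum(w) for paired sequences of equal length."""
    total_weight = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# Toy unit-level doses and weights (values chosen to be exact in binary).
dose = [0.5, 1.0, 1.5, 3.0]
weights = [4.0, 2.0, 1.0, 1.0]
d_lower = 0.5

# Raw sample mean, as stored in ``dose_mean``.
raw_dose_mean = sum(dose) / len(dose)

# continuous_at_zero: sum(w * D) / sum(w)
effective_at_zero = weighted_mean(dose, weights)

# continuous_near_d_lower: sum(w * (D - d_lower)) / sum(w)
effective_near_d_lower = weighted_mean([d - d_lower for d in dose], weights)

# Under uniform weights the weighted and raw means coincide.
uniform_mean = weighted_mean(dose, [2.0] * len(dose))
```

Here the raw mean is 1.5 while the weighted denominator is 1.0625, so a
hand reconstruction that divides by ``dose_mean`` on a weighted fit uses
the wrong denominator.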

P3 — stale docstrings (``_nprobust_port.py``, ``local_linear.py``,
``had.py``): three docstring blocks still described the pre-Phase-4.5
surface ("weights= not supported" / "IF of the classical estimate"
/ "CCT-2014 robust SE divided by |den|"). Updated to describe the
shipped behavior: lprobust supports weights + return_influence on the
BIAS-CORRECTED scale, bias_corrected_local_linear's
``influence_function`` field aligns with ``V_Y_bc``, and HAD's ``se``
documentation now enumerates the three SE paths (unweighted, pweight
via weighted-robust, survey via Binder-TSL).
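
The weighting rule stated in the updated ``_nprobust_port.py`` docstring,
``W_combined = k((x-c)/h) · w``, can be illustrated with a toy local-linear
fit. This is a sketch under assumptions, not the ``lprobust`` port: the
names ``triangular_kernel`` and ``local_linear_intercept`` are
hypothetical, and there is no bias correction or variance estimation here:

```python
def triangular_kernel(u):
    """Triangular kernel, zero outside |u| < 1."""
    return max(0.0, 1.0 - abs(u))


def local_linear_intercept(x, y, c, h, weights=None):
    """Weighted least-squares intercept of y on (x - c) within bandwidth h.

    User weights, when given, multiply the kernel weights pointwise.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        k = triangular_kernel((xi - c) / h)
        if weights is not None:
            k = k * weights[i]  # W_combined = kernel weight * user weight
        z = xi - c
        s0 += k
        s1 += k * z
        s2 += k * z * z
        t0 += k * yi
        t1 += k * z * yi
    # Solve the 2x2 normal equations for the intercept.
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det


x = [0.1, 0.2, 0.4, 0.6, 0.9]
y = [1.0 + 2.0 * xi for xi in x]  # exactly linear, true intercept 1.0

unweighted = local_linear_intercept(x, y, c=0.0, h=1.0)
unit_weighted = local_linear_intercept(x, y, c=0.0, h=1.0, weights=[1.0] * len(x))
survey_weighted = local_linear_intercept(x, y, c=0.0, h=1.0, weights=[3.0, 1.0, 2.0, 1.0, 5.0])
```

Multiplying by ``1.0`` is exact in floating point, so the unit-weight call
reproduces the unweighted result exactly in this toy version, which is the
same kind of identity the docstring says is regression-tested for
``lprobust``.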

All 353 tests pass after refactor (across test_had, test_nprobust_port,
test_bias_corrected_lprobust, test_np_npreg_weighted_parity, and the
slow MC suite). Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/_nprobust_port.py b/diff_diff/_nprobust_port.py
@@ -31,12 +31,17 @@
 
 Deviations from nprobust (documented):
 
-* ``weights=`` is not supported here or in the public wrapper
-  (nprobust's ``lpbwselect`` has no weight argument, so Phase 1b has
-  no parity anchor). Weighted-data support is queued for Phase 2+
-  (survey-design adaptation). The public wrapper
-  ``mse_optimal_bandwidth`` raises ``NotImplementedError`` when a
-  ``weights`` array is passed.
+* ``weights=`` in ``lprobust`` (Phase 4.5 survey support): supported.
+  User weights multiply into the kernel weights pointwise
+  (``W_combined = k((x-c)/h) · w``) and propagate through design
+  matrices, Q.q, and variance matrices. When ``weights=np.ones(N)`` the
+  function is bit-identical to the unweighted path (regression-tested
+  at atol=1e-14). ``return_influence=True`` surfaces the per-obs IF of
+  the BIAS-CORRECTED point estimate (aligned with V_Y_bc) for survey-
+  composed variance at the estimator level. The bandwidth selector
+  ``lpbwselect_mse_dpi`` and its public wrapper ``mse_optimal_bandwidth``
+  remain unweighted in this release (no DPI-selector weight derivation
+  shipped); pass user-specified ``h``/``b`` for weight-aware bandwidths.
 * ``vce="nn"`` is the default and is fully ported. ``vce in
   {"hc0", "hc1", "hc2", "hc3"}`` is implemented in ``lprobust_res`` /
   ``lprobust_vce`` but has not been separately golden-tested; use at
diff --git a/diff_diff/had.py b/diff_diff/had.py
@@ -76,6 +76,7 @@
     BiasCorrectedFit,
     bias_corrected_local_linear,
 )
+from diff_diff.survey import SurveyMetadata, compute_survey_metadata
 from diff_diff.utils import safe_inference
 
 __all__ = [
@@ -225,11 +226,18 @@ class HeterogeneousAdoptionDiDResults:
           coefficient directly -
           ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``.
     se : float
-        Standard error on the beta-scale. For continuous designs, the
-        CCT-2014 robust SE from Phase 1c divided by ``|den|`` (the
-        absolute denominator used in ``att``); the higher-order
-        variance from ``mean(ΔY)`` is dominated by the nonparametric
-        boundary estimate in large samples and is not included. For
+        Standard error on the beta-scale. For continuous designs:
+        - Unweighted or ``weights=<array>``: CCT-2014 weighted-robust SE
+          from Phase 1c divided by ``|den|`` (``den`` = raw or weighted
+          denominator depending on fit path).
+        - ``survey=SurveyDesign(...)``: Binder (1983) Taylor-series
+          linearization of the per-unit IF (bias-corrected scale,
+          aligned with ``tau_bc``) routed through
+          :func:`compute_survey_if_variance` for PSU-aggregated,
+          FPC/strata-adjusted variance, divided by ``|den|``.
+        In both cases the higher-order variance from ``mean(ΔY)`` is
+        dominated by the nonparametric boundary estimate in large
+        samples and is not included in the leading-order formula. For
         mass-point, the 2SLS structural-residual sandwich SE.
     t_stat, p_value, conf_int : inference fields
         Routed through ``safe_inference``; NaN when SE is non-finite.
@@ -277,15 +285,18 @@ class HeterogeneousAdoptionDiDResults:
     cluster_name : str or None
         Column name of the cluster variable on the mass-point path when
         cluster-robust SE is requested. ``None`` otherwise.
-    survey_metadata : dict or None
-        ``None`` when ``fit()`` was called without ``survey=`` or
-        ``weights=``. Under weighted fits (continuous-dose paths only,
-        per Phase 4.5 A), carries a dict with keys ``method`` ('pweight'
-        vs 'survey_binder_tsl'), ``source``, ``variance_formula``,
-        ``n_units_weighted``, ``weight_sum``, ``effective_sample_size``,
-        ``n_strata`` / ``n_psu`` (int or None), and ``df_survey`` (int
-        or None — the survey t-distribution degrees of freedom, routed
-        through inference under the SurveyDesign path only).
+    survey_metadata : SurveyMetadata or None
+        Repo-standard survey metadata dataclass from
+        :class:`diff_diff.survey.SurveyMetadata`. ``None`` when ``fit()``
+        was called without ``survey=`` or ``weights=``; populated on the
+        continuous-dose weighted paths via
+        :func:`diff_diff.survey.compute_survey_metadata`. Exposes
+        ``weight_type``, ``effective_n``, ``design_effect``,
+        ``sum_weights``, ``n_strata``, ``n_psu``, ``weight_range``, and
+        ``df_survey`` for downstream reporting consumers (BusinessReport,
+        DiagnosticReport) that read these fields via attribute access.
+        HAD-specific inference-method info (pweight vs Binder-TSL) is
+        carried on ``inference_method`` and ``variance_formula``.
     bandwidth_diagnostics : BandwidthResult or None
         Full Phase 1b MSE-DPI selector output on the continuous paths
         (when bandwidths were auto-selected). ``None`` on the mass-point
@@ -320,12 +331,34 @@ class HeterogeneousAdoptionDiDResults:
     inference_method: str
     vcov_type: Optional[str]
     cluster_name: Optional[str]
-    survey_metadata: Optional[Any]
+    survey_metadata: Optional[SurveyMetadata]
 
     # Nonparametric-only diagnostics
     bandwidth_diagnostics: Optional[BandwidthResult]
     bias_corrected_fit: Optional[BiasCorrectedFit]
 
+    # Phase 4.5 weighted-path extras (optional so unweighted fits stay unchanged)
+    variance_formula: Optional[str] = None
+    """HAD-specific label for the SE formula on the weighted continuous
+    path: ``"pweight"`` (weighted-robust CCT 2014) under ``weights=``,
+    ``"survey_binder_tsl"`` (Binder 1983 TSL with PSU/strata/FPC) under
+    ``survey=SurveyDesign(...)``, ``None`` on unweighted or mass-point
+    fits. Orthogonal to ``survey_metadata`` which is the repo-standard
+    :class:`diff_diff.survey.SurveyMetadata` shared with downstream
+    report/diagnostic consumers (no HAD-specific leakage)."""
+    effective_dose_mean: Optional[float] = None
+    """Weighted denominator used by the beta-scale rescaling on the
+    continuous path: ``sum(w_g · D_g) / sum(w_g)`` for
+    ``continuous_at_zero`` or ``sum(w_g · (D_g - d_lower)) / sum(w_g)``
+    for ``continuous_near_d_lower``. Reduces bit-exactly to
+    ``dose_mean`` / ``mean(D - d_lower)`` when weights are uniform.
+    ``None`` when ``fit()`` was called without
+    ``survey=`` / ``weights=`` (use ``dose_mean`` there). Exists because
+    ``dose_mean`` is the raw sample mean of the dose column; under
+    weighted fits the estimator's actual denominator is the weighted
+    mean, and users reconstructing the β-scale value by hand need the
+    weighted one."""
+
     def __repr__(self) -> str:
         return (
             f"HeterogeneousAdoptionDiDResults("
@@ -373,10 +406,11 @@ def summary(self) -> str:
             lines.append(f"{'Obs in window (n_used):':<30} {bc.n_used:>20}")
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.append(f"{'Survey method:':<30} {sm.get('method', 'unknown'):>20}")
-            if "effective_sample_size" in sm:
-                ess = sm["effective_sample_size"]
-                lines.append(f"{'Effective sample size:':<30} {ess:>20.6g}")
+            vf_label = self.variance_formula or "unknown"
+            lines.append(f"{'Variance formula:':<30} {vf_label:>20}")
+            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>20.6g}")
+            if sm.df_survey is not None:
+                lines.append(f"{'Survey df:':<30} {sm.df_survey:>20}")
         param_label = self.target_parameter
         lines.extend(
             [
@@ -563,7 +597,7 @@ class HeterogeneousAdoptionDiDEventStudyResults:
     inference_method: str
     vcov_type: Optional[str]
     cluster_name: Optional[str]
-    survey_metadata: Optional[Any]
+    survey_metadata: Optional[SurveyMetadata]
 
     # Per-horizon diagnostics (lists, None on mass-point).
     # List entries may be None for horizons where the continuous-path fit
@@ -2720,39 +2754,68 @@ def fit(
             att, se, alpha=float(self.alpha), df=df_infer
         )
 
-        # Build survey metadata when weights/survey were supplied. When a
-        # ResolvedSurveyDesign is available (full survey= path), surface
-        # the PSU/strata/FPC composition used for the Binder-TSL variance.
-        survey_metadata: Optional[Dict[str, Any]] = None
+        # Build survey metadata (repo-standard SurveyMetadata from
+        # diff_diff.survey.compute_survey_metadata) when weights/survey
+        # were supplied, so downstream report/diagnostic consumers can
+        # read attributes uniformly. HAD-specific extras (variance-
+        # formula label, effective-denominator value) live on dedicated
+        # result fields rather than being folded into the survey dict.
+        survey_metadata: Optional[SurveyMetadata] = None
+        variance_formula_label: Optional[str] = None
+        effective_dose_mean_value: Optional[float] = None
         if weights_unit is not None:
-            w_sum = float(weights_unit.sum())
-            w_sq_sum = float(np.dot(weights_unit, weights_unit))
-            ess = (w_sum * w_sum / w_sq_sum) if w_sq_sum > 0 else float("nan")
             if resolved_survey_unit is not None:
-                method = "survey_binder_tsl"
-                variance_formula = "Binder 1983 TSL (PSU-aggregated, FPC/strata)"
-                source = "SurveyDesign"
-                n_strata = int(resolved_survey_unit.n_strata)
-                n_psu = int(resolved_survey_unit.n_psu)
-                df_survey_meta: Optional[int] = resolved_survey_unit.df_survey
+                # survey= path: build metadata from the ResolvedSurveyDesign
+                # already aggregated to unit-level by
+                # _aggregate_unit_resolved_survey. The resolved weights are
+                # post-normalization (mean=1 for pweight), which is the
+                # correct raw_weights input for compute_survey_metadata
+                # per diff_diff.survey conventions (effective_n and DEFF
+                # are scale-invariant on the weight axis).
+                survey_metadata = compute_survey_metadata(
+                    resolved_survey_unit,
+                    np.asarray(resolved_survey_unit.weights, dtype=np.float64),
+                )
+                variance_formula_label = "survey_binder_tsl"
             else:
-                method = "pweight"
-                variance_formula = "weighted-robust (CCT 2014)"
-                source = "weights_arr"
-                n_strata = None
-                n_psu = None
-                df_survey_meta = None
-            survey_metadata = {
-                "method": method,
-                "source": source,
-                "variance_formula": variance_formula,
-                "n_units_weighted": int(weights_unit.shape[0]),
-                "weight_sum": w_sum,
-                "effective_sample_size": float(ess),
-                "n_strata": n_strata,
-                "n_psu": n_psu,
-                "df_survey": df_survey_meta,
-            }
+                # weights=<array> shortcut: construct a minimal resolved
+                # SurveyDesign with just the unit-level weights (no strata /
+                # PSU / FPC) so compute_survey_metadata returns a
+                # SurveyMetadata with the same schema as the survey= path.
+                # This keeps shared reporting consumers on a single code
+                # path — they read attributes regardless of entry point.
+                from diff_diff.survey import ResolvedSurveyDesign
+
+                minimal_resolved = ResolvedSurveyDesign(
+                    weights=weights_unit,
+                    weight_type="pweight",
+                    strata=None,
+                    psu=None,
+                    fpc=None,
+                    n_strata=1,
+                    n_psu=int(weights_unit.shape[0]),
+                    lonely_psu="remove",
+                    combined_weights=True,
+                    mse=False,
+                )
+                survey_metadata = compute_survey_metadata(
+                    minimal_resolved, weights_unit
+                )
+                variance_formula_label = "pweight"
+            # Expose the effective weighted denominator used by the
+            # beta-scale rescaling (bc_fit carries it via its internal
+            # weighted means, but users inspecting the result directly
+            # need the value alongside the raw ``dose_mean``).
+            if resolved_design == "continuous_at_zero":
+                effective_dose_mean_value = float(
+                    np.average(d_arr, weights=weights_unit)
+                )
+            elif resolved_design == "continuous_near_d_lower":
+                effective_dose_mean_value = float(
+                    np.average(d_arr - d_lower_val, weights=weights_unit)
+                )
+            # else (mass_point): unreachable here because mass_point with
+            # weights raises NotImplementedError upstream.
 
         return HeterogeneousAdoptionDiDResults(
             att=float(att),
@@ -2776,6 +2839,8 @@ def fit(
             survey_metadata=survey_metadata,
             bandwidth_diagnostics=bw_diag,
             bias_corrected_fit=bc_fit,
+            variance_formula=variance_formula_label,
+            effective_dose_mean=effective_dose_mean_value,
         )
 
     # ------------------------------------------------------------------
diff --git a/diff_diff/local_linear.py b/diff_diff/local_linear.py
@@ -934,11 +934,17 @@ class applies the beta-scale ``(1/G) * sum(D_{g,2})`` rescaling.
     kernel: str
     boundary: float
     influence_function: Optional[np.ndarray] = None
-    """Per-observation influence function of the CLASSICAL mu-scale
-    point estimate ``tau.cl`` (Phase 4.5 survey composition). Aligned
-    with the original caller-supplied ``d``/``y`` ordering; observations
+    """Per-observation influence function of the BIAS-CORRECTED point
+    estimate ``tau.bc`` (Phase 4.5 survey composition). Aligned with
+    the original caller-supplied ``d``/``y`` ordering; observations
     outside the active kernel window have IF=0. Populated only when
-    ``return_influence=True``; ``None`` otherwise."""
+    ``return_influence=True``; ``None`` otherwise.
+
+    Derived from ``Q_q`` + ``res_b`` so the variance self-check is
+    ``sum(IF^2) == V_Y_bc[0, 0]`` under unclustered HC0 — matching the
+    bias-corrected scale of ``estimate_bias_corrected``. Using the
+    classical IF here would silently under-estimate survey SE by
+    ignoring the CCT-2014 bias-correction variance inflation."""
 
 
 def bias_corrected_local_linear(
diff --git a/tests/test_had.py b/tests/test_had.py