22Heterogeneous Adoption Difference-in-Differences (HAD) estimator (Phase 2a).
33
44Implements the de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau (2026)
5- Weighted-Average-Slope (WAS) estimator with three design-dispatch paths:
5+ Weighted-Average-Slope (WAS) estimator with three design-dispatch paths.
6+ All three paths produce a beta-scale point estimate of the form
7+ ``(mean(Delta Y) - [boundary limit]) / [expected dose gap]`` (Design 1
8+ family) or the Wald-IV ratio (mass-point), then route inference through
9+ :func:`diff_diff.utils.safe_inference`.
610
7111. Design 1' (``continuous_at_zero``): ``d_lower = 0``, boundary density
8- continuous at zero. Evaluates the bias-corrected local-linear fit at
9- ``boundary = 0`` and rescales to the beta-scale via Equation 8's divisor
10- ``D_bar = (1/G) * sum(D_{g,2})``. Invokes Assumption 3.
12+ continuous at zero, Assumption 3. Equation 7 / Theorem 3:
13+
14+ beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2]
15+
16+ The bias-corrected local-linear fit at ``boundary = 0`` estimates
17+ the boundary limit; ``E[D_2]`` and ``E[Delta Y]`` are plugin sample
18+ means.
1119
12202. Design 1 continuous-near-d_lower (``continuous_near_d_lower``):
13- ``d_lower > 0``, continuous boundary density. Evaluates the bias-corrected
14- local-linear fit at ``boundary = float(d.min())`` after the regressor
15- shift ``D' = D - d_lower``. Same divisor, same CI path.
16- Invokes Assumption 5 or 6.
17-
18- 3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction at
19- ``d.min()`` exceeds 2%. Uses a sample-average 2SLS estimator with
20- instrument ``Z_g = 1{D_{g,2} > d_lower}``. Point estimate reduces to the
21- Wald-IV ratio
22- ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard error
23- via the structural-residual 2SLS sandwich (see ``_fit_mass_point_2sls``).
21+ ``d_lower > 0``, continuous boundary density, Assumption 5 or 6.
22+ Proposition 3 / Theorem 4 (``WAS_{d_lower}`` under Assumption 6):
23+
24+ beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d])
25+ / E[D_2 - d_lower]
26+
27+ The local-linear fit is anchored at ``d_lower`` via the regressor
28+ shift ``D' = D - d_lower``, evaluated at ``boundary = 0`` on the
29+ shifted scale.
30+
31+ 3. Design 1 mass-point (``mass_point``): ``d_lower > 0``, modal fraction
32+ at ``d.min()`` exceeds 2%. Sample-average 2SLS with instrument
33+ ``Z_g = 1{D_{g,2} > d_lower}`` (paper Section 3.2.4). Point estimate
34+ equals the Wald-IV ratio
35+ ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``. Standard
36+ error via the structural-residual 2SLS sandwich
37+ (see ``_fit_mass_point_2sls``).
2438
2539Phase 2a ships the single-period path only; the multi-period event-study
2640extension (paper Appendix B.2) is queued for Phase 2b.
@@ -119,15 +133,28 @@ class HeterogeneousAdoptionDiDResults:
119133 Attributes
120134 ----------
121135 att : float
122- Point estimate of the WAS parameter on the beta-scale. For the
123- continuous designs: ``tau_bc / D_bar`` where ``tau_bc`` is the
124- bias-corrected local-linear statistic on the mu-scale and
125- ``D_bar = (1/G) * sum(D_{g,2})``. For the mass-point design:
126- the 2SLS coefficient directly.
136+ Point estimate of the WAS parameter on the beta-scale.
137+
138+ - Design 1' (paper Equation 7 / Theorem 3):
139+ ``att = (mean(ΔY) - tau_bc) / D_bar``
140+ where ``tau_bc`` is the bias-corrected local-linear estimate
141+ of ``lim_{d v 0} E[ΔY | D_2 <= d]`` and
142+ ``D_bar = (1/G) * sum(D_{g,2})``.
143+ - Design 1 continuous-near-d_lower (paper Theorem 4,
144+ ``WAS_{d_lower}`` under Assumption 6):
145+ ``att = (mean(ΔY) - tau_bc) / mean(D_2 - d_lower)``
146+ where ``tau_bc`` is the bias-corrected local-linear estimate
147+ of ``lim_{d v d_lower} E[ΔY | D_2 <= d]``.
148+ - Mass-point (paper Section 3.2.4): the Wald-IV / 2SLS
149+ coefficient directly -
150+ ``(Ybar_{Z=1} - Ybar_{Z=0}) / (Dbar_{Z=1} - Dbar_{Z=0})``.
127151 se : float
128152 Standard error on the beta-scale. For continuous designs, the
129- CCT-2014 robust SE divided by ``D_bar``. For mass-point, the 2SLS
130- structural-residual sandwich SE.
153+ CCT-2014 robust SE from Phase 1c divided by ``|den|`` (the
154+ absolute denominator used in ``att``); the higher-order
155+ variance from ``mean(ΔY)`` is dominated by the nonparametric
156+ boundary estimate in large samples and is not included. For
157+ mass-point, the 2SLS structural-residual sandwich SE.
131158 t_stat, p_value, conf_int : inference fields
132159 Routed through ``safe_inference``; NaN when SE is non-finite.
133160 alpha : float
@@ -423,6 +450,26 @@ def _validate_had_panel(
423450 f"t_pre={ t_pre } . Drop these units or verify the dose column."
424451 )
425452
453+ # Post-period nonnegative-dose check on the ORIGINAL (unshifted) dose
454+ # scale. Front-door rejection per paper Assumption (dose definition
455+ # Section 2) which treats D_{g,2} as nonnegative. Without this
456+ # check, negative original doses would only surface after the
457+ # regressor shift in ``_fit_continuous`` via Phase 1c's
458+ # ``_validate_had_inputs``, which references the shifted values
459+ # and would confuse users about which column is malformed.
460+ post_mask = data [time_col ] == t_post
461+ post_doses = np .asarray (data .loc [post_mask , dose_col ], dtype = np .float64 )
462+ neg_post = post_doses < 0
463+ if neg_post .any ():
464+ n_neg = int (neg_post .sum ())
465+ min_neg = float (post_doses [neg_post ].min ())
466+ raise ValueError (
467+ f"HAD requires D_{{g,2}} >= 0 for all units (paper Section "
468+ f"2 dose definition). { n_neg } unit(s) have negative post-"
469+ f"period dose at t_post={ t_post } (min={ min_neg !r} ). Drop "
470+ f"these units or verify the dose column."
471+ )
472+
426473 # Optional value-domain validation via first_treat_col: if supplied,
427474 # every unit's first_treat value must be in {0, t_post} (0 = never
428475 # treated, t_post = treated in the second period). This is a value-
@@ -766,15 +813,21 @@ class HeterogeneousAdoptionDiD:
766813 alpha : float
767814 CI level (0.05 for 95% CI).
768815 vcov_type : {"classical", "hc1"} or None
769- Mass-point-path only. ``None`` defaults to ``"hc1"``.
770- ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError``. Ignored
771- on the continuous paths (which use the CCT-2014 robust SE from
816+ Mass-point-path only. When ``None``, the effective family falls
817+ back to the ``robust`` flag: ``robust=True`` -> ``"hc1"``,
818+ ``robust=False`` -> ``"classical"`` (the default construction).
819+ Explicit ``"hc2"`` and ``"hc2_bm"`` raise ``NotImplementedError``
820+ pending a 2SLS-specific leverage derivation. Ignored on the
821+ continuous paths (which use the CCT-2014 robust SE from
772822 Phase 1c); passing a non-default ``vcov_type`` on a continuous
773- path emits a one-time ``UserWarning``.
823+ path emits a ``UserWarning`` per fit call .
774824 robust : bool
775- Backward-compat alias: ``True`` maps to ``"hc1"``, ``False`` maps
776- to ``"classical"``. Only relevant when ``vcov_type is None`` on
777- the mass-point path.
825+ Backward-compat alias used only when ``vcov_type is None``:
826+ ``True`` -> ``"hc1"``, ``False`` -> ``"classical"``. Explicit
827+ ``vcov_type`` takes precedence (e.g.,
828+ ``vcov_type="classical", robust=True`` runs classical). Only
829+ the mass-point path consumes these; continuous paths ignore
830+ both with a warning.
778831 cluster : str or None
779832 Column name for cluster-robust SE on the mass-point path (CR1).
780833 Ignored with a ``UserWarning`` on the continuous paths in Phase
@@ -989,28 +1042,35 @@ def fit(
9891042 else :
9901043 d_lower_val = float (d_lower_arg )
9911044
992- # ---- Mass-point contract: d_lower must equal the support infimum ----
993- # Paper Section 3.2.4 defines the mass-point estimator with instrument
994- # Z = 1{D_{g,2} > d_lower} where d_lower is the lower-support mass
995- # point. A user-supplied d_lower that differs from float(d_arr.min())
996- # redefines the instrument / control split and identifies a different
997- # estimand. Phase 2a supports only the paper's lower-support mass-point
998- # estimator, so we reject mismatched overrides front-door rather than
999- # silently running an unsupported variant.
1000- if resolved_design == "mass_point" and d_lower_arg is not None :
1045+ # ---- d_lower contract for Design 1 paths ----
1046+ # Paper Sections 3.2.2-3.2.4 define the Design 1 estimators at
1047+ # d_lower = support infimum of D_{g,2}. For the mass-point path,
1048+ # the instrument Z = 1{D_{g,2} > d_lower} requires d_lower to be
1049+ # the lower-support mass point. For the continuous-near-d_lower
1050+ # path, evaluating the local-linear fit after the regressor
1051+ # shift (D - d_lower) at boundary=0 only makes sense when the
1052+ # shift anchors to the realized sample minimum; otherwise the
1053+ # boundary evaluation is off-support (no observations near zero
1054+ # on the shifted scale) and Phase 1c's 5% plausibility heuristic
1055+ # may fail to catch the mismatch. We enforce d_lower == d.min()
1056+ # within float tolerance on both Design 1 paths; mismatched
1057+ # overrides raise with a clear pointer to the unsupported
1058+ # estimand.
1059+ if resolved_design in ("mass_point" , "continuous_near_d_lower" ) and d_lower_arg is not None :
10011060 d_min = float (d_arr .min ())
10021061 tol = 1e-12 * max (1.0 , abs (d_min ))
10031062 if abs (d_lower_val - d_min ) > tol :
10041063 raise ValueError (
1005- f"design='mass_point' requires d_lower to equal the "
1006- f"support infimum float(d.min())={ d_min !r} ; got "
1007- f"d_lower={ d_lower_val !r} . The paper's Design 1 mass- "
1008- f"point estimator (Section 3.2.4) identifies at the "
1009- f"lower-support mass point , not at an arbitrary "
1064+ f"design={ resolved_design !r } requires d_lower to equal "
1065+ f"the support infimum float(d.min())={ d_min !r} ; got "
1066+ f"d_lower={ d_lower_val !r} . The paper's Design 1 "
1067+ f"estimators (Sections 3.2.2-3.2. 4) identify at the "
1068+ f"lower-support boundary , not at an arbitrary "
10101069 f"threshold. Pass d_lower=None to auto-resolve, or "
10111070 f"d_lower=float(d.min()) explicitly. Non-support-"
1012- f"infimum thresholds identify a different (LATE-like) "
1013- f"estimand that is out of Phase 2a scope."
1071+ f"infimum thresholds identify a different (LATE-like "
1072+ f"for mass_point, off-support for continuous_near_"
1073+ f"d_lower) estimand that is out of Phase 2a scope."
10141074 )
10151075
10161076 # ---- Compute cohort counts ----
@@ -1031,6 +1091,26 @@ def fit(
10311091
10321092 dose_mean = float (d_arr .mean ())
10331093
1094+ # ---- Assumption 5/6 warning on Design 1 paths ----
1095+ # Paper Sections 3.2.2-3.2.4: when d_lower > 0 (Design 1 family),
1096+ # point identification of WAS_{d_lower} requires Assumption 6 in
1097+ # addition to parallel trends (Assumption 1-3); Assumption 5 gives
1098+ # only sign identification. These extra assumptions are NOT
1099+ # testable via pre-trends. Surface this to the user front-door so
1100+ # results are not silently interpreted as full point identification.
1101+ if resolved_design in ("continuous_near_d_lower" , "mass_point" ):
1102+ warnings .warn (
1103+ f"design={ resolved_design !r} (Design 1, d_lower > 0) requires "
1104+ f"Assumption 6 from de Chaisemartin et al. (2026) for point "
1105+ f"identification of WAS_{{d_lower}}, or Assumption 5 for "
1106+ f"sign identification only. Neither is testable via "
1107+ f"pre-trends. Confirm the extra assumption is defensible "
1108+ f"for your setting before interpreting the returned "
1109+ f"point estimate as the WAS." ,
1110+ UserWarning ,
1111+ stacklevel = 2 ,
1112+ )
1113+
10341114 # ---- Dispatch ----
10351115 if resolved_design in ("continuous_at_zero" , "continuous_near_d_lower" ):
10361116 # Warn when the user set a mass-point-only knob that's ignored
@@ -1125,28 +1205,53 @@ def _fit_continuous(
11251205 resolved_design : str ,
11261206 d_lower_val : float ,
11271207 ) -> Tuple [float , float , Optional [BiasCorrectedFit ], Optional [BandwidthResult ]]:
1128- """Fit Phase 1c ``bias_corrected_local_linear`` and rescale to beta-scale.
1208+ """Fit Phase 1c ``bias_corrected_local_linear`` and form the WAS estimate.
1209+
1210+ Implements de Chaisemartin, Ciccia, D'Haultfoeuille, and Knau
1211+ (2026) continuous-design estimators:
1212+
1213+ - Design 1' (``continuous_at_zero``), paper Equation 7 /
1214+ Theorem 3:
1215+
1216+ beta = (E[Delta Y] - lim_{d v 0} E[Delta Y | D_2 <= d]) / E[D_2]
1217+
1218+ Regressor passed to the local-linear boundary fit is
1219+ ``d_arr``; the boundary is ``0``.
11291220
1130- Both continuous designs share the same rescaling path - they
1131- differ only in the regressor shift and boundary :
1221+ - Design 1 (``continuous_near_d_lower``), paper Proposition 3 /
1222+ Theorem 4 (``WAS_{d_lower}`` under Assumption 6) :
11321223
1133- - ``continuous_at_zero``: regressor ``d_arr``, boundary ``0``.
1134- - ``continuous_near_d_lower``: regressor ``d_arr - d_lower``,
1135- boundary ``0``.
1224+ beta = (E[Delta Y] - lim_{d v d_lower} E[Delta Y | D_2 <= d])
1225+ / E[D_2 - d_lower]
11361226
1137- The mu-scale output ``tau_bc`` from
1138- :func:`bias_corrected_local_linear` is divided by the
1139- beta-scale divisor ``D_bar = (1/G) * sum(D_{g,2})`` (Equation 8
1140- in the paper). The same divisor applies to the classical CI
1141- endpoints. The result of ``D_bar`` uses the ORIGINAL
1142- (unshifted) dose mean, matching paper convention.
1227+ Regressor passed to the local-linear fit is
1228+ ``d_arr - d_lower`` so the boundary evaluation point is ``0``
1229+ on the shifted scale; the numerator and denominator are
1230+ computed on the original scale.
1231+
1232+ The bias-corrected boundary estimate ``tau_bc`` from
1233+ :func:`bias_corrected_local_linear` corresponds to the limit
1234+ ``lim_{d v d_lower} E[Delta Y | D_2 <= d]``. The beta-scale
1235+ estimator and standard error are then:
1236+
1237+ att = (mean(dy) - tau_bc) / den
1238+ se = se_robust / |den|
1239+
1240+ where ``den`` is the expectation in the denominator (``D_bar``
1241+ for Design 1', ``mean(D_2 - d_lower)`` for Design 1). The
1242+ confidence interval is ``att +/- z * se`` computed in
1243+ :func:`diff_diff.utils.safe_inference`; the endpoints reverse
1244+ relative to the boundary-limit CI because the numerator
1245+ transformation is ``ΔȲ - tau_bc``.
11431246 """
11441247 if resolved_design == "continuous_at_zero" :
11451248 d_reg = d_arr
11461249 boundary = 0.0
1250+ den = float (d_arr .mean ())
11471251 elif resolved_design == "continuous_near_d_lower" :
11481252 d_reg = d_arr - d_lower_val
11491253 boundary = 0.0
1254+ den = float ((d_arr - d_lower_val ).mean ())
11501255 else :
11511256 raise ValueError (
11521257 f"_fit_continuous called with non-continuous " f"design={ resolved_design !r} "
@@ -1174,15 +1279,18 @@ def _fit_continuous(
11741279 except (ZeroDivisionError , FloatingPointError , np .linalg .LinAlgError ):
11751280 return float ("nan" ), float ("nan" ), None , None
11761281
1177- dose_mean = float (d_arr .mean ())
1178- # Phase 1b / Phase 1c pre-checks enforce dose_mean > 0 for
1179- # Design 1-family samples, but defense-in-depth: if dose_mean
1180- # rounds to zero, fall through to NaN via safe_inference in fit().
1181- if abs (dose_mean ) < 1e-12 :
1282+ # Guard against degenerate denominators: if all units are at
1283+ # d_lower (continuous_near_d_lower) or if D_bar rounds to zero
1284+ # (continuous_at_zero with a vanishing sample mean), the beta-
1285+ # scale estimator is undefined. Fall through to NaN via
1286+ # safe_inference in fit().
1287+ if abs (den ) < 1e-12 :
11821288 att = float ("nan" )
11831289 se = float ("nan" )
11841290 else :
1185- att = float (bc_fit .estimate_bias_corrected ) / dose_mean
1186- se = float (bc_fit .se_robust ) / dose_mean
1291+ dy_mean = float (dy_arr .mean ())
1292+ tau_bc = float (bc_fit .estimate_bias_corrected )
1293+ att = (dy_mean - tau_bc ) / den
1294+ se = float (bc_fit .se_robust ) / abs (den )
11871295
11881296 return att , se , bc_fit , bc_fit .bandwidth_diagnostics
0 commit comments