Skip to content

Commit 4323061

Browse files
igerberclaude
andcommitted
Address PR #370 R7 review (1 P0 + 1 P1)
R7 P0 (Methodology) -- weighted CvM was missing the outer measure factor. Survey-weighted plug-in of Stute's CvM functional integrates the squared cusum process against the survey-weighted EDF F_hat_w = (1/W) sum_i w_i delta_{D_i}, which weights BOTH the inner cusum AND the outer integration measure: C_g = sum_{h <= g} w_h * eps_h S_w = (1/W^2) * sum_g w_g * (C_g)^2 Earlier revisions used (1/W^2) * sum_g C_g^2 (no outer w_g) which is a count-weighted-cusum / uniform-outer-measure functional and silently misreports survey-weighted Stute statistics for non-uniform weights. At w=ones(G) both forms reduce to (1/G^2) sum_g C_g^2; only non-uniform weights distinguish them, which is why the prior reduction tests didn't catch this. Fix: add the outer w_sorted factor to _cvm_statistic_weighted. New oracle test pins the formula on a hand-computed non-uniform-weight example (w=[1,2,3], eps=[1,-2,3] -> 127/36 outer-weighted, NOT 46/36 count-weighted-cusum). Reduction at w=1 still bit-exact. R7 P1 (Code Quality) -- survey verdict could leave pass cases starting with "inconclusive". The previous approach composed the unweighted verdict against a synthetic NaN QUG and string-replaced "QUG NaN" out; when no rejections fired and linearity was conclusive, the underlying "inconclusive - QUG NaN" template would collapse to "inconclusive" even for all_pass=True paths. Fix: explicit survey-aware verdict composers _compose_verdict_overall_survey and _compose_verdict_event_study_survey that drop QUG from consideration entirely and emit linearity-only priority text. Both append the "(linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0)" suffix in every branch (rejections, inconclusive, fail-to-reject). 4 new regression tests: - test_cvm_statistic_weighted_outer_measure_oracle (R7 P0 hand-computed) - test_cvm_statistic_weighted_reduces_at_uniform_weights (R7 P0 reduction) - test_workflow_overall_survey_pass_does_not_say_inconclusive (R7 P1) - test_workflow_event_study_survey_pass_does_not_say_inconclusive (R7 P1) 187 pretest tests pass (was 183 after R6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 20f158a commit 4323061

2 files changed

Lines changed: 226 additions & 63 deletions

File tree

diff_diff/had_pretests.py

Lines changed: 141 additions & 63 deletions
Original file line numberDiff line numberDiff line change
@@ -1049,21 +1049,31 @@ def _has_lonely_psu_adjust_singletons(resolved: Any) -> bool:
10491049
def _cvm_statistic_weighted(
10501050
eps_sorted: np.ndarray, d_sorted: np.ndarray, w_sorted: np.ndarray
10511051
) -> float:
1052-
"""Weighted analog of :func:`_cvm_statistic`.
1053-
1054-
Aggregates weighted residuals into the cusum:
1055-
1056-
C_g = sum_{h : D_h <= D_g} w_h * eps_h
1057-
S = (1 / W^2) * sum_g (C_g)^2, W = sum(w)
1058-
1059-
At ``w = ones(G)``, ``W = G`` and the formula reduces bit-exactly to
1060-
the unweighted ``_cvm_statistic`` (locked at ``atol=1e-14`` by the
1061-
survey-path direct helper tests; Phase 4.5 C stability invariant #1).
1052+
"""Weighted analog of :func:`_cvm_statistic` (survey-weighted plug-in).
1053+
1054+
The unweighted Stute CvM `S = (1/G) * sum_g c_G(D_g)^2` integrates
1055+
the squared cusum process against the empirical CDF
1056+
``F_hat = (1/G) sum_i delta_{D_i}``. The weighted plug-in replaces
1057+
``F_hat`` by the survey-weighted EDF
1058+
``F_hat_w = (1/W) sum_i w_i delta_{D_i}``, which weights BOTH the
1059+
inner cusum AND the outer integration measure:
1060+
1061+
C_g = sum_{h : D_h <= D_g} w_h * eps_h (inner cusum, weighted)
1062+
S_w = (1 / W^2) * sum_g w_g * (C_g)^2 (outer measure, weighted)
1063+
W = sum(w)
1064+
1065+
The outer ``w_g`` factor on each squared cusum (R7 P0 fix) is what
1066+
distinguishes this from a count-weighted-cusum form
1067+
``(1/W^2) * sum_g C_g^2`` (no outer ``w_g``), which silently
1068+
misreports survey-weighted Stute statistics for non-uniform weights.
1069+
At ``w = ones(G)`` both forms reduce to ``(1/G^2) sum_g C_g^2``
1070+
(unweighted) -- only non-uniform weights distinguish them.
10621071
10631072
Tie-block collapse uses the same ``np.unique(d_sorted)`` count
1064-
machinery as the unweighted form — positions are determined by
1065-
``d_sorted`` ties (independent of weights), so the collapse pattern is
1066-
weight-invariant.
1073+
machinery as the unweighted form -- positions are determined by
1074+
``d_sorted`` ties (independent of weights), so the collapse pattern
1075+
is weight-invariant. The outer ``w_sorted`` factor applies to the
1076+
tie-collapsed cusum at each observation.
10671077
10681078
Parameters
10691079
----------
@@ -1082,7 +1092,9 @@ def _cvm_statistic_weighted(
10821092
tie_end_idx = np.cumsum(counts) - 1
10831093
cumsum_tie_safe = np.repeat(cumsum[tie_end_idx], counts)
10841094
W = float(np.sum(w_sorted))
1085-
return float(np.sum(cumsum_tie_safe * cumsum_tie_safe) / (W * W))
1095+
# R7 P0: integrate outer measure against F_hat_w via the w_sorted
1096+
# factor on each squared cusum (NOT against uniform 1/G measure).
1097+
return float(np.sum(w_sorted * cumsum_tie_safe * cumsum_tie_safe) / (W * W))
10861098

10871099

10881100
def _compose_verdict(
@@ -3534,6 +3546,114 @@ def joint_homogeneity_test(
35343546

35353547
_VALID_AGGREGATES = ("overall", "event_study")
35363548

3549+
_QUG_DEFERRED_SUFFIX = (
3550+
" (linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0)"
3551+
)
3552+
3553+
3554+
def _compose_verdict_overall_survey(
3555+
stute: Optional[StuteTestResults],
3556+
yatchew: Optional[YatchewTestResults],
3557+
) -> str:
3558+
"""Build the overall-path :class:`HADPretestReport` verdict on the
3559+
survey/weights branch (Phase 4.5 C).
3560+
3561+
Drops the QUG step from consideration (skipped per Phase 4.5 C0)
3562+
and composes the verdict from Stute + Yatchew alone, with the
3563+
linearity-conditional suffix appended in every branch. R7 P1 fix:
3564+
explicit survey-aware composer replaces the prior approach of
3565+
composing the unweighted verdict with a NaN QUG and string-replacing
3566+
the resulting "QUG NaN" suffix, which could leave pass cases starting
3567+
with "inconclusive".
3568+
3569+
Priority (mirrors :func:`_compose_verdict` minus QUG):
3570+
1. Conclusive rejections of Stute or Yatchew lead.
3571+
2. No conclusive rejection but linearity inconclusive (both NaN)
3572+
-> "inconclusive - both linearity tests NaN".
3573+
3. Linearity conclusive (at least one of Stute/Yatchew finite) AND
3574+
no rejection -> fail-to-reject string.
3575+
All branches end with `_QUG_DEFERRED_SUFFIX`.
3576+
"""
3577+
stute_ok = stute is not None and bool(np.isfinite(stute.p_value))
3578+
yatchew_ok = yatchew is not None and bool(np.isfinite(yatchew.p_value))
3579+
stute_rej = stute_ok and bool(stute.reject)
3580+
yatchew_rej = yatchew_ok and bool(yatchew.reject)
3581+
3582+
reasons = []
3583+
if stute_rej:
3584+
reasons.append("linearity rejected - heterogeneity bias (Stute)")
3585+
if yatchew_rej:
3586+
reasons.append("linearity rejected - heterogeneity bias (Yatchew)")
3587+
3588+
unresolved = []
3589+
if not stute_ok:
3590+
unresolved.append("Stute NaN")
3591+
if not yatchew_ok:
3592+
unresolved.append("Yatchew NaN")
3593+
3594+
if reasons:
3595+
verdict = "; ".join(reasons)
3596+
if unresolved:
3597+
verdict += "; additional steps unresolved: " + "; ".join(unresolved)
3598+
return verdict + _QUG_DEFERRED_SUFFIX
3599+
3600+
# No rejections.
3601+
if not (stute_ok or yatchew_ok):
3602+
return "inconclusive - both Stute and Yatchew linearity tests NaN" + _QUG_DEFERRED_SUFFIX
3603+
3604+
# At least one linearity test conclusive AND no rejection.
3605+
skipped_note = ""
3606+
if not stute_ok:
3607+
skipped_note = " (Stute NaN - skipped)"
3608+
elif not yatchew_ok:
3609+
skipped_note = " (Yatchew NaN - skipped)"
3610+
return (
3611+
"Stute and Yatchew linearity diagnostics fail-to-reject"
3612+
+ skipped_note
3613+
+ _QUG_DEFERRED_SUFFIX
3614+
)
3615+
3616+
3617+
def _compose_verdict_event_study_survey(
3618+
pretrends_joint: Optional[StuteJointResult],
3619+
homogeneity_joint: Optional[StuteJointResult],
3620+
) -> str:
3621+
"""Event-study survey-path verdict (R7 P1 fix; mirrors
3622+
:func:`_compose_verdict_event_study` minus QUG)."""
3623+
pretrends_ok = pretrends_joint is not None and bool(np.isfinite(pretrends_joint.p_value))
3624+
homogeneity_ok = homogeneity_joint is not None and bool(np.isfinite(homogeneity_joint.p_value))
3625+
pretrends_rej = pretrends_joint is not None and pretrends_ok and bool(pretrends_joint.reject)
3626+
homogeneity_rej = (
3627+
homogeneity_joint is not None and homogeneity_ok and bool(homogeneity_joint.reject)
3628+
)
3629+
3630+
reasons = []
3631+
if pretrends_rej:
3632+
reasons.append("joint pre-trends rejected - assumption 7 violated (joint Stute)")
3633+
if homogeneity_rej:
3634+
reasons.append("joint linearity rejected - heterogeneity bias (joint Stute)")
3635+
3636+
unresolved = []
3637+
if pretrends_joint is None:
3638+
unresolved.append("joint pre-trends skipped (no earlier pre-period)")
3639+
elif not pretrends_ok:
3640+
unresolved.append("joint pre-trends NaN")
3641+
if homogeneity_joint is None:
3642+
unresolved.append("joint linearity skipped")
3643+
elif not homogeneity_ok:
3644+
unresolved.append("joint linearity NaN")
3645+
3646+
if reasons:
3647+
verdict = "; ".join(reasons)
3648+
if unresolved:
3649+
verdict += "; additional steps unresolved: " + "; ".join(unresolved)
3650+
return verdict + _QUG_DEFERRED_SUFFIX
3651+
3652+
if unresolved:
3653+
return "inconclusive - " + "; ".join(unresolved) + _QUG_DEFERRED_SUFFIX
3654+
3655+
return "joint pre-trends and joint linearity diagnostics fail-to-reject" + _QUG_DEFERRED_SUFFIX
3656+
35373657

35383658
def did_had_pretest_workflow(
35393659
data: pd.DataFrame,
@@ -3820,33 +3940,11 @@ def did_had_pretest_workflow(
38203940
and homogeneity_ok
38213941
and not homogeneity_joint.reject
38223942
)
3823-
# Reuse the unweighted verdict composer with a synthetic NaN
3824-
# qug (all_finite=False, reject=False) so the existing logic
3825-
# produces a "linearity-conditional" verdict. Then append the
3826-
# explicit Phase 4.5 C0 suffix per Reviewer LOW #2.
3827-
qug_skip = QUGTestResults(
3828-
t_stat=float("nan"),
3829-
p_value=float("nan"),
3830-
reject=False,
3831-
alpha=alpha,
3832-
critical_value=float("nan"),
3833-
n_obs=int(doses_at_F.shape[0]),
3834-
n_excluded_zero=0,
3835-
d_order_1=float("nan"),
3836-
d_order_2=float("nan"),
3837-
)
3838-
base_verdict = _compose_verdict_event_study(
3839-
qug_skip, pretrends_joint, homogeneity_joint
3840-
)
3841-
# Strip the "QUG NaN" mention from the unresolved-steps suffix
3842-
# since users get a more informative QUG-skip warning + suffix.
3843-
base_verdict = base_verdict.replace(
3844-
"; additional steps unresolved: QUG NaN", ""
3845-
).replace("inconclusive - QUG NaN", "inconclusive")
3846-
verdict = (
3847-
base_verdict + " (linearity-conditional verdict; QUG-under-survey "
3848-
"deferred per Phase 4.5 C0)"
3849-
)
3943+
# R7 P1 fix: explicit survey-aware verdict composer instead
3944+
# of post-processing the unweighted-verdict output (the
3945+
# previous string-replace approach could leave pass cases
3946+
# starting with "inconclusive" even when all_pass=True).
3947+
verdict = _compose_verdict_event_study_survey(pretrends_joint, homogeneity_joint)
38503948
else:
38513949
qug_ok = bool(np.isfinite(qug_res.p_value))
38523950
all_pass = bool(
@@ -3934,28 +4032,8 @@ def did_had_pretest_workflow(
39344032
if use_survey_path:
39354033
any_reject = stute_res.reject or yatchew_res.reject
39364034
all_pass = bool(linearity_conclusive and not any_reject)
3937-
# Compose verdict from existing _compose_verdict but omit QUG;
3938-
# synthesize a NaN QUG so existing logic produces "QUG NaN" suffix,
3939-
# then strip that and append Phase 4.5 C0 suffix.
3940-
qug_skip = QUGTestResults(
3941-
t_stat=float("nan"),
3942-
p_value=float("nan"),
3943-
reject=False,
3944-
alpha=alpha,
3945-
critical_value=float("nan"),
3946-
n_obs=int(d_arr.shape[0]),
3947-
n_excluded_zero=0,
3948-
d_order_1=float("nan"),
3949-
d_order_2=float("nan"),
3950-
)
3951-
base_verdict = _compose_verdict(qug_skip, stute_res, yatchew_res)
3952-
base_verdict = base_verdict.replace("; additional steps unresolved: QUG NaN", "").replace(
3953-
"inconclusive - QUG NaN", "inconclusive"
3954-
)
3955-
verdict = (
3956-
base_verdict + " (linearity-conditional verdict; QUG-under-survey "
3957-
"deferred per Phase 4.5 C0)"
3958-
)
4035+
# R7 P1 fix: explicit survey-aware verdict composer.
4036+
verdict = _compose_verdict_overall_survey(stute_res, yatchew_res)
39594037
else:
39604038
qug_conclusive = bool(np.isfinite(qug_res.p_value))
39614039
any_reject = qug_res.reject or stute_res.reject or yatchew_res.reject

tests/test_had_pretests.py

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4014,3 +4014,88 @@ def test_workflow_event_study_zero_weights_on_dropped_cohort(self):
40144014
assert report.aggregate == "event_study"
40154015
assert report.qug is None
40164016
assert report.homogeneity_joint is not None
4017+
4018+
# --- R7 P0: weighted-CvM outer-measure oracle -------------------------
4019+
4020+
def test_cvm_statistic_weighted_outer_measure_oracle(self):
4021+
"""R7 P0: weighted CvM must integrate outer measure against F_hat_w
4022+
too. Hand-computed oracle distinguishes outer-weighted form
4023+
((1/W^2) sum_g w_g C_g^2) from count-weighted-cusum form
4024+
((1/W^2) sum_g C_g^2). Uniform weights cannot tell the two apart."""
4025+
from diff_diff.had_pretests import _cvm_statistic_weighted
4026+
4027+
eps = np.array([1.0, -2.0, 3.0])
4028+
d = np.array([0.1, 0.2, 0.3])
4029+
w = np.array([1.0, 2.0, 3.0])
4030+
# C_1=1, C_2=-3, C_3=6, W=6.
4031+
# Outer-weighted: (1*1 + 2*9 + 3*36) / 36 = 127/36.
4032+
# Count-weighted (WRONG): (1+9+36) / 36 = 46/36.
4033+
result = _cvm_statistic_weighted(eps, d, w)
4034+
outer_weighted = (1 * 1.0**2 + 2 * (-3.0) ** 2 + 3 * 6.0**2) / (6.0**2)
4035+
count_weighted = (1.0**2 + (-3.0) ** 2 + 6.0**2) / (6.0**2)
4036+
np.testing.assert_allclose(result, outer_weighted, atol=1e-14, rtol=1e-14)
4037+
assert abs(outer_weighted - count_weighted) > 1.0
4038+
assert abs(result - count_weighted) > 1.0
4039+
4040+
def test_cvm_statistic_weighted_reduces_at_uniform_weights(self):
4041+
"""At w=ones(G), outer-weighted form reduces bit-exactly to the
4042+
unweighted statistic."""
4043+
from diff_diff.had_pretests import _cvm_statistic, _cvm_statistic_weighted
4044+
4045+
eps = np.array([1.0, -2.0, 3.0, 0.5, -0.7])
4046+
d = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
4047+
w_uniform = np.ones(5)
4048+
np.testing.assert_allclose(
4049+
_cvm_statistic_weighted(eps, d, w_uniform),
4050+
_cvm_statistic(eps, d),
4051+
atol=1e-14,
4052+
rtol=1e-14,
4053+
)
4054+
4055+
# --- R7 P1: survey verdict consistency --------------------------------
4056+
4057+
def test_workflow_overall_survey_pass_does_not_say_inconclusive(self):
4058+
"""R7 P1: when all_pass=True on the overall survey path, the
4059+
verdict must NOT start with 'inconclusive'. Locks the explicit
4060+
survey-aware verdict composer."""
4061+
df = self._make_overall_panel()
4062+
weights_per_row = np.full(40, 1.5)
4063+
with pytest.warns(UserWarning):
4064+
report = did_had_pretest_workflow(
4065+
df,
4066+
"y",
4067+
"d",
4068+
"time",
4069+
"unit",
4070+
weights=weights_per_row,
4071+
n_bootstrap=199,
4072+
seed=0,
4073+
)
4074+
if report.all_pass:
4075+
assert not report.verdict.startswith("inconclusive"), (
4076+
f"all_pass=True but verdict starts with 'inconclusive': " f"{report.verdict!r}"
4077+
)
4078+
4079+
def test_workflow_event_study_survey_pass_does_not_say_inconclusive(self):
4080+
"""R7 P1: same invariant on the event-study survey path."""
4081+
from diff_diff import SurveyDesign
4082+
4083+
df = self._make_event_study_panel_with_psu_strata(
4084+
n_strata=2, n_psu_per_stratum=3, n_units_per_psu=2
4085+
)
4086+
with pytest.warns(UserWarning, match="QUG step skipped"):
4087+
report = did_had_pretest_workflow(
4088+
df,
4089+
"y",
4090+
"d",
4091+
"time",
4092+
"unit",
4093+
aggregate="event_study",
4094+
survey=SurveyDesign(weights="w", strata="stratum", psu="psu"),
4095+
n_bootstrap=199,
4096+
seed=0,
4097+
)
4098+
if report.all_pass:
4099+
assert not report.verdict.startswith("inconclusive"), (
4100+
f"all_pass=True but verdict starts with 'inconclusive': " f"{report.verdict!r}"
4101+
)

0 commit comments

Comments
 (0)