Add survey design support to StaggeredTripleDifference #247
Thread survey weights through all three pairwise DiD comparisons (propensity scores, outcome regression, Riesz representers) with design-based variance at aggregation via CallawaySantAnna mixin infrastructure. Extract collapse_survey_to_unit_level to survey.py for reuse. Full test coverage across estimation methods, survey designs, and aggregation modes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…erage Block unsupported replicate-weight + n_bootstrap>0 combination matching CallawaySantAnna guard. Propagate _effective_df from _aggregate_simple() to df_survey for correct replicate-weight inference. Add tests for replicate+bootstrap rejection and survey-weighted aggregation point estimates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
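The guard described in this commit can be sketched as follows. This is a minimal illustration, not the diff-diff implementation: the function name `check_replicate_bootstrap`, its parameters, and the choice of `ValueError` are all assumptions made for the example.

```python
def check_replicate_bootstrap(uses_replicate_weights: bool, n_bootstrap: int) -> None:
    """Reject replicate-weight designs combined with bootstrap inference.

    Hypothetical helper: the name, signature, and exception type are
    illustrative assumptions, not the library's actual API.
    """
    if uses_replicate_weights and n_bootstrap > 0:
        raise ValueError(
            "Replicate-weight survey designs cannot be combined with "
            "n_bootstrap > 0; set n_bootstrap=0 to use replicate variance."
        )


# Supported combinations pass silently.
check_replicate_bootstrap(uses_replicate_weights=True, n_bootstrap=0)
check_replicate_bootstrap(uses_replicate_weights=False, n_bootstrap=200)
```

Failing early at fit time keeps the two variance paths (replicate weights versus bootstrap) from being mixed silently.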
|
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Normalize survey_weights_arr to sum=n in _precompute_structures() so size_gt/size_gt_ctrl denominators are scale-invariant for replicate designs. Fix BRR test fixtures to build combined replicate weights (rep_r = weight * factor) honoring combined_weights=True semantics. Add replicate scale-invariance tests for simple/event_study/group aggregation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
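A small sketch of why normalizing weights to sum to n makes weighted cell sizes scale-invariant, and of how a combined replicate column (rep_r = weight * factor) is built. The helper `normalize_weights` and the toy arrays are assumptions for illustration; they are not taken from `_precompute_structures()`.

```python
import numpy as np


def normalize_weights(w):
    """Rescale weights so they sum to the number of observations (hypothetical helper)."""
    w = np.asarray(w, dtype=float)
    return w * (w.size / w.sum())


base = np.array([1.0, 2.0, 3.0, 4.0])      # full-sample survey weights
factor = np.array([2.0, 0.0, 2.0, 0.0])    # BRR-style half-sample factors (toy values)
rep_combined = base * factor               # combined replicate column: rep_r = weight * factor

in_cell = np.array([True, False, True, True])  # toy membership mask for one (g, t) cell

# Weighted cell size computed from the replicate column, and from the same
# column multiplied by an arbitrary constant.
size_a = normalize_weights(rep_combined)[in_cell].sum()
size_b = normalize_weights(1000.0 * rep_combined)[in_cell].sum()

# After normalization the denominator does not depend on the overall scale.
assert np.isclose(size_a, size_b)
```

Without the normalization step, `size_b` would be 1000 times `size_a`, so any quantity that mixes weighted cell sizes with unweighted counts would change with the scale of the replicate weights.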
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Cross-check against source material is mostly clean: the documented survey-specific deviations now recorded in the registry are not defects, including the survey-weighted cell-size extension and the already-documented aggregation-weight differences (docs/methodology/REGISTRY.md:L1349-L1357, docs/methodology/REGISTRY.md:L1393-L1404).
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No new findings. The existing SDDD parity limitations remain tracked already and are not blockers (TODO.md:L68-L70).
Security
No findings.
Documentation/Tests
Execution note: I could not run the test suite in this environment because …
Path to Approval
|
Fix survey-weighted propensity fallback to use np.average(PA4, weights=sw) instead of unweighted np.mean(PA4). Add zero-weight subgroup detection in _compute_ddd_gt_gc for subpopulation/domain designs. Fix existing test expecting NotImplementedError for invalid survey_design type (now TypeError from _resolve_survey_for_fit). Add covariate-adjusted survey tests for reg/ipw/dr and combined_weights=False replicate test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
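The two fixes in this commit can be illustrated with a short sketch. It assumes `PA4` is a 0/1 subgroup indicator and `sw` is the per-unit survey weight; the helper names `fallback_pscore` and `has_zero_mass` are hypothetical and do not mirror the code in `_compute_ddd_gt_gc`.

```python
import numpy as np


def fallback_pscore(pa4, sw=None):
    """Constant propensity fallback: the (weighted) share of the PA4 indicator."""
    pa4 = np.asarray(pa4, dtype=float)
    if sw is None:
        return float(np.mean(pa4))              # unweighted share
    return float(np.average(pa4, weights=sw))   # survey-weighted share


def has_zero_mass(sw, mask):
    """True when a subgroup carries no survey weight (e.g. outside the domain)."""
    sw = np.asarray(sw, dtype=float)
    return bool(sw[mask].sum() <= 0.0)


pa4 = np.array([1, 1, 0, 0])
sw = np.array([3.0, 1.0, 1.0, 0.0])

unweighted = fallback_pscore(pa4)       # 2 of 4 units
weighted = fallback_pscore(pa4, sw)     # weight 4 out of total weight 5

# A subgroup whose units all have zero weight should be detected and skipped.
empty_subgroup = np.array([False, False, False, True])
skip = has_zero_mass(sw, empty_subgroup)
```

With unequal weights the two fallbacks differ (0.5 versus 0.8 here), which is why using the unweighted mean under a survey design would bias the fallback propensity.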
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No new findings. The pre-existing staggered-DDD parity limitations remain tracked in …
Security
No findings.
Documentation/Tests
Execution note: I could not run …
|
…b-aggregation invariance Pin the two newly fixed branches: weighted pscore fallback (forced via collinear covariate) and zero-mass subgroup warning/skip (via zeroed weights). Extend replicate scale-invariance test to assert event_study and group_effects values directly, not just overall_att/overall_se. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
Add survey design support to the StaggeredTripleDifference estimator
Extract collapse_survey_to_unit_level() from CallawaySantAnna to survey.py for reuse across panel IF-based estimators
Design-based variance via CallawaySantAnnaAggregationMixin infrastructure (TSL or replicate IF variance)
Methodology references (required if estimator / math changes)
Aggregation-weight deviation from triplediff::agg_ddd() documented in REGISTRY.md (uses CallawaySantAnna mixin cohort-size weights instead of group-probability weights). The R triplediff package does not support survey weights; this implementation is unique to diff-diff.
Validation
tests/test_survey_staggered_ddd.py (25 tests across 12 classes)
test_methodology_staggered_triple_diff.py (40 tests), test_survey_phase4.py CallawaySantAnna tests (23 tests)
Security / privacy
Generated with Claude Code