Add survey R cross-validation: TSL variance vs R survey::svyglm #250
…yglm

Cross-validates diff-diff's survey variance estimates against R's authoritative survey package (Lumley 2004) across three tiers:

- Tier 1: DifferenceInDifferences vs svyglm under 4 design variants (strata+PSU+FPC, strata+PSU, weights-only, strata-only), with covariates, and across 3 seeds for robustness (29 tests)
- Tier 2: CallawaySantAnna vs did::att_gt with survey weights, with and without covariates (8 tests)
- Tier 3: BRR replicate weights via LinearRegression vs svrepdesign (3 tests)

All 43 tests pass with tight tolerances (ATT rtol=1e-4, SE rtol=1%). No deviations from R found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
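The tolerances quoted in the commit message amount to a relative-error check against the R value. A minimal sketch of such a check follows; the function names and the `att`/`se` dictionary keys are hypothetical and not taken from the test suite, and only the two tolerance values come from the commit message.

```python
ATT_RTOL = 1e-4  # relative tolerance for point estimates (ATT rtol=1e-4)
SE_RTOL = 1e-2   # relative tolerance for standard errors (SE rtol=1%)


def within_rtol(got, want, rtol):
    """Return True if `got` is within `rtol` of the reference value `want`.

    The error is scaled by the reference (R) value, so the check is
    asymmetric: R is treated as the ground truth.
    """
    return abs(got - want) <= rtol * abs(want)


def matches_r(py_result, r_result):
    """Check a Python estimate against an R reference.

    Both arguments are hypothetical dicts with 'att' and 'se' keys.
    """
    return (
        within_rtol(py_result["att"], r_result["att"], ATT_RTOL)
        and within_rtol(py_result["se"], r_result["se"], SE_RTOL)
    )
```

Scaling by the reference value mirrors how `numpy.testing.assert_allclose` applies `rtol` to its `desired` argument, which may be what the real tests use.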
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Cross-check summary: the benchmark setup is consistent with the library’s documented contracts for never-treated encoding (
Code Quality
Performance
No findings.
Maintainability
Tech Debt
No new untracked blocker found. Related survey items are already marked resolved in TODO.md:L57-L59.
Security
No findings. The added fixture appears synthetic, and I did not see secret-like material.
Documentation/Tests
Verification note: I could not run
- P2: Add docstring explaining why Tier 2 compares ATT only (R's did::att_gt gives naive SEs, not design-based)
- P3: Fail loudly on unexpected non-numeric first_treat values instead of silently mapping to 0
- P3: Sort R gt_results by (group, time) and use keyed comparison in Python for stable cross-version matching
- P3: Relabel misleading "SE(naive)" to "SE" in R console output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
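The two P3 robustness items can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: the helper names, the row layout (dicts with `group`, `time`, and `att` keys), and the default tolerance are all hypothetical.

```python
import math


def parse_first_treat(value):
    """Convert a raw first_treat value to an int, raising on anything unexpected.

    Non-numeric or non-integral inputs raise instead of being mapped to 0.
    """
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"unexpected non-numeric first_treat value: {value!r}")
    if not math.isfinite(value) or value != int(value):
        raise ValueError(f"unexpected non-integral first_treat value: {value!r}")
    return int(value)


def index_by_group_time(rows):
    """Index rows by (group, time), rejecting duplicate cells."""
    keyed = {}
    for row in rows:
        key = (int(row["group"]), int(row["time"]))
        if key in keyed:
            raise ValueError(f"duplicate (group, time) cell: {key}")
        keyed[key] = row["att"]
    return keyed


def compare_att_by_key(py_rows, r_rows, rtol=1e-4):
    """Compare ATT values cell by cell, independent of row order."""
    py_att = index_by_group_time(py_rows)
    r_att = index_by_group_time(r_rows)
    if py_att.keys() != r_att.keys():
        # Symmetric difference lists cells present on only one side.
        raise AssertionError(f"cell mismatch: {sorted(py_att.keys() ^ r_att.keys())}")
    for key in sorted(r_att):
        if abs(py_att[key] - r_att[key]) > rtol * abs(r_att[key]):
            raise AssertionError(f"ATT differs at {key}: {py_att[key]} vs {r_att[key]}")
```

Matching on the `(group, time)` key rather than on list position means a change in row order between package versions cannot pair the wrong cells.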
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- Cross-validates diff-diff's survey variance estimates against R's survey::svyglm() (Lumley 2004) across three tiers
- Tier 1: DifferenceInDifferences vs svyglm under 4 design variants (strata+PSU+FPC, strata+PSU, weights-only, strata-only), with covariates, across 3 seeds (29 tests)
- Tier 2: CallawaySantAnna vs did::att_gt with survey weights, with and without covariates (8 tests)
- Tier 3: BRR replicate weights via LinearRegression vs svrepdesign (3 tests)

Methodology references (required if estimator / math changes)
Validation
- tests/test_survey_r_crossvalidation.py (43 tests, 3 tiers)
- benchmarks/R/benchmark_survey_crossvalidation.R
- benchmarks/data/synthetic/survey_crossvalidation_r_results.json

Security / privacy
Generated with Claude Code