Add survey-aware DiD tutorial (Phase 7c) #249
Add docs/tutorials/16_survey_did.ipynb, a 35-cell tutorial framed around a state-level preventive care program evaluated with a stratified health survey (ACS/BRFSS-like). Covers: why survey design matters for DiD, SurveyDesign setup, basic and staggered DiD with survey design, replicate weights (JK1), subpopulation analysis, DEFF diagnostics, repeated cross-sections, and estimator support reference.

Also adds generate_survey_did_data() to diff_diff/prep_dgp.py with realistic survey structure (strata, PSUs, FPC, sampling weights, optional JK1 replicate weights) and 12 unit tests.

Marks Phase 7c complete in the survey roadmap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Overall Assessment
Executive Summary
Methodology
Methods affected in this PR are
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- Fix estimator support table: ImputationDiD and TwoStageDiD now show "Partial (no FPC)" instead of "Full" for strata/PSU/FPC support, matching the code, which explicitly rejects FPC (P1)
- Fix repeated cross-section example: add panel=False parameter to generate_survey_did_data() that draws fresh respondent effects each period instead of relabeling panel unit IDs (P1)
- Add JK1 minimum-PSU guard: raise ValueError when n_psu < 2 to prevent division by zero in replicate weight generation (P2)
- Clear stale notebook outputs committed from wrong environment (P2)
- Add top-level import test and JK1 boundary test (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
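For readers unfamiliar with why JK1 needs at least two PSUs, here is a minimal sketch of delete-one-PSU replicate weights. The function name `jk1_replicate_weights` and its list-based interface are hypothetical and chosen for illustration; this is not the code in diff_diff/prep_dgp.py. Each replicate zeroes out one PSU and rescales the rest by n_psu / (n_psu - 1), which is where the division by zero would come from.

```python
from collections import Counter


def jk1_replicate_weights(weights, psu_ids):
    """Delete-one-PSU (JK1) replicate weights, one list per replicate.

    Hypothetical helper for illustration only; not the diff_diff code.
    """
    # Count the PSUs that actually appear in the data.
    populated = sorted(Counter(psu_ids))
    n_psu = len(populated)
    if n_psu < 2:
        # The rescaling factor n_psu / (n_psu - 1) is undefined for one PSU.
        raise ValueError(
            f"JK1 replicate weights need at least 2 PSUs, got {n_psu}"
        )

    scale = n_psu / (n_psu - 1)
    replicates = []
    for dropped in populated:
        # Zero out the dropped PSU and scale up every other observation.
        replicates.append(
            [0.0 if psu == dropped else w * scale
             for w, psu in zip(weights, psu_ids)]
        )
    return replicates


reps = jk1_replicate_weights([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 2])
```

With three PSUs the scale factor is 1.5, so the first replicate drops PSU 0 and multiplies the remaining two weights by 1.5.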
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
No material findings in PR scope.
Maintainability
No material findings beyond the table/legend ambiguity called out in Methodology.
Tech Debt
No material findings. I did not find matching
Security
No findings.
Documentation/Tests
No additional material findings beyond the coverage gap noted in Code Quality. The earlier stale-output/import concern appears addressed by clearing notebook outputs and adding a top-level import test at tests/test_prep.py#L1296. I could not run the notebook or test suite in this workspace because
Path to Approval
|
- Fix StackedDiD support table: show "Full (pweight only)" for strata/PSU/FPC since it supports full TSL on composed weights, just restricted to pweight weight type (P1)
- Fix replicate weight section: remove misleading TSL vs JK1 equivalence claim; show JK1 as standalone API demo and note that stratified replicates (JKn) are needed for stratified designs (P1)
- Add input validation for weight_variation and cohort_periods in generate_survey_did_data() with negative tests (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Prior methodology blockers from the last review look resolved.
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Mirror the existing generate_staggered_data() guard: reject cohort periods that are non-integer, < 1, or >= n_periods. Add negative tests for out-of-range and non-integer inputs. Fix existing tests that used default cohort_periods with small n_periods (now caught by the new validation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- CallawaySantAnna, TripleDifference, ImputationDiD, TwoStageDiD: change Weights column from "Full" to "pweight only" (all reject fweight/aweight)
- Add EfficientDiD footnote: covariates + survey_design cannot be used simultaneously
- Clarify legend for "pweight only" in both Weights and TSL columns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- JK1 replicate guard now checks populated PSU count (not just configured count), preventing division by zero when n_units < n_psu_total leaves only 1 PSU with observations (P1)
- Add regression test for one-populated-PSU edge case
- Narrow notebook warning filter from global ignore to RuntimeWarning only, so UserWarning (e.g. repeated cross-section stationarity assumption) is preserved (P3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
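The narrowed warning filter can be illustrated with the standard library alone. The sketch below is not the notebook's cell; `noisy_step` and its warning messages are made-up stand-ins. It shows that filtering on `category=RuntimeWarning` suppresses numerical noise while a `UserWarning` still gets through.

```python
import warnings


def noisy_step():
    # Stand-ins for warnings a notebook cell might emit (hypothetical text).
    warnings.warn("overflow encountered in exp", RuntimeWarning)
    warnings.warn("stationarity assumption applies", UserWarning)
    return "done"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from "record everything"
    # Narrow filter: ignore RuntimeWarning only, not every warning.
    warnings.filterwarnings("ignore", category=RuntimeWarning)
    result = noisy_step()

kept = [w.category.__name__ for w in caught]
```

In a notebook, the single `warnings.filterwarnings("ignore", category=RuntimeWarning)` call is the relevant line; the `catch_warnings` block here only exists to make the effect observable.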
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…nces

- Accept np.integer alongside int in cohort_periods validation, matching the existing generate_staggered_data() behavior (P2)
- Add regression test for numpy integer cohort periods
- Add tutorial 16 cross-links in README.md, quickstart.rst, and choosing_estimator.rst for discoverability (P3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
No findings. This PR adds a DGP and documentation only; I did not find an estimator-math, weighting, variance, or default-behavior change that conflicts with the Methodology Registry. The survey-support claims I spot-checked against the Registry and estimator docstrings line up aside from the documentation completeness issue noted below.
Code Quality
Performance
No findings in PR scope.
Maintainability
No findings in PR scope.
Tech Debt
No findings in
Security
No findings.
Documentation/Tests
Path to Approval
|
…eDiff to table

- Derive default cohort_periods from n_periods (mirroring generate_staggered_data) so n_periods=4/5 work without explicit cohort_periods (P1)
- Coerce cohort_periods to list before emptiness check, fixing numpy array direct input (P2)
- Add StaggeredTripleDifference row to survey support table (P3)
- Add regression tests for small n_periods defaults and numpy array cohort_periods

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
No unmitigated findings. This PR does not change estimator math, weighting formulas, or variance code paths, and the prior survey-support table omission is fixed: the tutorial now includes
Code Quality
Performance
No findings in PR scope.
Maintainability
No findings in PR scope.
Tech Debt
No separate findings. The P1 above is not tracked under TODO.md:L47-L71, so it is not mitigated by existing deferral.
Security
No findings.
Documentation/Tests
Path to Approval
|
…lity

- Default cohort_periods now requires g >= 2 so every cohort has at least one pre-treatment period (needed for CallawaySantAnna). For n_periods < 4, raises ValueError instead of producing inestimable cohorts (P1)
- Allow g = n_periods (last-period adoption has base period g-1)
- Reject g = 1 (no pre-period exists)
- Update docstring to reflect derived defaults and g >= 2 contract
- Strengthen tests: assert all cohorts have pre-periods, test n_periods=3 rejection, test g=n_periods is valid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
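The resulting contract for explicit cohort periods (integer, 2 <= g <= n_periods, non-empty after coercion to a list) can be sketched as below. `validate_cohort_periods` is a hypothetical name, not the function in prep_dgp.py. To stay standard-library-only, the sketch checks `numbers.Integral` rather than `(int, np.integer)`; NumPy integer scalars register with that ABC, so the intent is similar, but this is a swapped-in technique. Derived defaults are deliberately left out because their formula is not stated in this thread.

```python
import numbers


def validate_cohort_periods(cohort_periods, n_periods):
    """Return cohort periods as a list of ints, or raise ValueError.

    Hypothetical helper for illustration; not the diff_diff code.
    """
    # Coerce first so array-like input gets a well-defined emptiness check.
    periods = list(cohort_periods)
    if not periods:
        raise ValueError("cohort_periods must not be empty")
    for g in periods:
        # bool is an Integral subtype, so exclude it explicitly.
        if isinstance(g, bool) or not isinstance(g, numbers.Integral):
            raise ValueError(f"cohort period {g!r} is not an integer")
        # g = 1 has no pre-treatment period; g = n_periods is allowed
        # because its base period g - 1 still exists.
        if g < 2 or g > n_periods:
            raise ValueError(
                f"cohort period {g} must satisfy 2 <= g <= {n_periods}"
            )
    return [int(g) for g in periods]


ok = validate_cohort_periods((2, 5), n_periods=5)
```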
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Estimator code paths are unchanged in this PR scope, and the prior DGP boundary issue is resolved.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The P1 above is not mitigated by an existing
Security
No findings.
Documentation/Tests
Path to Approval
|
… plots

Both event study plots now use the returned conf_int (which respects survey df t-critical values) instead of hardcoded z=1.96, ensuring plotted intervals match the library's inference output (P1).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
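As a rough illustration of the plotting change, the sketch below turns per-period estimates and their returned confidence intervals into the two-row `yerr` layout that matplotlib's `errorbar` accepts, instead of computing `1.96 * se`. The helper name and the numbers are made up for the example; how the tutorial actually extracts `conf_int` from the results object is not shown in this thread.

```python
def errorbar_lengths(effects, conf_ints):
    """Convert (low, high) intervals into [lower, upper] error-bar lengths.

    Hypothetical helper; inputs are plain lists for illustration.
    """
    lower = [est - lo for est, (lo, hi) in zip(effects, conf_ints)]
    upper = [hi - est for est, (lo, hi) in zip(effects, conf_ints)]
    return [lower, upper]


# Made-up event-study estimates and intervals, not results from the PR.
effects = [1.0, 2.0]
conf_ints = [(0.5, 1.5), (1.0, 3.5)]
yerr = errorbar_lengths(effects, conf_ints)

# With matplotlib available, the bars would be drawn as:
#   plt.errorbar(periods, effects, yerr=yerr, fmt="o")
```

Because the lengths come from the interval endpoints, the plot stays consistent with whatever critical value the inference used.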
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Validate n_units, n_periods, n_strata, psu_per_stratum > 0, never_treated_frac in [0, 1], and fpc_per_stratum >= psu_per_stratum before building any records. Prevents malformed output or late failures from invalid inputs (P1). Add negative tests for all boundary cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
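The up-front checks described in this commit could look roughly like the following. The parameter names come from the commit message; the function name `validate_survey_dgp_args` and the error wording are assumptions, and the real generate_survey_did_data() may organize its checks differently.

```python
def validate_survey_dgp_args(n_units, n_periods, n_strata, psu_per_stratum,
                             never_treated_frac, fpc_per_stratum):
    """Fail fast on invalid DGP inputs, before any records are built.

    Hypothetical helper for illustration; not the diff_diff code.
    """
    counts = {
        "n_units": n_units,
        "n_periods": n_periods,
        "n_strata": n_strata,
        "psu_per_stratum": psu_per_stratum,
    }
    for name, value in counts.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    if not 0 <= never_treated_frac <= 1:
        raise ValueError(
            f"never_treated_frac must be in [0, 1], got {never_treated_frac}"
        )
    # A stratum cannot have fewer population PSUs than sampled PSUs.
    if fpc_per_stratum < psu_per_stratum:
        raise ValueError("fpc_per_stratum must be >= psu_per_stratum")


# A valid configuration passes silently.
validate_survey_dgp_args(200, 6, 4, 5, 0.3, 20)
```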
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
- Add docs/tutorials/16_survey_did.ipynb: 35-cell tutorial framed around a state-level preventive care program evaluated with a stratified health survey (ACS/BRFSS-like)
- Add generate_survey_did_data() DGP function to diff_diff/prep_dgp.py with realistic survey structure (strata, PSUs, FPC, weights, optional JK1 replicate weights)
- Mark Phase 7c complete in docs/survey-roadmap.md
Methodology references (required if estimator / math changes)
Validation
- tests/test_prep.py::TestGenerateSurveyDidData (12 tests)
- jupyter nbconvert --execute
Security / privacy
Generated with Claude Code