Phase 3: Survey design support for OLS-based standalone estimators #226
|
Overall Assessment
I cross-checked the changed survey paths against the Methodology Registry and the estimator implementations. The documented Phase 5 deferrals and the documented Bacon diagnostic deviation are fine; the blockers below are all undocumented survey-path mismatches or missing assumption checks.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- EfficientDiD: collapse survey to unit-level before TSL (was expanding EIF to panel rows, making SEs too small for weights-only path)
- ContinuousDiD: survey-weighted event-study aggregation and TSL SEs
- ContinuousDiD/EfficientDiD: validate within-unit constancy for survey columns before collapsing to unit level
- TripleDifference: use survey df (n_PSU - n_strata) for p-values/CIs instead of sample df (n_obs - 8)
- StackedDiD: handle SurveyDesign without weights column (use ones)
- Pass survey df to safe_inference() in all survey-aware estimators
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
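The within-unit constancy check in the third bullet can be sketched as follows. This is a minimal illustration in plain Python, not the PR's code: the function name `collapse_survey_to_units`, the row layout, and the column names are all hypothetical.

```python
def collapse_survey_to_units(rows, unit_key, survey_keys):
    """Return one survey record per unit.

    Hypothetical helper: refuses to collapse if any survey column takes
    more than one value within a unit.
    """
    per_unit = {}
    for row in rows:
        unit = row[unit_key]
        values = tuple(row[key] for key in survey_keys)
        if unit not in per_unit:
            per_unit[unit] = values
        elif per_unit[unit] != values:
            raise ValueError(
                f"survey columns {list(survey_keys)} vary within unit {unit!r}"
            )
    return {u: dict(zip(survey_keys, v)) for u, v in per_unit.items()}


# Illustrative panel: two units observed in two periods each.
panel = [
    {"unit": "a", "period": 1, "weight": 2.0, "psu": 10},
    {"unit": "a", "period": 2, "weight": 2.0, "psu": 10},
    {"unit": "b", "period": 1, "weight": 0.5, "psu": 11},
    {"unit": "b", "period": 2, "weight": 0.5, "psu": 11},
]
unit_level = collapse_survey_to_units(panel, "unit", ["weight", "psu"])
```

Checking before collapsing matters because taking the first row per unit would otherwise silently discard a weight or PSU that changes over time.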
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
- ContinuousDiD: rescale IFs by n_units before compute_survey_vcov to avoid double-counting 1/n bread; use unit-level df_survey
- EfficientDiD: align unit_first_panel_row to sorted all_units order; build unit-level ResolvedSurveyDesign once in fit(); use unit-level df
- SunAbraham: thread survey weights into _compute_iw_effects and _compute_overall_att for survey-weighted cohort aggregation
- StackedDiD: pass survey df to safe_inference for event-study and overall ATT p-values/CIs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…nce from PR #226 review (round 3)
- ContinuousDiD: store survey-weighted treated/control masses and weighted dpsi_bar in bootstrap_info; use weighted masses for p_1, p_0, n_total in IF construction so TSL linearizes the weighted estimator
- TripleDifference: use survey-weighted subgroup mass (sum(w_sub)) instead of raw counts (n_sub) for pairwise IF combination weights w3, w2, w1 when survey design is active
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
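To make the difference between raw counts and survey-weighted mass concrete, here is a small sketch. It only shows how a subgroup's size and share change when `sum(w_sub)` replaces `n_sub`; the helper name `subgroup_mass` is hypothetical, and the actual formulas for w3, w2, w1 are not reproduced here.

```python
def subgroup_mass(member_flags, weights=None):
    """Raw count when no survey weights are given, else the sum of weights.

    Hypothetical helper for illustration only.
    """
    if weights is None:
        return float(sum(1 for flag in member_flags if flag))
    return float(sum(w for flag, w in zip(member_flags, weights) if flag))


flags = [True, True, False, True]      # membership in one subgroup
weights = [0.5, 1.5, 4.0, 2.0]         # illustrative survey weights

n_sub = subgroup_mass(flags)           # raw count of members
w_sub = subgroup_mass(flags, weights)  # survey-weighted mass of members

share_by_count = n_sub / len(flags)
share_by_mass = w_sub / sum(weights)
```

With these numbers the subgroup is 75% of the sample by count but only 50% by survey mass, so combination weights built from counts would not match the weighted estimator.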
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
… from PR #226 review (round 4)
- ContinuousDiD: normalize WLS bread by weighted treated mass (not raw count) for consistency with downstream IF score denominators; fixes ACRT_glob/ATT(d)/ACRT(d) survey SEs when subgroup-average weights differ
- ContinuousDiD/EfficientDiD: recompute survey_metadata from unit-level ResolvedSurveyDesign so reported effective_n/n_psu/df_survey match the inference actually run (not the panel-level overcount)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
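The panel-level overcount in the second bullet can be illustrated with an effective sample size. The sketch below assumes `effective_n` behaves like Kish's formula, (Σw)² / Σw²; that is an assumption made for illustration, not something the commit message states, and `kish_effective_n` is a hypothetical name.

```python
def kish_effective_n(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


unit_weights = [1.0, 2.0, 3.0]   # one weight per unit
n_periods = 4                    # each unit appears in 4 panel rows

# Panel-level weights repeat each unit's weight once per period.
panel_weights = [w for w in unit_weights for _ in range(n_periods)]

unit_eff = kish_effective_n(unit_weights)
panel_eff = kish_effective_n(panel_weights)
overcount = panel_eff / unit_eff   # equals n_periods under this formula
```

Under this assumption the panel-level figure is inflated by exactly the number of periods, which is why metadata computed from panel rows would not describe unit-level inference.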
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ety from PR #226 review (round 5)
- EfficientDiD: use resolved survey weights directly for unit-level estimation (Omega*, EIF, cohort fractions) instead of separately renormalized raw column, ensuring fweight/aweight consistency with TSL
- BaconDecomposition: store survey weights as DataFrame column for safe label-based subsetting in _recompute_exact_weights, preventing out-of-bounds errors on non-default DataFrame indexes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
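The BaconDecomposition bullet describes a common pandas pitfall. A minimal sketch, assuming pandas and NumPy are available; the column name `_survey_weight` and the toy data are hypothetical and not taken from `_recompute_exact_weights`:

```python
import numpy as np
import pandas as pd

# Panel whose index does not start at 0 (for example, after an upstream filter).
df = pd.DataFrame(
    {"unit": [1, 1, 2, 2], "y": [0.1, 0.2, 0.3, 0.4]},
    index=[10, 11, 12, 13],
)
weights = np.array([1.0, 1.0, 2.0, 2.0])  # positionally aligned with df rows

subset = df[df["unit"] == 2]

# Fragile: index labels (12, 13) are used as positions into a length-4 array.
try:
    weights[subset.index.to_numpy()]
    positional_failed = False
except IndexError:
    positional_failed = True

# Safer: carry the weights as a column so they follow the rows through any subset.
df_w = df.assign(_survey_weight=weights)
subset_weights = df_w.loc[df_w["unit"] == 2, "_survey_weight"].to_numpy()
```

Storing the weights as a column means every later boolean or label-based selection keeps rows and weights aligned, whatever the index looks like.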
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
No unmitigated registry/source-material mismatch stood out in
Code Quality
Performance
No material findings in the reviewed diff.
Maintainability
No material findings beyond the blockers above.
Tech Debt
Security
No security or secret-handling issues found in the reviewed diff.
Documentation/Tests
Path to Approval
|
…ll guard from PR #226 review (round 6)
- ContinuousDiD: rescale IFs by unit-level total survey mass (unit_resolved.weights.sum()) instead of hard-coded n_units, so TSL SEs are correct for fweight designs where mass != n_units
- BaconDecomposition: add empty-cell guard in _compute_treated_vs_never before np.average() to prevent crashes on unbalanced/filtered panels
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
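An empty-cell guard of the kind described in the BaconDecomposition bullet might look like the sketch below. It is written in plain Python rather than with np.average(), the name `weighted_mean_or_nan` is hypothetical, and returning NaN is one possible policy; the PR may skip the cell instead.

```python
import math


def weighted_mean_or_nan(values, weights):
    """Weighted mean that returns NaN for an empty or zero-mass cell.

    Hypothetical helper: the guard runs before any division by the mass.
    """
    mass = sum(weights)
    if len(values) == 0 or mass <= 0:
        return math.nan
    return sum(v * w for v, w in zip(values, weights)) / mass


filled = weighted_mean_or_nan([1.0, 3.0], [1.0, 3.0])  # (1*1 + 3*3) / 4
empty = weighted_mean_or_nan([], [])                   # cell with no rows
```

On unbalanced or filtered panels some treated-versus-never cells can end up with no observations, so the check has to happen before the average is taken.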
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
No material findings beyond the inference issues above.
Performance
No material findings in the reviewed diff.
Maintainability
No material findings in the reviewed diff.
Tech Debt
Security
No security or secret-handling issues found in the reviewed diff.
Documentation/Tests
Path to Approval
|
…(round 7) Thread survey_metadata.df_survey to safe_inference() for both overall ATT and event-study effects, so p-values/CIs use t-distribution with n_PSU - n_strata df when survey design is active. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
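For reference, the n_PSU - n_strata degrees of freedom can be computed from design labels as in this sketch. It assumes PSU labels are nested within strata (the same PSU id may recur in different strata), which is an assumption of the example; `survey_degrees_of_freedom` is a hypothetical name, and the t-quantile lookup itself is omitted.

```python
def survey_degrees_of_freedom(strata, psus):
    """df = (number of distinct PSUs) - (number of distinct strata).

    PSUs are counted within strata, so (stratum, psu) pairs are the unit.
    """
    n_psu = len(set(zip(strata, psus)))
    n_strata = len(set(strata))
    return n_psu - n_strata


# Illustrative design: 2 PSUs in "north", 3 PSUs in "south".
strata = ["north", "north", "north", "south", "south", "south"]
psus = [1, 1, 2, 1, 2, 3]
df_survey = survey_degrees_of_freedom(strata, psus)
```

Here there are five PSUs across two strata, so p-values and CIs would use a t-distribution with 3 degrees of freedom rather than one based on the number of observations.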
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…-values from PR #226 review (round 8) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
#226 review (round 13) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…n test from PR #226 review (round 14) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…m PR #226 local review Pre-existing behavior (not introduced by survey PR): event-study aggregation uses all (g,t) cells without anticipation filtering. Affects both survey and non-survey paths equally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ocal review Maintainability P2: panel-to-unit survey collapse, post-filter re-resolution, and metadata recomputation patterns differ across ContinuousDiD, EfficientDiD, and StackedDiD. Candidate for shared helpers in a follow-up tech debt PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l review P3: near-identical survey metadata rendering in 7 results classes will grow worse as Phases 4-5 add more estimators. Extract a shared helper in a follow-up tech debt PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
No findings.
Maintainability
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
…) convenience functions from PR #226 review (round 15) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
⛔ Blocker
Executive Summary
Methodology
Code Quality
No findings. The prior public-wrapper passthrough issue appears fixed in diff_diff/bacon.py:L1123 and diff_diff/triple_diff.py:L1780.
Performance
No findings.
Maintainability
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
…226 review (round 16) Remove max(n_at_e, 1) floor from survey-weighted branch — valid survey masses can be < 1 and flooring breaks scale invariance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
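Why the floor breaks scale invariance can be seen with a toy weighted average. This sketch assumes the mass enters as a plain denominator; how n_at_e is used in the actual estimator is not shown in the commit message, and `weighted_average` with its `floor_mass` switch is hypothetical.

```python
def weighted_average(values, weights, floor_mass):
    """Weighted average, optionally flooring the total mass at 1."""
    mass = sum(weights)
    if floor_mass:
        mass = max(mass, 1)  # the kind of floor removed in round 16
    return sum(v * w for v, w in zip(values, weights)) / mass


values = [1.0, 3.0]
small = [0.2, 0.3]                 # total mass 0.5, below the floor
large = [w * 10 for w in small]    # same relative weights, total mass 5

no_floor_small = weighted_average(values, small, floor_mass=False)
no_floor_large = weighted_average(values, large, floor_mass=False)
floored_small = weighted_average(values, small, floor_mass=True)
floored_large = weighted_average(values, large, floor_mass=True)
```

Without the floor both weightings give about 2.2; with it, the small-mass version drops to about 1.1 even though only the overall scale of the weights changed.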
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
…226 review (round 17) Q-weight composition produces non-integer composed weights, breaking frequency-weight semantics. Raise ValueError for weight_type='fweight'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
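A small sketch of the guard and of why composed weights stop being frequencies. The function `compose_weights`, its signature, and the Q-weight values are hypothetical; only the `weight_type='fweight'` rejection comes from the commit message.

```python
def compose_weights(survey_weights, q_weights, weight_type="pweight"):
    """Multiply survey weights by Q-weights, rejecting frequency weights.

    Hypothetical helper: frequency weights must stay integer counts, and
    multiplying by fractional Q-weights does not preserve that.
    """
    if weight_type == "fweight":
        raise ValueError(
            "weight_type='fweight' is not supported: composed weights "
            "would be non-integer"
        )
    return [s * q for s, q in zip(survey_weights, q_weights)]


survey_weights = [2, 3, 1]         # integer counts, as frequency weights would be
q_weights = [0.5, 0.25, 1.0]       # illustrative fractional Q-weights

composed = compose_weights(survey_weights, q_weights)
has_non_integer = any(float(c) != int(c) for c in composed)
```

The composed value 3 × 0.25 = 0.75 is no longer a count, which is the situation the ValueError is meant to prevent for frequency-weight designs.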
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static review only; I did not execute the test suite in this environment.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…hts in score terms from PR #226 review (round 18) The pre/post OLS score pieces (wols_eX_pre/post) and design matrices (wols_x_pre/post) now include survey weights when present, consistent with the weighted OLS bread. Previously only the bread was weighted, making the survey IF inconsistent with the weighted OLS fit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
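The inconsistency described here shows up even in a one-regressor weighted least squares fit. The sketch below is a generic WLS illustration with made-up data, not the wols_eX_pre/post code: the weighted scores w·x·e satisfy the normal equations and sum to zero, while unweighted scores x·e paired with a weighted bread do not.

```python
def wls_slope(x, y, w):
    """Slope of a no-intercept weighted least squares fit of y on x."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


x = [1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0]
w = [1.0, 2.0, 3.0]   # illustrative survey weights

beta = wls_slope(x, y, w)
resid = [yi - beta * xi for xi, yi in zip(x, y)]

bread = sum(wi * xi * xi for wi, xi in zip(w, x))            # weighted bread
weighted_scores = [wi * xi * ei for wi, xi, ei in zip(w, x, resid)]
unweighted_scores = [xi * ei for xi, ei in zip(x, resid)]    # weights omitted

# Influence-function contributions built from consistent pieces.
influence = [s / bread for s in weighted_scores]
```

Because the weighted scores sum to zero at the WLS solution, influence values built from them are centered; leaving the weights out of the score breaks that property.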
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated
Executive Summary
Methodology
No unmitigated
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings beyond the non-blocking items already tracked in
Tech Debt
Security
No findings.
Documentation/Tests
|
…usions from PR #226 review (round 19)
- BaconDecomposition: move _validate_unit_constant_survey behind weights="exact"; the approximate path uses obs-level means and doesn't need within-unit-constant survey columns
- REGISTRY.md: document StackedDiD fweight rejection and Bacon exact-only unit-constancy requirement
- survey-roadmap.md: update StackedDiD support to "Full (pweight/aweight)"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Static re-review only. I could not run the test suite in this sandbox because
Executive Summary
Methodology
Affected method:
Resolved from the prior review: the Bacon approximate survey path no longer rejects time-varying survey columns, and that exact-only constraint is now documented in diff_diff/bacon.py:L481-L488 and docs/methodology/REGISTRY.md:L1578-L1579.
Code Quality
No findings.
Performance
No findings.
Maintainability
No new findings beyond the already tracked survey-helper cleanup in TODO.md:L58-L59.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
|
…om PR #226 review (round 20) Q-weight composition changes inverse-variance weight semantics just as it breaks frequency-weight semantics. Update guard, docs, and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Static re-review only. I could not execute the test suite in this sandbox because
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
- survey_design parameter to fit() for 6 OLS-based standalone estimators: StackedDiD, SunAbraham, BaconDecomposition, TripleDifference (reg method), ContinuousDiD, and EfficientDiD
- survey_metadata field to all 6 results classes with summary() and to_dict() support
- NotImplementedError guards (pending Phase 5: weighted solve_logit())
Estimator-specific details
Methodology references (required if estimator / math changes)
Validation
- tests/test_survey_phase3.py (31 new tests covering smoke, uniform-weight invariance, scale invariance, metadata, summary output, NotImplementedError guards)
Security / privacy
Generated with Claude Code