
Commit bdfb172

igerber and claude committed
Address P3 documentation drift: REGISTRY formula, docstring key, ranking test
- REGISTRY.md: generalize ICC covariate variance to beta1²/beta2² formula
- Docstring: fix stratum_effects → base_stratum_effects key name
- Add direct covariate-ranking test: covariates dominate Y(0), verify weight assignment changes with nonzero vs zero covariate effects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 21c25f1 commit bdfb172

3 files changed

Lines changed: 40 additions & 3 deletions


diff_diff/prep_dgp.py

Lines changed: 1 addition & 1 deletion
@@ -1287,7 +1287,7 @@ def generate_survey_did_data(
     return_true_population_att : bool, default=False
         If True, attaches a diagnostic dict to ``df.attrs["dgp_truth"]``
         with keys: ``population_att`` (weight-weighted average of treated
-        true effects), ``deff_kish`` (1 + CV(w)^2), ``stratum_effects``
+        true effects), ``deff_kish`` (1 + CV(w)^2), ``base_stratum_effects``
         (base stratum TEs before dynamic/covariate modifiers),
         ``icc_realized`` (ANOVA-based
         ICC computed on period-1 data).
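The `dgp_truth` diagnostics named in this docstring can be reproduced outside the library roughly as follows. This is a sketch under stated assumptions: `kish_deff`, `weighted_att` and `anova_icc` are hypothetical helper names, not `diff_diff` functions. `kish_deff` implements the docstring's 1 + CV(w)^2 with a ddof=0 standard deviation, which makes it equal to n·Σw²/(Σw)². `anova_icc` is the textbook one-way ANOVA ICC(1) estimator with the unbalanced-group n0 adjustment. The diff does not say whether `icc_realized` uses exactly this variant. To mirror the docstring, pass period-1 outcomes and their PSU labels.

```python
import numpy as np


def kish_deff(w):
    """Kish design effect 1 + CV(w)^2 (ddof=0 SD, so equal to n*sum(w^2)/sum(w)^2)."""
    w = np.asarray(w, dtype=float)
    return 1.0 + (w.std() / w.mean()) ** 2


def weighted_att(w, true_te, treated):
    """Weight-weighted average of true effects over treated units only."""
    w = np.asarray(w, dtype=float)
    te = np.asarray(true_te, dtype=float)
    mask = np.asarray(treated, dtype=bool)
    return float(np.sum(w[mask] * te[mask]) / np.sum(w[mask]))


def anova_icc(y, groups):
    """One-way ANOVA ICC(1): (MSB - MSW) / (MSB + (n0 - 1) * MSW).

    Assumed estimator; n0 adjusts the average group size for unequal groups.
    """
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(np.asarray(groups), return_inverse=True)
    n, k = y.size, int(idx.max()) + 1
    n_i = np.bincount(idx)                              # group sizes
    means = np.bincount(idx, weights=y) / n_i           # group means
    msb = np.sum(n_i * (means - y.mean()) ** 2) / (k - 1)
    msw = np.sum((y - means[idx]) ** 2) / (n - k)
    n0 = (n - np.sum(n_i ** 2) / n) / (k - 1)
    return float((msb - msw) / (msb + (n0 - 1) * msw))


print(kish_deff([1.0, 3.0]))  # mean 2, SD 1, CV 0.5 -> 1.25
```

The ddof=0 choice matters: with the sample (ddof=1) SD, 1 + CV² no longer equals the n·Σw²/(Σw)² form of Kish's effective-sample-size ratio.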

docs/methodology/REGISTRY.md

Lines changed: 4 additions & 2 deletions
@@ -2519,8 +2519,10 @@ The 8-step workflow in `docs/llms-practitioner.txt` is adapted from Baker et al.
 
 - **Note:** The `icc` parameter calibrates `psu_re_sd` using the full variance
   decomposition `Var(Y) = sigma²_psu * (1 + psu_period_factor²) + sigma²_unit +
-  sigma²_noise + sigma²_cov`. When `add_covariates=True`, the covariate variance
-  `Var(0.5*x1) + Var(0.3*x2) = 0.2725` is included in the calibration.
+  sigma²_noise + sigma²_cov`. When `add_covariates=True`, covariate variance
+  `sigma²_cov = beta1² * Var(x1) + beta2² * Var(x2)` is included, where
+  `(beta1, beta2)` defaults to `(0.5, 0.3)` but is configurable via
+  `covariate_effects`.
 - **Note:** When `informative_sampling=True` and `add_covariates=True`, covariate
   contributions are included in the Y(0) ranking used for weight assignment.
   Covariates are pre-drawn before the ranking step (panel: once before the loop;
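The generalized `sigma²_cov` formula and the `psu_re_sd` calibration in the note above can be sketched like this. Several assumptions apply. `Var(x1) = 1.0` and `Var(x2) = 0.25` are inferred only because they reproduce the `0.2725` figure the old text hard-coded: 0.25·1 + 0.09·0.25. The ICC is assumed to be the PSU share of total variance, `sigma²_psu * (1 + f²) / Var(Y)`. The diff does not state the library's exact definition. The helper names are hypothetical.

```python
import math


def covariate_variance(beta=(0.5, 0.3), var_x=(1.0, 0.25)):
    """sigma²_cov = beta1² * Var(x1) + beta2² * Var(x2); var_x values are assumed."""
    (b1, b2), (v1, v2) = beta, var_x
    return b1 ** 2 * v1 + b2 ** 2 * v2


def calibrate_psu_re_sd(icc, unit_fe_sd, noise_sd, psu_period_factor, sigma2_cov=0.0):
    """Solve icc = P / (P + other) for sigma_psu, where P = sigma²_psu * (1 + f²).

    The ICC definition here is an assumption, not taken from diff_diff.
    """
    if not 0.0 <= icc < 1.0:
        raise ValueError("icc must lie in [0, 1)")
    other = unit_fe_sd ** 2 + noise_sd ** 2 + sigma2_cov
    sigma2_psu = icc * other / ((1.0 - icc) * (1.0 + psu_period_factor ** 2))
    return math.sqrt(sigma2_psu)


def implied_icc(psu_re_sd, unit_fe_sd, noise_sd, psu_period_factor, sigma2_cov=0.0):
    """Forward map used to check the calibration round-trips."""
    psu_part = psu_re_sd ** 2 * (1.0 + psu_period_factor ** 2)
    return psu_part / (psu_part + unit_fe_sd ** 2 + noise_sd ** 2 + sigma2_cov)


s2_cov = covariate_variance()  # defaults (0.5, 0.3)
sd = calibrate_psu_re_sd(0.1, unit_fe_sd=1.0, noise_sd=0.5,
                         psu_period_factor=0.5, sigma2_cov=s2_cov)
print(round(s2_cov, 4), round(implied_icc(sd, 1.0, 0.5, 0.5, s2_cov), 4))  # prints 0.2725 0.1
```

Passing a custom `beta` (mirroring `covariate_effects`) changes `sigma²_cov` and therefore the calibrated `psu_re_sd`. This is the dependency that the hard-coded `0.2725` in the old wording hid.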

tests/test_prep.py

Lines changed: 35 additions & 0 deletions
@@ -1721,6 +1721,41 @@ def test_informative_sampling_with_covariates_cross_section(self):
         assert corr > 0.1
         assert "x1" in df.columns
 
+    def test_informative_sampling_covariate_ranking_direct(self):
+        """Verify covariates actually affect weight assignment in ranking.
+
+        Use large covariate effects with tiny unit_fe_sd/psu_re_sd so
+        covariates dominate Y(0). Weights with nonzero vs zero covariate
+        effects should differ.
+        """
+        from diff_diff.prep_dgp import generate_survey_did_data
+
+        # Covariates dominate: large beta, tiny structural variance
+        df_with = generate_survey_did_data(
+            n_units=200,
+            informative_sampling=True,
+            add_covariates=True,
+            covariate_effects=(5.0, 0.0),
+            unit_fe_sd=0.01,
+            psu_re_sd=0.01,
+            noise_sd=0.01,
+            seed=42,
+        )
+        df_without = generate_survey_did_data(
+            n_units=200,
+            informative_sampling=True,
+            add_covariates=True,
+            covariate_effects=(0.0, 0.0),
+            unit_fe_sd=0.01,
+            psu_re_sd=0.01,
+            noise_sd=0.01,
+            seed=42,
+        )
+        # Weight assignments should differ when covariates dominate ranking
+        w_with = df_with[df_with["period"] == 1]["weight"].values
+        w_without = df_without[df_without["period"] == 1]["weight"].values
+        assert not np.allclose(w_with, w_without, atol=0.01)
+
     def test_heterogeneous_te_by_strata(self):
         """Unweighted mean TE should differ from population ATT."""
         from diff_diff.prep_dgp import generate_survey_did_data
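The new test compares weights unit by unit rather than in aggregate. A toy model shows why that is the right comparison. This is a hypothetical sketch and not the `generate_survey_did_data` implementation. It assumes a made-up rule in which weight rises linearly with the rank of Y(0), along with assumed covariate distributions (x1 normal, x2 Bernoulli(0.5)). As in the test, a shared seed keeps every random draw identical across the two runs. Only `beta` changes, and with it the ranking.

```python
import numpy as np


def rank_based_weights(y0, w_min=1.0, w_max=5.0):
    """Hypothetical informative-sampling rule: weight grows linearly with rank of Y(0)."""
    y0 = np.asarray(y0, dtype=float)
    ranks = np.argsort(np.argsort(y0))          # 0 = smallest Y(0)
    frac = ranks / max(y0.size - 1, 1)
    return w_min + (w_max - w_min) * frac


def simulate_weights(beta, n=200, seed=42, tiny_sd=0.01):
    """Toy Y(0) = tiny structural noise + beta1*x1 + beta2*x2, then rank-based weights."""
    rng = np.random.default_rng(seed)           # same seed -> identical draws
    x1 = rng.normal(size=n)
    x2 = rng.binomial(1, 0.5, size=n)
    structural = rng.normal(scale=tiny_sd, size=n)
    y0 = structural + beta[0] * x1 + beta[1] * x2
    return rank_based_weights(y0)


w_with = simulate_weights((5.0, 0.0))     # ranking driven by x1
w_without = simulate_weights((0.0, 0.0))  # ranking driven by structural noise
print(np.allclose(w_with, w_without, atol=0.01))
print(np.allclose(np.sort(w_with), np.sort(w_without)))
```

Under any purely rank-based rule, the sorted weights are the same set in both runs, so a distributional check would pass even if covariates were ignored. Only the per-unit assignment reveals whether covariates entered the ranking. If the real DGP's assignment is also rank-based, the same reasoning would explain the element-wise `np.allclose` in the test.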
