Commit 3efa276

igerber and claude committed
Rank-based replicate df, mse=False default, update docs
- Replicate df now uses numerical rank of replicate weight matrix (matching R's survey::degf()) instead of hardcoded R-1
- Default mse=False (matching R's survey::svrepdesign() package default)
- Update REGISTRY.md to reflect both changes
- WLS identifiability for zero-weight subpopulation already handled by existing zero-mass guards in ContinuousDiD/TripleDifference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent aefa7f3 commit 3efa276

2 files changed

Lines changed: 18 additions & 7 deletions

diff_diff/survey.py

Lines changed: 13 additions & 4 deletions
@@ -69,7 +69,7 @@ class SurveyDesign:
     combined_weights: bool = True
     replicate_scale: Optional[float] = None
     replicate_rscales: Optional[List[float]] = None
-    mse: bool = True
+    mse: bool = False
 
     def __post_init__(self):
         valid_weight_types = {"pweight", "fweight", "aweight"}
@@ -557,7 +557,7 @@ class ResolvedSurveyDesign:
     combined_weights: bool = True
     replicate_scale: Optional[float] = None
     replicate_rscales: Optional[np.ndarray] = None  # (R,) per-replicate scales
-    mse: bool = True
+    mse: bool = False
 
     @property
     def uses_replicate_variance(self) -> bool:
@@ -566,9 +566,18 @@ def uses_replicate_variance(self) -> bool:
 
     @property
     def df_survey(self) -> Optional[int]:
-        """Survey degrees of freedom: n_PSU - n_strata, or R-1 for replicates."""
+        """Survey degrees of freedom.
+
+        For replicate designs: numerical rank of centered replicate weight
+        matrix, matching R's ``survey::degf()``. For TSL: n_PSU - n_strata.
+        """
         if self.uses_replicate_variance:
-            return self.n_replicates - 1 if self.n_replicates > 1 else None
+            if self.replicate_weights is None or self.n_replicates < 2:
+                return None
+            # Rank-based df from replicate weight matrix, matching
+            # R's survey::degf() for svrepdesign objects
+            rank = int(np.linalg.matrix_rank(self.replicate_weights))
+            return max(rank - 1, 1) if rank > 1 else None
         if self.psu is not None and self.n_psu > 0:
             if self.strata is not None and self.n_strata > 0:
                 return self.n_psu - self.n_strata
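
To illustrate why the rank-based df can differ from the old `R - 1` rule, here is a minimal standalone sketch. The helper `replicate_df` is hypothetical and not part of `diff_diff`. It mirrors the logic in the hunk above by applying `np.linalg.matrix_rank` to the replicate weight matrix as passed (uncentered), and it assumes the matrix has shape `(n_obs, R)`. The toy delete-one jackknife weights are invented for the example.

```python
import numpy as np


def replicate_df(replicate_weights):
    """Hypothetical helper: rank-based survey df for an (n_obs, R) matrix."""
    w = np.asarray(replicate_weights, dtype=float)
    if w.ndim != 2 or w.shape[1] < 2:
        return None  # fewer than two replicates: df undefined
    rank = int(np.linalg.matrix_rank(w))
    return rank - 1 if rank > 1 else None


# Toy delete-one jackknife weights for 4 single-unit PSUs: each replicate
# zeroes one unit and scales the remaining units by n / (n - 1).
n = 4
jk1 = (np.ones((n, n)) - np.eye(n)) * n / (n - 1)
full_rank_df = replicate_df(jk1)  # rank 4, so df equals R - 1 here

# Duplicated replicate columns raise R to 8 but add no new information,
# so the rank (and hence the df) stays the same.
dup = np.hstack([jk1, jk1])
dup_df = replicate_df(dup)

print(full_rank_df, dup_df, dup.shape[1] - 1)
```

With linearly independent replicate columns the two rules agree. With redundant columns the rank-based df stays at the number of independent replicates minus one, whereas `R - 1` would overstate it.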

docs/methodology/REGISTRY.md

Lines changed: 5 additions & 3 deletions
@@ -2011,7 +2011,8 @@ variance from the distribution of replicate estimates.
   contrasts are formed via weight-ratio rescaling:
   `theta_r = sum((w_r/w_full) * psi)` when `combined_weights=True`,
   `theta_r = sum(w_r * psi)` when `combined_weights=False`.
-- **Survey df**: `R - 1` for replicate designs (replaces `n_PSU - n_strata`)
+- **Survey df**: Numerical rank of replicate weight matrix minus 1,
+  matching R's `survey::degf()`. Replaces `n_PSU - n_strata`.
 - **Mutual exclusion**: Replicate weights cannot be combined with
   strata/psu/fpc (the replicates encode design structure implicitly)
 - **Design parameters** (matching R `svrepdesign()`):
@@ -2020,8 +2021,9 @@ variance from the distribution of replicate estimates.
     by full-sample weight before WLS.
   - `replicate_scale`: override default variance scaling factor
   - `replicate_rscales`: per-replicate scaling factors (vector of length R)
-  - `mse` (default True): center variance on full-sample estimate. If False,
-    center on mean of replicate estimates.
+  - `mse` (default False, matching R's `survey::svrepdesign()`): if True,
+    center variance on full-sample estimate; if False, center on mean of
+    replicate estimates.
 - **Note:** Replicate columns are NOT normalized — raw values are preserved
   to maintain correct weight ratios in the IF path.
 - **Note:** JKn requires explicit `replicate_strata` (per-replicate stratum
