Address CI review round 2: reject non-binary masks, fix logit test, document df

igerber · claude · igerber · commit a8b3e4c6ab68 · 2026-03-27T10:46:01.000-04:00
CI review findings:
- Reject non-binary numeric masks in subpopulation(): codes such as {1, 2}
  coerce to all-True via astype(bool), silently defining the wrong domain
- Fix test_survey_phase4.py: update "strictly positive" to "non-negative"
  to match changed solve_logit() validation message
- Document replicate df limitation in TODO.md (df stays R-1 when invalid
  replicates are dropped; marginal impact for typical R > 50)
- Add REGISTRY.md Note entries: <2 valid replicates returns NaN
- Tests: non-binary numeric mask rejection, beta length assertion
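
Illustration of the mask problem from the first item above. This is a minimal
sketch, not the code in survey.py: `coded` is made-up data and
`is_binary_mask` is a hypothetical helper that only mirrors the idea of the
new check.

```python
import numpy as np

# Made-up coded domain column: 1 = in domain, 2 = out of domain.
coded = np.array([1, 2, 1, 2, 2])

# astype(bool) maps every nonzero value to True, so this "mask"
# selects every row instead of only the rows coded 1.
naive_mask = coded.astype(bool)
print(naive_mask.all())  # True


def is_binary_mask(values):
    """Hypothetical helper: True if all finite entries are 0 or 1."""
    arr = np.asarray(values)
    if arr.dtype.kind == "b":
        return True
    finite = arr[np.isfinite(arr)]
    return set(np.unique(finite).tolist()).issubset({0, 1})


print(is_binary_mask(coded))       # False: {1, 2} is not a 0/1 mask
print(is_binary_mask(coded == 1))  # True: explicit recoding to boolean
```

Callers holding a coded column can recode it explicitly (for example
`coded == 1`) before passing it as a mask.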

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/TODO.md b/TODO.md
@@ -52,6 +52,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
+| Replicate-weight survey df not updated when invalid replicates are dropped — `df_survey` remains `R-1` from original design instead of `n_valid-1`. Impact is marginal (df difference < 1% for typical R > 50). Conservative direction only when many replicates fail. | `survey.py` | #238 | Low |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
@@ -444,6 +444,15 @@ def subpopulation(
                 "Subpopulation mask contains string values. "
                 "Provide a boolean or numeric (0/1) mask."
             )
+        # Validate numeric masks: only {0, 1} allowed (not {1, 2}, etc.)
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype.kind in ('i', 'u', 'f'):
+            unique_vals = set(np.unique(raw_mask[np.isfinite(raw_mask)]).tolist())
+            if not unique_vals.issubset({0, 1, 0.0, 1.0, True, False}):
+                raise ValueError(
+                    f"Subpopulation mask contains non-binary numeric values "
+                    f"{unique_vals - {0, 1, 0.0, 1.0}}. "
+                    f"Provide a boolean or numeric (0/1) mask."
+                )
         mask_arr = raw_mask.astype(bool)
 
         if len(mask_arr) != len(data):
diff --git a/tests/test_survey_phase4.py b/tests/test_survey_phase4.py
@@ -202,7 +202,7 @@ def test_negative_weights_raises(self):
         y = (X @ [0.5, -0.5] + rng.randn(n) > 0).astype(float)
         weights = np.ones(n)
         weights[0] = -1.0
-        with pytest.raises(ValueError, match="strictly positive"):
+        with pytest.raises(ValueError, match="non-negative"):
             solve_logit(X, y, weights=weights)
 
     def test_wrong_shape_weights_raises(self):
diff --git a/tests/test_survey_phase6.py b/tests/test_survey_phase6.py
@@ -759,6 +759,15 @@ def test_subpopulation_string_mask_rejected(self, basic_did_data):
         with pytest.raises(ValueError, match="string"):
             sd.subpopulation(basic_did_data, mask)
 
+    def test_nonbinary_numeric_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with non-binary numeric codes should be rejected."""
+        sd = SurveyDesign(weights="weight")
+        # Coded domain column {1, 2} — not boolean, should fail
+        mask = np.ones(len(basic_did_data), dtype=int)
+        mask[:10] = 2
+        with pytest.raises(ValueError, match="non-binary"):
+            sd.subpopulation(basic_did_data, mask)
+
     def test_replicate_if_no_divide_by_zero_warning(self):
         """compute_replicate_if_variance should not warn on zero weights."""
         from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign