igerber
diff --git a/ROADMAP.md b/ROADMAP.md
Lines changed: 25 additions & 8 deletions
diff --git a/TODO.md b/TODO.md
Lines changed: 6 additions & 6 deletions
diff --git a/diff_diff/honest_did.py b/diff_diff/honest_did.py
Lines changed: 99 additions & 18 deletions
@@ -8,19 +8,42 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
 
 ## Current Status
 
-diff-diff v2.6.0 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:
+diff-diff v2.7.5 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support**: design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No other R or Python package offers this combination:
 
 - **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
 - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
 - **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
 - **Study design**: Power analysis tools
 - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
+- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn), Taylor linearization, DEFF diagnostics, and subpopulation analysis, integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md))
 - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
 
 ---
 
-## Near-Term Enhancements (v2.7)
+## Near-Term Enhancements (v2.8)
+
+### Survey Phase 7: Completing the Survey Story
+
+Close the remaining gaps for practitioners using major population surveys
+(ACS, CPS, BRFSS, MEPS). See [survey-roadmap.md](docs/survey-roadmap.md) for
+full details.
+
+- **CS Covariates + IPW/DR + Survey** *(High priority)*: Implement DRDID
+  influence-function (IF) corrections for the estimated nuisance models
+  (propensity score and outcome regression) under survey weights. The
+  recommended DR method currently raises `NotImplementedError` when covariates
+  are combined with a survey design, yet this is the path most often needed in
+  applied work (e.g., Medicaid expansion, minimum wage studies).
+- **Repeated Cross-Sections** *(High priority)*: `panel=False` support for
+  CallawaySantAnna, enabling analysis of surveys that don't track units over
+  time (BRFSS, ACS annual, CPS monthly). Uses cross-sectional DRDID
+  (Sant'Anna & Zhao 2020, Section 4).
+- **Survey-Aware DiD Tutorial** *(High priority)*: Jupyter notebook
+  demonstrating the full workflow with realistic survey data. diff-diff is
+  the only package (R or Python) with design-based variance for modern DiD
+  estimators, and the tutorial makes that capability discoverable.
+- **HonestDiD + Survey Variance** *(Medium priority)*: Pass the survey vcov
+  (TSL or replicate-weight) into sensitivity analysis instead of the
+  cluster-robust vcov, so sensitivity bounds respect the same variance structure
+  as the main estimates.
 
 ### Staggered Triple Difference (DDD)
 
@@ -32,12 +55,6 @@ Extend the existing `TripleDifference` estimator to handle staggered adoption se
 
 **Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). *Working Paper*. R package: `triplediff`.
 
-### Enhanced Visualization
-
-- Synthetic control weight visualization (bar chart of unit weights)
-- Treatment adoption "staircase" plot for staggered designs
-- Interactive plots with plotly backend option
-
 ---
 
 ## Medium-Term Enhancements
 
@@ -54,7 +54,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 | Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
-| CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
+| CallawaySantAnna survey + covariates + IPW/DR — **Resolved**. DRDID panel nuisance IF corrections (PS + OR) implemented for both survey and non-survey DR paths (Phase 7a). IPW path unblocked. | `staggered.py` | #233 | Resolved |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
 | EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
@@ -163,11 +163,11 @@ Spurious RuntimeWarnings ("divide by zero", "overflow", "invalid value") are emi
 
 Features in R's `did` package that block porting additional tests:
 
-| Feature | R tests blocked | Priority |
-|---------|----------------|----------|
-| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | Medium |
-| Sampling/population weights | 7 tests incl. all JEL replication | Medium |
-| Calendar time aggregation | 1 test in test-att_gt.R | Low |
+| Feature | R tests blocked | Priority | Status |
+|---------|----------------|----------|--------|
+| Repeated cross-sections (`panel=FALSE`) | ~7 tests in test-att_gt.R + test-user_bug_fixes.R | High | **Resolved** — Phase 7b: `panel=False` on CallawaySantAnna |
+| Sampling/population weights | 7 tests incl. all JEL replication | Medium | **Resolved** (Phases 1-6 + 7a: CS IPW/DR + covariates + survey) |
+| Calendar time aggregation | 1 test in test-att_gt.R | Low | |
 
 ---
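The TODO entry on replicate-weight survey df states that `df_survey = rank(replicate_weights) - 1`, matching R's `survey::degf()`. Below is a minimal sketch of that rule, assuming an observations-by-replicates weight matrix. `replicate_df` and `jk1_weights` are hypothetical helpers for illustration, not the `survey.py` implementation.

```python
import numpy as np


def replicate_df(replicate_weights: np.ndarray) -> int:
    """Hypothetical helper: design df for a replicate-weight design, taken as
    the matrix rank of the (n_obs x n_replicates) weight matrix minus one."""
    rank = np.linalg.matrix_rank(np.asarray(replicate_weights, dtype=float))
    return int(rank) - 1


def jk1_weights(n_psu: int) -> np.ndarray:
    """Hypothetical helper: delete-one jackknife (JK1) replicate weights for
    n_psu PSUs, assuming one observation per PSU and unit base weights."""
    scale = n_psu / (n_psu - 1)
    w = np.full((n_psu, n_psu), scale)
    np.fill_diagonal(w, 0.0)  # replicate r drops PSU r entirely
    return w


W = jk1_weights(5)
print(replicate_df(W))  # JK1 with 5 PSUs: full rank 5, so df = 4
```

For a delete-one jackknife with one observation per PSU, the weight matrix has full rank, so the rule reproduces the familiar n_PSU - 1. Duplicated or dropped replicates lower the rank, which is why a rank-based count is safer than simply counting replicate columns.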
 
 
@@ -22,11 +22,12 @@
 
 import numpy as np
 import pandas as pd
-from scipy import optimize, stats
+from scipy import optimize
 
 from diff_diff.results import (
     MultiPeriodDiDResults,
 )
+from diff_diff.utils import _get_critical_value
 
 # =============================================================================
 # Delta Restriction Classes
@@ -193,6 +194,9 @@ class HonestDiDResults:
     original_results: Optional[Any] = field(default=None, repr=False)
     # Event study bounds (optional)
     event_study_bounds: Optional[Dict[Any, Dict[str, float]]] = field(default=None, repr=False)
+    # Survey design metadata (Phase 7d)
+    survey_metadata: Optional[Any] = field(default=None, repr=False)
+    df_survey: Optional[int] = field(default=None, repr=False)
 
     def __repr__(self) -> str:
         sig = "" if self.ci_lb <= 0 <= self.ci_ub else "*"
@@ -534,7 +538,7 @@ def plot(
 
 def _extract_event_study_params(
     results: Union[MultiPeriodDiDResults, Any],
-) -> Tuple[np.ndarray, np.ndarray, int, int, List[Any], List[Any]]:
+) -> Tuple[np.ndarray, np.ndarray, int, int, List[Any], List[Any], Optional[int]]:
     """
     Extract event study parameters from results objects.
 
@@ -557,6 +561,8 @@ def _extract_event_study_params(
         Pre-period identifiers.
     post_periods : list
         Post-period identifiers.
+    df_survey : int or None
+        Survey degrees of freedom for t-distribution inference (None if unavailable).
     """
     if isinstance(results, MultiPeriodDiDResults):
         # Extract from MultiPeriodDiD
@@ -606,7 +612,20 @@ def _extract_event_study_params(
             # Fallback: diagonal from SEs
             sigma = np.diag(np.array(ses) ** 2)
 
-        return beta_hat, sigma, num_pre_periods, num_post_periods, pre_periods, post_periods
+        # Extract survey df if available
+        df_survey = None
+        if hasattr(results, "survey_metadata") and results.survey_metadata is not None:
+            df_survey = getattr(results.survey_metadata, "df_survey", None)
+
+        return (
+            beta_hat,
+            sigma,
+            num_pre_periods,
+            num_post_periods,
+            pre_periods,
+            post_periods,
+            df_survey,
+        )
 
     else:
         # Try CallawaySantAnnaResults
@@ -641,9 +660,29 @@ def _extract_event_study_params(
                     ses.append(event_effects[t]["se"])
 
                 beta_hat = np.array(effects)
-                sigma = np.diag(np.array(ses) ** 2)
 
-                return (beta_hat, sigma, len(pre_times), len(post_times), pre_times, post_times)
+                # Use full event-study VCV if available and conformable with
+                # beta_hat (Phase 7d); otherwise fall back to diagonal from SEs
+                es_vcov = getattr(results, "event_study_vcov", None)
+                if es_vcov is not None and np.shape(es_vcov) == (len(effects), len(effects)):
+                    # event_study_vcov is indexed by sorted rel_times
+                    sigma = np.asarray(es_vcov)
+                else:
+                    sigma = np.diag(np.array(ses) ** 2)
+
+                # Extract survey df
+                df_survey = None
+                if hasattr(results, "survey_metadata") and results.survey_metadata is not None:
+                    df_survey = getattr(results.survey_metadata, "df_survey", None)
+
+                return (
+                    beta_hat,
+                    sigma,
+                    len(pre_times),
+                    len(post_times),
+                    pre_times,
+                    post_times,
+                    df_survey,
+                )
         except ImportError:
             pass
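The CallawaySantAnna branch now prefers `results.event_study_vcov` over a diagonal matrix built from the SEs. This matters because the FLCI standard error is `sqrt(l_vec @ sigma_post @ l_vec)`, so positive covariances between post-period effects widen it. The following is a minimal sketch of the selection logic. `select_sigma` is a hypothetical helper, and the shape check against the number of coefficients is a safeguard assumed here for illustration.

```python
from typing import Optional, Sequence

import numpy as np


def select_sigma(ses: Sequence[float], vcov: Optional[np.ndarray]) -> np.ndarray:
    """Hypothetical helper: return the full event-study covariance when it is
    available and its shape matches the coefficient vector; otherwise fall
    back to a diagonal matrix of squared standard errors."""
    k = len(ses)
    if vcov is not None:
        vcov = np.asarray(vcov, dtype=float)
        if vcov.shape == (k, k):
            return vcov
    return np.diag(np.asarray(ses, dtype=float) ** 2)


ses = [1.0, 1.0]
vcov = np.array([[1.0, 0.8], [0.8, 1.0]])  # strongly correlated post-period effects
l_vec = np.array([0.5, 0.5])  # weights averaging the two post-period effects

se_full = np.sqrt(l_vec @ select_sigma(ses, vcov) @ l_vec)
se_diag = np.sqrt(l_vec @ select_sigma(ses, None) @ l_vec)
print(se_full, se_diag)  # sqrt(0.9) with covariances vs sqrt(0.5) diagonal-only
```

Ignoring the 0.8 correlation here understates the variance of the averaged effect (0.5 instead of 0.9), which would make the sensitivity CIs too narrow.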
 
@@ -860,7 +899,13 @@ def _solve_bounds_lp(
     return lb, ub
 
 
-def _compute_flci(lb: float, ub: float, se: float, alpha: float = 0.05) -> Tuple[float, float]:
+def _compute_flci(
+    lb: float,
+    ub: float,
+    se: float,
+    alpha: float = 0.05,
+    df: Optional[int] = None,
+) -> Tuple[float, float]:
     """
     Compute Fixed Length Confidence Interval (FLCI).
 
@@ -877,6 +922,9 @@ def _compute_flci(lb: float, ub: float, se: float, alpha: float = 0.05) -> Tuple
         Standard error of the estimator.
     alpha : float
         Significance level.
+    df : int, optional
+        Degrees of freedom. If provided, uses a t-distribution critical value
+        instead of the normal one (e.g., survey designs, where df = n_PSU - n_strata
+        under Taylor linearization).
 
     Returns
     -------
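The `_compute_flci` docstring cites df = n_PSU - n_strata for survey designs under Taylor linearization. Here is a minimal sketch of that count. `tsl_survey_df` is a hypothetical helper, and it assumes PSU labels are nested within strata (PSU 1 in stratum A and PSU 1 in stratum B are different PSUs), which may not match how diff-diff stores PSU identifiers.

```python
import numpy as np


def tsl_survey_df(strata: np.ndarray, psu: np.ndarray) -> int:
    """Hypothetical helper: Taylor-linearization design df, computed as the
    number of PSUs minus the number of strata. PSUs are counted as distinct
    (stratum, PSU) pairs, i.e. PSU labels are assumed nested within strata."""
    strata_list = np.asarray(strata).tolist()
    psu_list = np.asarray(psu).tolist()
    n_psu = len(set(zip(strata_list, psu_list)))
    n_strata = len(set(strata_list))
    return n_psu - n_strata


strata = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
psu = np.array([1, 1, 2, 2, 1, 1, 2, 3])
print(tsl_survey_df(strata, psu))  # 5 PSUs - 2 strata = 3
```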
@@ -895,7 +943,7 @@ def _compute_flci(lb: float, ub: float, se: float, alpha: float = 0.05) -> Tuple
     if not (0 < alpha < 1):
         raise ValueError(f"alpha must be between 0 and 1, got alpha={alpha}")
 
-    z = stats.norm.ppf(1 - alpha / 2)
+    z = _get_critical_value(alpha, df)
     ci_lb = lb - z * se
     ci_ub = ub + z * se
     return ci_lb, ci_ub
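`_compute_flci` now takes its critical value from `_get_critical_value(alpha, df)` instead of `stats.norm.ppf(1 - alpha / 2)`. The body of that helper is not shown in this diff. The sketch below assumes it returns the normal quantile when `df` is None and the Student-t quantile otherwise. `critical_value` and `flci` are hypothetical stand-ins, not the library functions.

```python
from typing import Optional, Tuple

from scipy import stats


def critical_value(alpha: float, df: Optional[int] = None) -> float:
    """Hypothetical stand-in for _get_critical_value: two-sided critical value,
    normal when df is None, Student-t with df degrees of freedom otherwise."""
    if df is None:
        return float(stats.norm.ppf(1 - alpha / 2))
    return float(stats.t.ppf(1 - alpha / 2, df))


def flci(
    lb: float, ub: float, se: float, alpha: float = 0.05, df: Optional[int] = None
) -> Tuple[float, float]:
    """Widen the identified set [lb, ub] by crit * se on each side."""
    if not (0 < alpha < 1):
        raise ValueError(f"alpha must be between 0 and 1, got alpha={alpha}")
    crit = critical_value(alpha, df)
    return lb - crit * se, ub + crit * se


# Same bounds and SE; a survey design with few PSUs (df = 10) widens the CI.
print(flci(0.5, 1.5, 0.2))         # critical value about 1.96
print(flci(0.5, 1.5, 0.2, df=10))  # critical value about 2.23
```

Because the t quantile exceeds the normal quantile for any finite df, passing the survey df can only widen the FLCI for fixed bounds and SE. The gap shrinks as df grows, so large designs behave almost exactly as before.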
@@ -909,6 +957,7 @@ def _compute_clf_ci(
     max_pre_violation: float,
     alpha: float = 0.05,
     n_draws: int = 1000,
+    df: Optional[int] = None,
 ) -> Tuple[float, float, float, float]:
     """
     Compute Conditional Least Favorable (C-LF) confidence interval.
@@ -931,6 +980,8 @@ def _compute_clf_ci(
         Significance level.
     n_draws : int
         Number of Monte Carlo draws for conditional CI.
+    df : int, optional
+        Degrees of freedom for t-distribution critical value.
 
     Returns
     -------
@@ -956,7 +1007,7 @@ def _compute_clf_ci(
     ub = theta + bound
 
     # CI with estimation uncertainty
-    z = stats.norm.ppf(1 - alpha / 2)
+    z = _get_critical_value(alpha, df)
     ci_lb = lb - z * se
     ci_ub = ub + z * se
 
@@ -1086,7 +1137,7 @@ def fit(
         M = M if M is not None else self.M
 
         # Extract event study parameters
-        (beta_hat, sigma, num_pre, num_post, pre_periods, post_periods) = (
+        (beta_hat, sigma, num_pre, num_post, pre_periods, post_periods, df_survey) = (
             _extract_event_study_params(results)
         )
 
@@ -1137,22 +1188,41 @@ def fit(
         # Compute bounds based on method
         if self.method == "smoothness":
             lb, ub, ci_lb, ci_ub = self._compute_smoothness_bounds(
-                beta_post, sigma_post, l_vec, num_pre, num_post, M
+                beta_post, sigma_post, l_vec, num_pre, num_post, M, df=df_survey
             )
             ci_method = "FLCI"
 
         elif self.method == "relative_magnitude":
             lb, ub, ci_lb, ci_ub = self._compute_rm_bounds(
-                beta_post, sigma_post, l_vec, num_pre, num_post, M, pre_periods, results
+                beta_post,
+                sigma_post,
+                l_vec,
+                num_pre,
+                num_post,
+                M,
+                pre_periods,
+                results,
+                df=df_survey,
             )
             ci_method = "C-LF"
 
         else:  # combined
             lb, ub, ci_lb, ci_ub = self._compute_combined_bounds(
-                beta_post, sigma_post, l_vec, num_pre, num_post, M, pre_periods, results
+                beta_post,
+                sigma_post,
+                l_vec,
+                num_pre,
+                num_post,
+                M,
+                pre_periods,
+                results,
+                df=df_survey,
             )
             ci_method = "FLCI"
 
+        # Extract survey_metadata for storage on results
+        survey_metadata = getattr(results, "survey_metadata", None)
+
         return HonestDiDResults(
             lb=lb,
             ub=ub,
@@ -1165,6 +1235,8 @@ def fit(
             alpha=self.alpha,
             ci_method=ci_method,
             original_results=results,
+            survey_metadata=survey_metadata,
+            df_survey=df_survey,
         )
 
     def _compute_smoothness_bounds(
@@ -1175,6 +1247,7 @@ def _compute_smoothness_bounds(
         num_pre: int,
         num_post: int,
         M: float,
+        df: Optional[int] = None,
     ) -> Tuple[float, float, float, float]:
         """Compute bounds under smoothness restriction."""
         # Construct constraints
@@ -1185,7 +1258,7 @@ def _compute_smoothness_bounds(
 
         # Compute FLCI
         se = np.sqrt(l_vec @ sigma_post @ l_vec)
-        ci_lb, ci_ub = _compute_flci(lb, ub, se, self.alpha)
+        ci_lb, ci_ub = _compute_flci(lb, ub, se, self.alpha, df=df)
 
         return lb, ub, ci_lb, ci_ub
 
@@ -1199,6 +1272,7 @@ def _compute_rm_bounds(
         Mbar: float,
         pre_periods: List,
         results: Any,
+        df: Optional[int] = None,
     ) -> Tuple[float, float, float, float]:
         """Compute bounds under relative magnitudes restriction."""
         # Estimate max pre-period violation from pre-trends
@@ -1209,12 +1283,18 @@ def _compute_rm_bounds(
             # No pre-period violations detected - use point estimate
             theta = np.dot(l_vec, beta_post)
             se = np.sqrt(l_vec @ sigma_post @ l_vec)
-            z = stats.norm.ppf(1 - self.alpha / 2)
+            z = _get_critical_value(self.alpha, df)
             return theta, theta, theta - z * se, theta + z * se
 
         # Compute bounds
         lb, ub, ci_lb, ci_ub = _compute_clf_ci(
-            beta_post, sigma_post, l_vec, Mbar, max_pre_violation, self.alpha
+            beta_post,
+            sigma_post,
+            l_vec,
+            Mbar,
+            max_pre_violation,
+            self.alpha,
+            df=df,
         )
 
         return lb, ub, ci_lb, ci_ub
@@ -1229,16 +1309,17 @@ def _compute_combined_bounds(
         M: float,
         pre_periods: List,
         results: Any,
+        df: Optional[int] = None,
     ) -> Tuple[float, float, float, float]:
         """Compute bounds under combined smoothness + RM restriction."""
         # Get smoothness bounds
         lb_sd, ub_sd, _, _ = self._compute_smoothness_bounds(
-            beta_post, sigma_post, l_vec, num_pre, num_post, M
+            beta_post, sigma_post, l_vec, num_pre, num_post, M, df=df
         )
 
         # Get RM bounds (use M as Mbar for combined)
         lb_rm, ub_rm, _, _ = self._compute_rm_bounds(
-            beta_post, sigma_post, l_vec, num_pre, num_post, M, pre_periods, results
+            beta_post, sigma_post, l_vec, num_pre, num_post, M, pre_periods, results, df=df
         )
 
         # Combined bounds are intersection
@@ -1252,7 +1333,7 @@ def _compute_combined_bounds(
 
         # Compute FLCI on combined bounds
         se = np.sqrt(l_vec @ sigma_post @ l_vec)
-        ci_lb, ci_ub = _compute_flci(lb, ub, se, self.alpha)
+        ci_lb, ci_ub = _compute_flci(lb, ub, se, self.alpha, df=df)
 
         return lb, ub, ci_lb, ci_ub