Choosing an Estimator

This guide helps you select the right estimator for your research design.

Decision Flowchart

Start here and follow the questions:

  1. Is this a triple-difference (DDD) design? (Two criteria for treatment: e.g., policy adoption AND group eligibility)
  2. Is treatment continuous? (Units receive different doses or intensities)
  3. Can treatment switch on AND off? (Reversible / non-absorbing treatment — e.g., marketing campaigns, seasonal promotions, on/off policy cycles)
  4. Is treatment staggered? (Different units treated at different times)
  5. Do you have panel data? (Multiple observations per unit over time)
  6. Do you need period-specific effects? (Event study design)
  7. Is your treated group small? (Few treated units, many controls)
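The flowchart above can be collapsed into a small lookup. The helper below is hypothetical (it is not part of diff_diff); it simply encodes the question order from this guide, returning the estimator name suggested by the first question that matches:

```python
# Hypothetical helper (NOT part of diff_diff): walks the flowchart
# questions in order and returns the suggested estimator name.
def suggest_estimator(ddd=False, continuous=False, reversible=False,
                      staggered=False, panel=True, event_study=False,
                      small_treated_group=False):
    if ddd:
        return "StaggeredTripleDifference" if staggered else "TripleDifference"
    if continuous:
        return "ContinuousDiD"
    if reversible:
        return "ChaisemartinDHaultfoeuille"
    if staggered:
        return "CallawaySantAnna"
    if small_treated_group:
        return "SyntheticDiD"
    if event_study:
        return "MultiPeriodDiD"
    if panel:
        return "TwoWayFixedEffects"
    return "DifferenceInDifferences"

print(suggest_estimator(staggered=True))   # CallawaySantAnna
print(suggest_estimator(panel=False))      # DifferenceInDifferences
```

The Quick Reference table below gives the same mapping with assumptions and outputs.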

Quick Reference

| Estimator | Best For | Key Assumption | Output |
| --- | --- | --- | --- |
| DifferenceInDifferences | Simple 2x2 designs, cross-sectional comparisons | Parallel trends (2 periods) | Single ATT |
| TwoWayFixedEffects | Panel data, simultaneous treatment | Parallel trends (all periods) | Single ATT with unit/time FE |
| MultiPeriodDiD | Event studies, dynamic effects | Parallel trends (pre-periods) | Period-specific effects |
| CallawaySantAnna | Staggered adoption, heterogeneous timing | Conditional parallel trends | Group-time ATT(g,t), aggregations |
| ChaisemartinDHaultfoeuille | Reversible / non-absorbing treatments (only library option) | Parallel trends + A5 (no crossing) + A11 (stable controls) | DID_l event study (L_max), normalized DID^n_l, cost-benefit delta, placebos, sup-t bands, TWFE diagnostic |
| SyntheticDiD | Few treated units, many controls | Synthetic parallel trends | ATT with unit/time weights |
| EfficientDiD | Staggered adoption with optimal efficiency | PT-All (overidentified) or PT-Post | Group-time ATT(g,t), aggregations |
| ContinuousDiD | Continuous dose / treatment intensity | Strong Parallel Trends (SPT) for dose-response; PT for binarized ATT | ATTloc (PT); ATT(d), ACRT(d) (SPT) |
| HeterogeneousAdoptionDiD | Universal rollout, dose varies, no untreated unit | dCDH 2026 assumptions (Design 1' QUG case or Design 1 with A6/A5) | WAS or WAS_d_lower per resolved estimand; event study (Appendix B.2) |
| SunAbraham | Staggered adoption, interaction-weighted | Conditional parallel trends | Cohort-specific ATTs, event study |
| ImputationDiD | Staggered, homogeneous effects | Unit + time FE structure | Imputed treatment effects, event study |
| TwoStageDiD | Staggered adoption, efficient | Unit + time FE structure | Single ATT or event study |
| StackedDiD | Staggered, sub-experiment approach | Parallel trends per cohort | Trimmed aggregate ATT |
| TROP | Factor confounding suspected | Factor model + weights | ATT with triple robustness |
| TripleDifference | Two eligibility criteria (DDD) | Parallel trends for both dimensions | DDD ATT (regression, IPW, or DR) |
| StaggeredTripleDifference | Staggered DDD with treatment timing | Conditional parallel trends (DDD) | Group-time ATT(g,t), aggregations |
| WooldridgeDiD | Nonlinear outcomes or saturated OLS | Conditional parallel trends | OLS: direct coefficients; logit/Poisson: ASF-based ATT |
| BaconDecomposition | TWFE diagnostic (diagnostic tool) | -- | 2x2 decomposition weights |

Detailed Guidance

Basic 2x2 DiD

Use :class:`~diff_diff.DifferenceInDifferences` when:

  • You have a simple before/after, treatment/control design
  • Treatment occurs simultaneously for all treated units
  • You want a single average treatment effect

from diff_diff import DifferenceInDifferences

did = DifferenceInDifferences()
results = did.fit(data, outcome='y', treatment='treated', time='post')
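In the 2x2 case the estimator reduces to simple arithmetic: the treated group's before/after change minus the control group's. A minimal pandas sketch of that arithmetic on toy numbers (illustrative only, not the library API):

```python
import pandas as pd

# Toy 2x2 cell means: the DiD estimate is a difference of differences.
df = pd.DataFrame({
    'treated': [1, 1, 0, 0],
    'post':    [0, 1, 0, 1],
    'y':       [10.0, 15.0, 8.0, 9.0],
})
means = df.groupby(['treated', 'post'])['y'].mean()
att = ((means.loc[(1, 1)] - means.loc[(1, 0)])    # treated change: 15 - 10 = 5
       - (means.loc[(0, 1)] - means.loc[(0, 0)])) # control change:  9 -  8 = 1
print(att)  # 4.0
```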

Two-Way Fixed Effects

Use :class:`~diff_diff.TwoWayFixedEffects` when:

  • You have panel data with multiple time periods
  • Treatment timing is the same for all treated units
  • You want to control for unit and time fixed effects
  • You don't need to see period-by-period effects

Warning

TWFE can be biased with staggered treatment timing. Already-treated units act as controls for newly-treated units, which can cause negative weighting. Use :class:`~diff_diff.CallawaySantAnna` for staggered designs.

from diff_diff import TwoWayFixedEffects

twfe = TwoWayFixedEffects()
results = twfe.fit(data, outcome='y', treatment='treated',
                   unit='unit_id', time='period')
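The warning above is easy to demonstrate. The sketch below uses plain pandas (not the library): it simulates two adoption cohorts with no never-treated group and effects that grow after treatment, then computes the static TWFE coefficient by two-way demeaning. With these assumed toy parameters the TWFE estimate is not merely attenuated; it has the wrong sign:

```python
import pandas as pd

# Two cohorts, no never-treated group, dynamic effects (1, 2, 3, ...)
# after adoption -- the classic setting where static TWFE is biased.
rows = []
for g, first in [(0, 2), (1, 6)]:          # cohort g adopts at period `first`
    for unit in range(g * 50, g * 50 + 50):
        for t in range(10):
            d = int(t >= first)
            rows.append({'unit': unit, 'period': t, 'treated': d,
                         'y': float(d * (t - first + 1))})  # noise-free
df = pd.DataFrame(rows)

# Static TWFE via the two-way within transformation (Frisch-Waugh)
for col in ['y', 'treated']:
    df[col + '_dm'] = (df[col]
                       - df.groupby('unit')[col].transform('mean')
                       - df.groupby('period')[col].transform('mean')
                       + df[col].mean())
twfe = (df['y_dm'] * df['treated_dm']).sum() / (df['treated_dm'] ** 2).sum()

# True average effect among treated observations
att = df.loc[df['treated'] == 1, 'y'].mean()
print(f"TWFE: {twfe:.2f}  true ATT: {att:.2f}")  # TWFE: -0.17  true ATT: 3.83
```

Every treatment effect in this panel is positive, yet the TWFE coefficient is negative: late periods of the early cohort receive negative weight when they serve as "controls" for the late cohort.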

Multi-Period Event Study

Use :class:`~diff_diff.MultiPeriodDiD` when:

  • You want a full event-study with pre and post treatment effects
  • You need pre-period coefficients to assess parallel trends
  • You want to visualize treatment effect dynamics over time
  • All treated units receive treatment at the same time (simultaneous adoption)

from diff_diff import MultiPeriodDiD, plot_event_study

event = MultiPeriodDiD()
results = event.fit(data, outcome='y', treatment='treated',
                    time='period', unit='unit_id', reference_period=2)

# Visualize
plot_event_study(results)

Callaway-Sant'Anna

Use :class:`~diff_diff.CallawaySantAnna` when:

  • Treatment is adopted at different times (staggered rollout)
  • You want valid treatment effect estimates with heterogeneous timing
  • You need group-time specific effects ATT(g,t)

This is the recommended estimator for most applied work with staggered adoption.

from diff_diff import CallawaySantAnna

cs = CallawaySantAnna(
    control_group='never_treated',  # or 'not_yet_treated'
    estimation_method='dr'  # doubly robust (recommended)
)
results = cs.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat',
                 covariates=['x1', 'x2'])

# Overall ATT
print(f"Overall ATT: {results.overall_att:.3f}")

# Event study aggregation
es = cs.fit(data, outcome='y', unit='unit_id',
            time='period', first_treat='first_treat',
            covariates=['x1', 'x2'], aggregate='event_study')
event_study_df = es.to_dataframe('event_study')

Reversible (Non-Absorbing) Treatment

Use :class:`~diff_diff.ChaisemartinDHaultfoeuille` (alias :class:`~diff_diff.DCDH`) when:

  • Treatment can switch on and off over time (e.g., marketing campaigns, seasonal promotions, on/off policy cycles)
  • You need separate joiners (DID_+) and leavers (DID_-) views, plus the aggregate DID_M
  • You want a built-in placebo and a TWFE decomposition diagnostic computed on the data you pass in (pre-filter) for direct comparison against DID_M
  • You want a multi-horizon event study (pass L_max to fit()) with normalized effects, cost-benefit aggregation, dynamic placebos, and sup-t simultaneous confidence bands

This is the only library estimator that handles non-absorbing treatments. All other staggered estimators (:class:`~diff_diff.CallawaySantAnna`, :class:`~diff_diff.SunAbraham`, :class:`~diff_diff.ImputationDiD`, :class:`~diff_diff.TwoStageDiD`, :class:`~diff_diff.EfficientDiD`, :class:`~diff_diff.WooldridgeDiD`) assume treatment is absorbing: once treated, always treated.

Ships DID_M (= DID_1) from de Chaisemartin & D'Haultfœuille (2020), the full multi-horizon event study DID_l for l = 1..L_max from the dynamic companion paper (NBER WP 29873), residualization-style covariate adjustment (controls), group-specific linear trends (trends_linear), state-set-specific trends (trends_nonparam), heterogeneity testing, non-binary treatment, HonestDiD sensitivity integration on placebos, and survey support via Taylor-series linearization.

from diff_diff import ChaisemartinDHaultfoeuille
from diff_diff.prep import generate_reversible_did_data

data = generate_reversible_did_data(n_groups=80, n_periods=6, seed=42)

est = ChaisemartinDHaultfoeuille()
results = est.fit(
    data,
    outcome="outcome",
    group="group",
    time="period",
    treatment="treatment",
)
results.print_summary()

print(f"DID_M (overall): {results.overall_att:.3f}")
print(f"DID_+ (joiners): {results.joiners_att:.3f}")
print(f"DID_- (leavers): {results.leavers_att:.3f}")
print(f"Placebo:         {results.placebo_effect:.3f}")
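The joiners building block behind DID_+ can be computed by hand in a two-period panel: among groups untreated at t-1, compare the outcome change of groups that switch into treatment at t against groups whose treatment stays at 0. A toy pandas sketch (illustrative only; the estimator's full DID_M weighting across periods is more involved):

```python
import pandas as pd

# Toy two-period panel: group 'a' joins treatment at t=2; 'b' and 'c' stay at 0.
df = pd.DataFrame({
    'group':  ['a', 'a', 'b', 'b', 'c', 'c'],
    'period': [1, 2] * 3,
    'd':      [0, 1,   0, 0,   0, 0],
    'y':      [1.0, 4.0, 2.0, 2.5, 1.5, 2.0],
})
wide = df.pivot(index='group', columns='period')
dy = wide[('y', 2)] - wide[('y', 1)]        # outcome change per group
joiners = wide[('d', 2)] == 1               # switchers-in at t=2
did_plus = dy[joiners].mean() - dy[~joiners].mean()
print(did_plus)  # 3.0 - 0.5 = 2.5
```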

Note

By default, the estimator drops groups whose treatment switches more than once before estimation (drop_larger_lower=True, matching the R DIDmultiplegtDYN reference). This is required for the analytical variance formula to be consistent with the point estimate. Each drop emits an explicit warning.

Note

Single-period placebo DID_M^pl (L_max=None) has NaN SE - the per-period aggregation path has no influence-function derivation, so inference fields stay NaN even when n_bootstrap > 0. The point estimate is meaningful for visual pre-trends inspection. Multi-horizon dynamic placebos DID^{pl}_l (L_max >= 1) have valid analytical SE and bootstrap SE via the placebo IF. See docs/methodology/REGISTRY.md for the full contract.

Note

ChaisemartinDHaultfoeuille supports survey_design with pweight and strata/PSU/FPC via Taylor Series Linearization. Replicate weights are not yet supported.

Synthetic DiD

Use :class:`~diff_diff.SyntheticDiD` when:

  • You have few treated units but many control units
  • Pre-treatment fit between treated and control is poor
  • You want to construct a weighted synthetic control

from diff_diff import SyntheticDiD, generate_did_data

# SyntheticDiD requires block treatment (constant within units)
block_data = generate_did_data(n_units=40, n_periods=10, treatment_effect=2.0)
sdid = SyntheticDiD()
results = sdid.fit(block_data, outcome='outcome', unit='unit',
                   time='period', treatment='treated')

# View the unit weights
print(results.unit_weights)
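To build intuition for what the unit weights do, here is a stripped-down sketch of classic synthetic-control weighting on simulated data: Frank-Wolfe over the probability simplex, minimizing squared pre-period fit. (SyntheticDiD adds ridge regularization and time weights on top of this core idea; this is not the library's algorithm.)

```python
import numpy as np

# Choose simplex weights w so the weighted average of control units'
# pre-treatment outcomes tracks the treated unit's pre-treatment path.
rng = np.random.default_rng(0)
n_controls, T_pre = 20, 8
Y_controls = rng.normal(size=(n_controls, T_pre))
w_true = np.zeros(n_controls)
w_true[:3] = [0.5, 0.3, 0.2]                 # treated = mix of 3 controls
y_treated = w_true @ Y_controls

w = np.full(n_controls, 1 / n_controls)      # start at uniform weights
for k in range(2000):                        # Frank-Wolfe on the simplex
    grad = 2 * Y_controls @ (Y_controls.T @ w - y_treated)
    s = np.zeros(n_controls)
    s[np.argmin(grad)] = 1.0                 # best simplex vertex
    w += (2 / (k + 2)) * (s - w)             # standard FW step size
fit = np.linalg.norm(Y_controls.T @ w - y_treated)
print(f"pre-period fit error: {fit:.3f}")
```

Because each update is a convex combination, w stays on the simplex by construction, which is why synthetic-control weights are non-negative and sum to one.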

Continuous Treatment

Use :class:`~diff_diff.ContinuousDiD` when:

  • Treatment varies in intensity or dose (e.g., subsidy amount, hours of training)
  • You want to estimate how effects change with treatment dose
  • You need the full dose-response curve, not just a single average effect
  • You have staggered adoption where units receive different treatment levels

Note

Dose-response curves ATT(d) and ACRT(d) require Strong Parallel Trends (SPT). Under standard PT only the binarized ATTloc is identified. Data must include an untreated group (D = 0), a balanced panel, and time-invariant dose (each unit's dose is fixed across periods).

from diff_diff import ContinuousDiD, generate_continuous_did_data

data = generate_continuous_did_data(n_units=200, seed=42)

est = ContinuousDiD(n_bootstrap=199, seed=42)
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat',
                  dose='dose', aggregate='dose')

# Overall effect and dose-response curve
print(f"Overall ATT: {results.overall_att:.3f}")
att_curve = results.dose_response_att.to_dataframe()
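The SPT identification idea can be sketched directly: under SPT, ATT(d) equals the mean outcome change at dose d minus the mean change among untreated units. A toy pandas version with binned doses (simulated data and a hand-rolled comparison, not the library's estimator):

```python
import numpy as np
import pandas as pd

# Toy data with true ATT(d) = 2d; dy is the post-minus-pre outcome change.
rng = np.random.default_rng(1)
n = 5000
dose = np.where(rng.random(n) < 0.3, 0.0, rng.random(n))  # ~30% untreated
dy = 2.0 * dose + rng.normal(0, 0.1, n)

df = pd.DataFrame({'dose': dose, 'dy': dy})
baseline = df.loc[df.dose == 0, 'dy'].mean()      # untreated group's change
treated = df[df.dose > 0].copy()
treated['bin'] = pd.cut(treated.dose, bins=[0, 0.25, 0.5, 0.75, 1.0])
att_d = treated.groupby('bin', observed=True)['dy'].mean() - baseline
print(att_d.round(2))   # roughly 2 x the midpoint of each dose bin
```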

Universal Rollout / No Untreated Control

Use :class:`~diff_diff.HeterogeneousAdoptionDiD` when:

  • Every unit is treated at the post period (universal-rollout policy, industry-wide tariff change, simultaneous launch into all markets)
  • Treatment intensity (dose) varies across units, but no genuinely untreated control group exists to anchor a standard DiD contrast
  • :class:`~diff_diff.ContinuousDiD` is unavailable because its untreated-group requirement (D = 0) is violated

The estimator implements de Chaisemartin, Ciccia, D'Haultfoeuille and Knau (2026, arXiv:2405.04465v6) and resolves to one of two estimands depending on the dose support:

  • Design 1' (QUG case, ``d_lower = 0``) identifies the Weighted Average Slope (WAS) under the Quasi-Untreated-Group assumption (units with the smallest dose serve as the comparison anchor). The shipped result class exposes target_parameter == "WAS".
  • Design 1 (no QUG, ``d_lower > 0``) identifies WAS_{d_lower} under Assumption 6, or sign identification only under Assumption 5; neither additional assumption is testable via pre-trends. Result class exposes target_parameter == "WAS_d_lower".

The dose-distribution path is auto-detected. Run :func:`~diff_diff.did_had_pretest_workflow` to vet the identifying assumptions before estimation; see :doc:`api/had` for the full API and SE-regime contract.

import numpy as np
import pandas as pd
from diff_diff import HeterogeneousAdoptionDiD, did_had_pretest_workflow

# Build a HAD-shape panel: D=0 in pre-periods (t < F), D > 0 only at F+.
rng = np.random.default_rng(42)
G, F, T = 200, 4, 5
doses = rng.beta(0.5, 1.0, size=G)
rows = []
for g in range(G):
    for t in range(1, T + 1):
        y = (rng.normal()
             + (doses[g] + doses[g] ** 2) * (t >= F)
             + rng.normal(0, 0.5))
        d = doses[g] if t >= F else 0.0
        rows.append({'unit': g, 'period': t, 'y': y, 'dose': d})
had_data = pd.DataFrame(rows)

pretests = did_had_pretest_workflow(had_data, outcome_col='y', unit_col='unit',
                                    time_col='period', dose_col='dose',
                                    aggregate='event_study')

est = HeterogeneousAdoptionDiD()
results = est.fit(had_data, outcome_col='y', unit_col='unit',
                  time_col='period', dose_col='dose',
                  aggregate='event_study')

# Event-study results: per-horizon WAS at each event time
for e, att in zip(results.event_times, results.att):
    print(f"  e={e}: {att:.3f}")

Efficient DiD

Use :class:`~diff_diff.EfficientDiD` when:

  • You have staggered adoption and want maximum statistical efficiency on the no-covariate path
  • You believe parallel trends holds across all pre-treatment periods (PT-All)
  • You want tighter confidence intervals than Callaway-Sant'Anna
  • You need a formal efficiency benchmark for comparing estimators

Note

EfficientDiD supports covariate adjustment via a doubly-robust path: sieve-based propensity score ratios combined with a linear OLS outcome regression. The DR property gives consistency if either the OR or the PS is correctly specified, but the linear OLS outcome regression does not generically attain the semiparametric efficiency bound unless the conditional mean is linear in the covariates. The unqualified efficiency claim applies to the no-covariate path only. Pass column names to the covariates parameter on fit(). See docs/methodology/REGISTRY.md for the full contract.

from diff_diff import EfficientDiD

edid = EfficientDiD(pt_assumption="all")  # or "post" for post-treatment CS match
results = edid.fit(data, outcome='y', unit='unit_id',
                   time='period', first_treat='first_treat',
                   aggregate='all')
results.print_summary()

Sun-Abraham

Use :class:`~diff_diff.SunAbraham` when:

  • You have staggered adoption and want an interaction-weighted event study
  • You want to decompose effects by cohort and relative time
  • You need a regression-based complement to Callaway-Sant'Anna

Sun & Abraham (2021) uses a saturated TWFE regression with cohort x relative-time interactions, then aggregates cohort-specific effects using interaction weights.

from diff_diff import SunAbraham

sa = SunAbraham(control_group='never_treated')
results = sa.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat')
results.print_summary()

Note

Running both Sun-Abraham and Callaway-Sant'Anna provides a useful robustness check. Both are consistent under heterogeneous treatment effects.

Imputation DiD

Use :class:`~diff_diff.ImputationDiD` when:

  • You have staggered adoption with homogeneous treatment effects
  • You want shorter confidence intervals than Callaway-Sant'Anna (~50% shorter)
  • You need imputed counterfactual outcomes for treated observations

Borusyak, Jaravel & Spiess (2024) estimate unit + time FE on untreated observations, impute counterfactual Y(0) for treated observations, then aggregate.

from diff_diff import ImputationDiD

imp = ImputationDiD()
results = imp.fit(data, outcome='y', unit='unit_id',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')
results.print_summary()
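The imputation logic itself fits in a few lines: estimate unit and time effects on untreated cells only, predict Y(0) for treated cells, and average the gaps. A toy numpy sketch on noise-free data (for clarity; this is not the library's implementation, which also handles inference):

```python
import numpy as np
import pandas as pd

# Toy panel: additive unit + time structure plus an effect of 2.0 once treated.
rows = [{'unit': i, 'period': t,
         'treated': int(i < 3 and t >= 3),
         'y': i + 0.5 * t + 2.0 * (i < 3 and t >= 3)}
        for i in range(6) for t in range(6)]
df = pd.DataFrame(rows)

# Stage 1: fit y = alpha_i + beta_t by least squares on untreated cells only
untreated = df[df.treated == 0]
X = pd.get_dummies(untreated['unit'].astype(str)).join(
    pd.get_dummies(untreated['period'].astype(str) + 'p')).to_numpy(float)
coef, *_ = np.linalg.lstsq(X, untreated['y'].to_numpy(), rcond=None)
alpha, beta = coef[:6], coef[6:]

# Stage 2: impute Y(0) for treated cells and average the gaps
treated = df[df.treated == 1]
y0 = alpha[treated['unit'].to_numpy()] + beta[treated['period'].to_numpy()]
gap = (treated['y'].to_numpy() - y0).mean()
print(round(gap, 3))  # 2.0 (up to float error)
```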

Note

Under homogeneous effects, ImputationDiD is semiparametrically efficient. If you suspect heterogeneous effects across cohorts, prefer Callaway-Sant'Anna.

Two-Stage DiD

Use :class:`~diff_diff.TwoStageDiD` when:

  • You want the same point estimates as ImputationDiD with a different variance estimator
  • You prefer the GMM sandwich variance that accounts for first-stage uncertainty
  • You want a single ATT or an event study from a two-stage procedure

Gardner (2022) estimates FE on untreated obs (stage 1), residualizes all outcomes, then regresses residuals on treatment indicators (stage 2).

from diff_diff import TwoStageDiD

ts = TwoStageDiD()
results = ts.fit(data, outcome='y', unit='unit_id',
                 time='period', first_treat='first_treat',
                 aggregate='event_study')
results.print_summary()

Note

Point estimates are identical to ImputationDiD; the key difference is the variance estimator (GMM sandwich vs. conservative clustered).

Stacked DiD

Use :class:`~diff_diff.StackedDiD` when:

  • You have staggered adoption and want a sub-experiment approach
  • You want to avoid forbidden comparisons in TWFE by construction
  • You need corrective Q-weights for unbiased stacked estimation

Wing, Freedman & Hollingsworth (2024) create one sub-experiment per adoption cohort with clean controls and apply Q-weights to reweight the stacked regression.

from diff_diff import StackedDiD

stk = StackedDiD(kappa_pre=2, kappa_post=3)
results = stk.fit(data, outcome='y', unit='unit_id',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')
results.print_summary()

Note

The trimmed aggregate ATT may exclude early or late cohorts whose event windows do not fit in the data. Check results.trimmed_groups.

TROP

Use :class:`~diff_diff.TROP` when:

  • You suspect interactive fixed effects (factor confounding)
  • Standard parallel trends may not hold due to unobserved factors
  • You want triple robustness: factor model + unit weights + time weights

Athey, Imbens, Qu & Viviano (2025) combine nuclear norm regularization, exponential unit distance weights, and time decay weights with LOOCV tuning.

from diff_diff import TROP

trop = TROP(n_bootstrap=200)
results = trop.fit(data, outcome='y', treatment='treated',
                   unit='unit_id', time='period')
results.print_summary()

Note

TROP is computationally intensive. Use method='global' for faster estimation at the cost of some flexibility vs. method='local'.

Bacon Decomposition

Use :class:`~diff_diff.BaconDecomposition` when:

  • You want to diagnose whether TWFE is biased in your staggered setting
  • You need to see which 2x2 comparisons drive the TWFE estimate
  • You want to check whether later-vs-earlier or already-treated-as-control comparisons carry substantial weight

Goodman-Bacon (2021) decomposes the TWFE estimate into a weighted average of all 2x2 DiD comparisons and their weights.

from diff_diff import BaconDecomposition, plot_bacon

bacon = BaconDecomposition()
results = bacon.fit(data, outcome='y', unit='unit_id',
                    time='period', first_treat='first_treat')
results.print_summary()

# Visualize the decomposition
plot_bacon(results)

Note

This is a diagnostic tool, not an estimator. If the decomposition reveals problematic weights, switch to Callaway-Sant'Anna or another robust estimator.

Common Pitfalls

  1. Using TWFE with staggered adoption

    TWFE estimates a weighted average of all 2x2 comparisons, including "forbidden" comparisons where already-treated units serve as controls. This can lead to severe bias, even negative weights on treatment effects.

    Solution: Use CallawaySantAnna for staggered designs.

  2. Ignoring treatment effect heterogeneity

    If treatment effects vary by cohort (when units are treated) or over time (dynamic effects), aggregated estimators may be misleading.

    Solution: Use CallawaySantAnna and examine ATT(g,t) and event study plots.

  3. Failing to test parallel trends

    The parallel trends assumption is untestable in the post-period but can be assessed using pre-treatment data.

    Solution: Use :func:`~diff_diff.check_parallel_trends` and :class:`~diff_diff.HonestDiD` for sensitivity analysis.

  4. Inappropriate clustering

    Standard errors should typically be clustered at the level of treatment assignment (often the unit level).

    Solution: Always specify cluster for panel data.
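For pitfall 3, a quick manual check complements the library tools: estimate the pre-treatment outcome slope separately for treated and control units and inspect the gap. An illustrative sketch on simulated data where treated units are, by construction, on a different trend:

```python
import numpy as np
import pandas as pd

# Simulated pre-period panel: treated units drift upward by 0.3 per period.
rng = np.random.default_rng(7)
rows = [{'unit': u, 'period': t, 'treated': int(u < 20),
         'y': 0.3 * t * int(u < 20) + rng.normal()}
        for u in range(40) for t in range(5)]      # periods 0-4 are all pre
pre = pd.DataFrame(rows)

def slope(g):
    # Pooled OLS slope of y on period within a group
    return np.polyfit(g['period'].to_numpy(), g['y'].to_numpy(), 1)[0]

gap = slope(pre[pre.treated == 1]) - slope(pre[pre.treated == 0])
print(f"pre-trend slope gap: {gap:.2f}")  # near 0.3 -> trends not parallel
```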

Standard Error Methods

Different estimators compute standard errors differently. Understanding these differences helps interpret results and choose appropriate inference.

| Estimator | Default SE Method | Details |
| --- | --- | --- |
| DifferenceInDifferences | HC1 (heteroskedasticity-robust) | Uses White's robust SEs by default. Specify cluster for cluster-robust SEs. Use inference='wild_bootstrap' for few clusters (<30). |
| TwoWayFixedEffects | Cluster-robust (unit level) | Always clusters at unit level after within-transformation. Specify cluster to override. Use inference='wild_bootstrap' for few clusters. |
| MultiPeriodDiD | HC1 (heteroskedasticity-robust) | Same as basic DiD. Cluster-robust available via cluster. Wild bootstrap not yet supported for multi-coefficient inference. |
| CallawaySantAnna | Analytical (influence function) | Uses influence-function SEs with WIF adjustment by default. Set n_bootstrap=999 for multiplier bootstrap inference (weight types: rademacher, mammen, webb). |
| SyntheticDiD | Placebo, paper-faithful refit bootstrap, or jackknife | Default uses placebo-based variance (variance_method="placebo"). Set variance_method="bootstrap" for paper-faithful Algorithm 2 bootstrap (re-estimates ω and λ via Frank-Wolfe per draw; ~5–30× slower than placebo, panel-size dependent). Both methods use n_bootstrap replications (default 200). variance_method="jackknife" is also available. |
| ContinuousDiD | Analytical (influence function) | Uses influence-function-based SEs by default. Use n_bootstrap=199 (or higher) for multiplier bootstrap inference with proper CIs. |
| HeterogeneousAdoptionDiD | Path-dependent (CCT-2014 / 2SLS / Binder TSL) | Three SE regimes per :doc:`api/had`. Unweighted: continuous-dose paths use the CCT-2014 weighted-robust SE from the in-house lprobust port; mass-point uses a 2SLS sandwich. Deprecated ``weights=`` shortcut: continuous reuses CCT-2014; mass-point uses analytical weighted 2SLS (classical / hc1; CR1 when cluster= is supplied, except mass-point + cluster= + aggregate="event_study" + cband=True is rejected outright - see :doc:`api/had` for the cluster-combination deviation note); yields variance_formula="pweight" / "pweight_2sls". ``survey_design=SurveyDesign(weights="col", ...)``: both paths compose Binder (1983) Taylor-series linearization ("survey_binder_tsl" / "survey_binder_tsl_2sls"); mass-point + survey_design= + cluster= is also rejected outright (combined survey + cluster inference is deferred). The two weighted families differ on this estimator until the next-minor unification lands. Per-horizon CIs are pointwise; sup-t bands available only on the weighted event-study path via cband=True. |
| SunAbraham | Cluster-robust (unit level) | Clusters at unit level by default. Specify cluster to override. Use n_bootstrap for pairs bootstrap inference. |
| ImputationDiD | Conservative clustered (Theorem 3) | Uses conservative clustered variance from Borusyak et al. Theorem 3, clustered at unit level. Use n_bootstrap for multiplier bootstrap. |
| TwoStageDiD | GMM sandwich (clustered) | Uses GMM sandwich variance accounting for first-stage estimation uncertainty, clustered at unit level. Use n_bootstrap for multiplier bootstrap. |
| StackedDiD | Cluster-robust (unit level) | Clusters at unit level by default. Set cluster='unit_subexp' for (unit, sub-experiment) clustering. |
| TripleDifference | Influence function (robust) | Uses influence-function-based SEs (inherently heteroskedasticity-robust). Specify cluster for cluster-robust SEs. |
| TROP | Bootstrap (n_bootstrap=200) | Uses unit-level block bootstrap for variance estimation. Bootstrap is always required (minimum n_bootstrap=2). |
| EfficientDiD | Analytical (EIF-based) | Uses efficient influence function SE = sqrt(mean(EIF^2) / n). Use n_bootstrap for multiplier bootstrap. |
| BaconDecomposition | N/A (diagnostic) | Diagnostic tool only; does not produce standard errors. |

Recommendations by sample size:

  • Large samples (N > 1000, clusters > 50): Default analytical SEs are reliable
  • Medium samples (clusters 30-50): Cluster-robust SEs recommended
  • Small samples (clusters < 30): Use wild cluster bootstrap (inference='wild_bootstrap')
  • Very few clusters (< 10): Use Webb 6-point distribution (weight_type='webb')

Common pitfall: Forgetting to cluster when units are observed multiple times. For panel data, always cluster at the unit level unless you have a strong reason not to.

from diff_diff import DifferenceInDifferences, generate_did_data

panel = generate_did_data(n_units=200, n_periods=10, treatment_effect=2.0)

# Good: Cluster at unit level for panel data
did = DifferenceInDifferences(cluster='unit')
results = did.fit(panel, outcome='outcome', treatment='treated',
                  time='post')

# Better for few clusters: Wild bootstrap
did = DifferenceInDifferences(inference='wild_bootstrap', cluster='unit')
results = did.fit(panel, outcome='outcome', treatment='treated',
                  time='post')
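What the wild cluster bootstrap does can be sketched in miniature: hold fitted values fixed, flip the sign of each cluster's residuals at random (Rademacher weights), and re-estimate to build a bootstrap distribution. A toy numpy version for a simple mean (the library's implementation handles regression coefficients, other weight types such as webb, and p-value construction):

```python
import numpy as np

# Wild cluster bootstrap in miniature: one sign flip per cluster, not per
# observation, so within-cluster dependence is preserved.
rng = np.random.default_rng(3)
n_clusters, m = 8, 30                        # few clusters -> bootstrap helps
cluster = np.repeat(np.arange(n_clusters), m)
y = rng.normal(size=n_clusters * m)
beta_hat = y.mean()                          # toy "estimate": the sample mean
resid = y - beta_hat

draws = []
for _ in range(999):
    signs = rng.choice([-1.0, 1.0], size=n_clusters)   # Rademacher weights
    y_star = beta_hat + signs[cluster] * resid         # rebuilt outcome
    draws.append(y_star.mean())                        # re-estimate
se = np.std(draws)
print(f"wild cluster bootstrap SE: {se:.3f}")
```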

When in Doubt

If you're unsure which estimator to use:

  1. Start with CallawaySantAnna - It's valid even for non-staggered designs and provides the most flexible output (group-time effects, aggregations)
  2. Check for heterogeneity - Plot event studies to see if effects vary
  3. Run sensitivity analysis - Use HonestDiD to assess robustness
  4. Compare estimators - If results differ substantially across estimators, investigate why (often reveals violations of assumptions)
  5. Using survey data? - Pass a SurveyDesign to fit() for design-based variance estimation. See the :ref:`survey-design-support` section below for the compatibility matrix, and the survey tutorial for a full walkthrough.

Survey Design Support

All estimators accept an optional survey_design parameter in fit(). Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance estimation. The depth of support varies by estimator:

Note

If your data starts as individual-level survey microdata (e.g., BRFSS, ACS, CPS, NHANES respondent records), use :func:`~diff_diff.aggregate_survey` as a preprocessing step. It pools microdata into geographic-period cells and returns a pre-configured :class:`~diff_diff.SurveyDesign`. By default, the returned design uses weight_type="pweight" (unit-constant population weights), which is compatible with all survey-capable estimators in the matrix below. Pass second_stage_weights="aweight" for precision weights (inverse variance) if you prefer efficiency-weighted estimates - this mode is limited to estimators marked Full. See :doc:`api/prep` for the API reference.

| Estimator | Weights | Strata/PSU/FPC | Replicate Weights | Survey Bootstrap |
| --- | --- | --- | --- | --- |
| DifferenceInDifferences | Full | Full | Full | -- |
| TwoWayFixedEffects | Full | Full | Full | -- |
| MultiPeriodDiD | Full | Full | Full | -- |
| CallawaySantAnna | pweight only | Full | Full | Multiplier at PSU |
| ChaisemartinDHaultfoeuille | pweight only | Full (TSL) | -- | Group-level (warning) |
| TripleDifference | pweight only | Full | Full (analytical) | -- |
| StaggeredTripleDifference | pweight only | Full | Full | Multiplier at PSU |
| SunAbraham | Full | Full | Full | Rao-Wu rescaled |
| StackedDiD | pweight only | Full (pweight only) | Full | -- |
| ImputationDiD | pweight only | Full | Full (analytical) | Multiplier at PSU |
| TwoStageDiD | pweight only | Full | Full (analytical) | Multiplier at PSU |
| ContinuousDiD | Full | Full | Full (analytical) | Multiplier at PSU |
| HeterogeneousAdoptionDiD | pweight only | Full (Binder TSL) | -- | Multiplier (event-study, cband=True only) |
| EfficientDiD | Full | Full | Full (analytical) | Multiplier at PSU |
| SyntheticDiD | pweight only | Via bootstrap | -- | Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only) |
| TROP | pweight only | Via bootstrap | -- | Rao-Wu rescaled |
| WooldridgeDiD | Full (pweight only) | Full (analytical) | -- | -- |
| BaconDecomposition | Diagnostic | Diagnostic | -- | -- |

Legend:

  • Full: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
  • Full (pweight only): Full TSL with strata/PSU/FPC, but only pweight accepted (fweight/aweight rejected because composition changes weight semantics)
  • Via bootstrap: Strata/PSU/FPC supported only with bootstrap variance. TROP uses bootstrap by default. SyntheticDiD supports strata/PSU/FPC on variance_method='bootstrap' via a hybrid pairs-bootstrap + Rao-Wu rescaling composition (see the Note (survey + bootstrap composition) in REGISTRY.md §SyntheticDiD); placebo and jackknife remain pweight-only.
  • pweight only (Weights column): Only pweight accepted; fweight/aweight raise an error
  • Diagnostic: Weighted descriptive statistics only (no inference)
  • --: Not supported

Note

SyntheticDiD supports survey designs on variance_method='bootstrap' — both pweight-only and full strata/PSU/FPC — via a hybrid pairs-bootstrap composed with per-draw Rao-Wu rescaled weights fed into a weighted Frank-Wolfe re-estimation of ω and λ. See the Note (survey + bootstrap composition) in REGISTRY.md §SyntheticDiD for the objective form and argmin-set caveat.

variance_method='placebo' and variance_method='jackknife' remain pweight-only — composing placebo permutations / leave-one-out with Rao-Wu rescaling under the weighted objective is a separate derivation (tracked in TODO.md).

For the full walkthrough with code examples, see the survey tutorial. For deferred work and remaining limitations, see docs/survey-roadmap.md.