Paper: Callaway, Goodman-Bacon & Sant'Anna (2024/2025), "Difference-in-Differences with a Continuous Treatment," NBER Working Paper 32117 (arXiv: 2107.02637v8).
R reference implementation: contdid v0.1.0 (CRAN)
In many DiD applications, treatment has a dose or intensity rather than binary on/off. Examples: pollution exposure varying by distance, different minimum wage levels, varying tax rates, different subsidy shares.
Binary DiD cannot:
- Handle settings where all units receive some treatment (no clean untreated group)
- Estimate dose-response relationships
- Identify effects of marginal changes in treatment intensity
The standard TWFE regression for continuous DiD:
Y_{i,t} = theta_t + eta_i + beta^{twfe} * D_i * Post_t + v_{i,t} (Eq. 1.1)
beta^{twfe} suffers from multiple simultaneous problems:
- Negative weights on levels: Weights on ATT(l|l) integrate to zero, not one. Below-average dose units get negative weight.
- Selection bias under standard PT: A contamination term persists even under standard parallel trends.
- Non-representative weighting under strong PT: Even without selection bias, weights don't match the dose density, making beta^{twfe} sensitive to the untreated group share.
- Scale dependence: Rescaling dose (0-1 to 0-100) changes beta^{twfe} proportionally, while natural target parameters are invariant.
- Two periods: t=1 (pre), t=2 (post)
- In t=1, no unit is treated (all have dose 0)
- In t=2, some units receive dose D_i in D_+ subset (0, inf), rest remain at D_i = 0
| Symbol | Definition |
|---|---|
| Y_{i,t} | Outcome for unit i at time t |
| D_i | Dose (treatment intensity). D_i in D = {0} union D_+ |
| D_+ = (d_L, d_U) | Support of continuous dose among treated |
| Y_{i,t}(d) | Potential outcome under dose d |
| Delta Y | Y_{t=2} - Y_{t=1} |
| f_{D|D>0}(d) | Dose density conditional on being treated |
| Symbol | Definition |
|---|---|
| G_i | Timing group: period when unit i first treated |
| G = inf | Never-treated units |
| W_{i,t} = D_i * 1{t >= G_i} | Treatment exposure at time t |
| Y_{i,t}(g,d) | Potential outcome if first treated in period g with dose d |
- Dose: The amount/intensity of treatment. Can be continuous or multi-valued discrete. Crucially, dose is time-invariant for each unit (the amount doesn't change once assigned).
- Group: In two-period case, all units with the same dose d. In multi-period case, characterized by both timing (G_i) and dose (D_i).
Local ATT (effect of dose d on units who received dose d'):
ATT(d|d') = E[Y_{t=2}(d) - Y_{t=2}(0) | D = d']
When d' = d (the "own-dose" case):
ATT(d|d) = E[Y_{t=2}(d) - Y_{t=2}(0) | D = d]
Global ATT (effect of dose d across all treated):
ATT(d) = E[Y_{t=2}(d) - Y_{t=2}(0) | D > 0]
Key distinction: ATT(d|d) != ATT(d) when there is selection into dose groups on treatment effects. Under standard PT, only ATT(d|d) is identified. Under strong PT, ATT(d|d) = ATT(d).
For continuous treatment:
ACRT(d|d') = d/dl ATT(l|d') |_{l=d}
ACRT(d) = d/dd ATT(d)
For discrete (multi-valued) treatment:
ACRT(d_j|d_k) = [ATT(d_j|d_k) - ATT(d_{j-1}|d_k)] / (d_j - d_{j-1})
Level effects = height of dose-response curve (switching from 0 to d). Causal responses = slope of dose-response curve (marginal increase in d). These coincide for binary treatment; they diverge for continuous treatment.
Four natural scalar summaries:
ATT^{loc} = E[ATT(D|D) | D > 0] -- local levels, weighted by dose density
ATT^{glob} = E[ATT(D) | D > 0] -- global levels, weighted by dose density
ACRT^{loc} = E[ACRT(D|D) | D > 0] -- local slopes, weighted by dose density
ACRT^{glob} = E[ACRT(D) | D > 0] -- global slopes, weighted by dose density
loc = each dose group's own curve. glob = single curve for all treated.
Group-time-dose specific:
ATT(g,t,d|g,d) = E[Y_t(g,d) - Y_t(0) | G=g, D=d]
Aggregated across groups/periods by dose:
ATT^{dose}(d|d) = weighted average of ATT(g,t,d|g,d) across (g,t) cells
Event-study versions:
ATT^{es}_{loc}(e) = E[ATT^{dose,es}(D|D,e) | G+e in [2,T], G <= T]
where e = event time (periods since treatment).
For all d in D_+:
E[Y_{t=2}(0) - Y_{t=1}(0) | D = d] = E[Y_{t=2}(0) - Y_{t=1}(0) | D = 0]
Untreated potential outcome paths are the same across all dose groups and the untreated group. Direct analog of binary DiD parallel trends.
Identifies: ATT(d|d), ATT^{loc}. Does NOT identify ATT(d), ACRT, or any cross-dose comparison.
For all d in D:
E[Y_{t=2}(d) - Y_{t=1}(0) | D > 0] = E[Y_{t=2}(d) - Y_{t=1}(0) | D = d]
No selection into dose groups on the basis of treatment effects. Implies ATT(d|d) = ATT(d) for all d.
Additionally identifies: ATT(d), ACRT(d), ACRT^{glob}, and cross-dose comparisons have causal interpretation.
- No anticipation: Y_{i,t=1} = Y_{i,t=1}(0) for all units
- Overlap: P(D=0) > 0 (untreated units exist); f_{D|D>0}(d) > 0 on D_+
- Multi-period PT-MP: E[Delta Y_t(0) | G=g, D=d] = E[Delta Y_t(0) | G=inf, D=0]
When D in {0,1}: PT = SPT, ATT(1|1) = ATT(1) = ATT^{loc} = ATT^{glob}, ACRT = ATT. Everything collapses to standard Callaway & Sant'Anna (2021).
When dose takes values d_1, ..., d_J (Eq. 4.1):
Delta Y_i = beta_0 + sum_{j=1}^{J} 1{D_i = d_j} * beta_j + epsilon_i
- beta_j estimates ATT(d_j)
- (beta_j - beta_{j-1}) / (d_j - d_{j-1}) estimates ACRT(d_j)
- Standard OLS inference applies
This is the default in the R package and the recommended starting point.
For each (g,t) 2x2 cell:
- Compute untreated counterfactual: E_n[Delta Y | D=0]
- Demean treated outcomes: Delta tilde{Y}_i = Delta Y_i - E_n[Delta Y | D=0] for D_i > 0
- Construct B-spline basis psi^K(D) of degree
degreewithnum_knotsinterior knots - OLS regression among treated (Eq. 4.4):
Delta tilde{Y}_i = psi^K(D_i)' beta_K + epsilon_i - Evaluate at dose grid (Eq. 4.5):
ATT(d) = psi^K(d)' * hat{beta}_K ACRT(d) = (d/dd psi^K(d))' * hat{beta}_K
R package defaults: degree=3 (cubic), num_knots=0 (global polynomial), dose grid = quantiles P10 to P99 in 1% steps (90 points).
Adapts Chen, Christensen & Kankanala (2025) sieve estimator with data-driven dimension selection. Same regression framework as 5.2 but:
- Sieve dimension K is selected automatically via Lepski-type method (Algorithm 1)
- Provides honest, sup-norm rate-adaptive uniform confidence bands
- Restricted to two-period settings (no staggered adoption)
Algorithm for sieve dimension selection:
- Define candidate set K = {2^k + 3 : k in N_+}
- Compute K_max based on sample size
- For each K in candidate set, test stability of estimates across K values
- Select smallest K where estimates are stable (within bootstrap critical value)
ATT^{glob} (binarized DiD): Under SPT (Eq. 4.6):
ATT^{glob} = E[Delta Y | D > 0] - E[Delta Y | D = 0]
Simple difference in means between any-treated and untreated.
ACRT^{glob} (plug-in):
ACRT^{glob} = (1/n_{D>0}) * sum_{i: D_i > 0} ACRT(D_i)
Average the estimated ACRT curve over treated units' doses.
For staggered adoption: apply two-period estimation to each (g,t) cell separately,
then aggregate. The R package handles this via the ptetools framework.
For event-study ATT^{es}_{loc}(e): can binarize treatment and use standard Callaway & Sant'Anna (2021) machinery.
When P(D=0) = 0 (all units receive some treatment), use the lowest dose group d_L as comparison. Under PT, this recovers ATT(d|d) - ATT(d_L|d_L). Under SPT, recovers ATT(d) - ATT(d_L).
Standard OLS inference applies. Can cluster as needed.
Multiplier (Gaussian) bootstrap — NOT standard nonparametric bootstrap:
- Draw omega_i iid N(0,1) for i=1,...,n
- Compute weighted sums of influence functions using these weights
- Repeat B=1000 times
- No re-estimation needed per bootstrap draw (computationally efficient)
Pointwise confidence intervals:
ATT(d) +/- z_{0.975} * sigma_K(d) / sqrt(n)
Uniform confidence bands (UCBs):
- Compute bootstrap distribution of sup-t statistic across all dose values
- Critical value c_alpha from (1-alpha) quantile
- Band: ATT(d) +/- (c_alpha + A * gamma) * sigma(d) / sqrt(n)
- These are honest and rate-adaptive
ACRT^{glob} plug-in estimator is sqrt(n)-consistent and asymptotically normal. Standard errors via influence functions or multiplier bootstrap.
Four decompositions of beta^{twfe}, each revealing a different pathology:
| Decomposition | Weights positive? | Sum to 1? | Selection bias (under PT)? |
|---|---|---|---|
| (a) Causal response | Yes | Yes | Yes |
| (b) Levels | No | No (sum to 0) | N/A |
| (c) Scaled levels | No | Yes | N/A |
| (d) Scaled 2x2 | Yes | Yes | Yes |
Even under SPT (best case), decomposition (a) uses TWFE-specific weights that don't match the dose density, making beta^{twfe} an unappealing summary.
| Theorem | Statement (plain English) |
|---|---|
| 3.1 | Under PT: ATT(d|d) = E[Delta Y | D=d] - E[Delta Y | D=0]. The local ATT for each dose group is identified by the standard DiD comparison. |
| 3.2 | Under PT: cross-dose comparisons mix causal effects with selection bias. The derivative of E[Delta Y|D=d] does NOT identify ACRT without stronger assumptions. |
| 3.3 | Under SPT: ATT(d) = E[Delta Y|D=d] - E[Delta Y|D=0], and ACRT(d) = d/dd E[Delta Y|D=d]. Cross-dose comparisons are causal. |
| 3.4 | TWFE decomposition: beta^{twfe} admits four representations, all problematic. |
| Cor 3.1 | ATT^{glob} = binarized DiD. ACRT^{glob} = weighted average of dose-specific slopes. |
| C.1 | Multi-period: ATT(g,t,d|g,d) = E[Y_t - Y_{g-1}|G=g,D=d] - E[Y_t - Y_{g-1}|W_t=0]. |
Main function: cont_did() returns pte_dose_results or dose_obj.
Key parameters:
cont_did(yname, dname, gname, tname, idname, data,
target_parameter = "level"|"slope",
aggregation = "dose"|"eventstudy"|"none",
treatment_type = "continuous"|"discrete",
dose_est_method = "parametric"|"cck",
dvals = NULL,
degree = 3, num_knots = 0,
control_group = "notyettreated"|"nevertreated"|"eventuallytreated",
anticipation = 0,
bstrap = TRUE, boot_type = "multiplier", biters = 1000,
cband = FALSE, alp = 0.05,
base_period = "varying",
...)
-
Extract 2x2 subset: target group (g) + control group, pre-period + post-period
-
Construct B-spline basis from treated units' doses using
splines2::bSpline()Boundary knot note: The B-spline boundary knots are set from the training doses (
range(dose_treated)). Evaluation atdvalsis clamped to these boundaries. R'scontdidv0.1.0 usesrange(dvals)as boundary knots when evaluating, which can cause extrapolation artifacts. This is an intentional deviation. -
OLS: regress Delta Y on B-spline basis
-
Evaluate fitted spline at
dvals-> ATT(d) vector -
Evaluate derivative of spline at
dvals-> ACRT(d) vector -
Return estimates + influence functions for bootstrap
"dose": Average across (g,t) cells at each dose point -> dose-response curve"eventstudy": Average across dose at each event time e -> dynamic effects"none": Return disaggregated (g,t,d) results
- Dose is time-invariant: Set to actual value in ALL periods (pre and post)
- Never-treated: G=0, dose forced to 0 internally
- Balanced panel required in v0.1.0
- Units with treatment timing but zero dose are dropped
dvals = quantile(dose[dose > 0], probs = seq(0.10, 0.99, 0.01))
90 evaluation points, P10 to P99. Provides implicit tail trimming.
Quantile-based by default: choose_knots_quantile(dose[dose > 0], num_knots).
With num_knots=0, no interior knots (global polynomial of given degree).
Knots are built once globally from all positive doses, not per (g,t) cell.
This ensures a common basis space across cells so that dose-response vectors
can be meaningfully aggregated.
| R Package | Purpose | Python Equivalent |
|---|---|---|
splines2 |
B-spline basis + derivatives | scipy.interpolate.BSpline + custom derivative |
sandwich |
Robust variance | Already in diff-diff linalg.py |
ptetools |
Group-time iteration, aggregation, bootstrap | Reimplement (mirrors existing CS framework) |
MASS::ginv |
Pseudo-inverse | numpy.linalg.pinv |
npiv |
CCK nonparametric | Reimplement for CCK method |
- Covariates not supported (xformula = ~1 only)
- Discrete treatment not yet implemented
- Unbalanced panels not supported
- CCK restricted to 2-period case
- Repeated cross-sections not supported
- Parametric B-spline estimation for two-period case
- ATT(d) and ACRT(d) dose-response curves
- Summary parameters: ATT^{glob}, ACRT^{glob}
- Bootstrap inference (multiplier)
- Multi-period extension via (g,t) cell iteration
- Dose aggregation and event-study aggregation
- Control group options (not-yet-treated, never-treated)
- CCK nonparametric estimation
- Uniform confidence bands
- Covariates support (DR/IPW/OR)
- Discrete treatment (saturated regression — simpler, add later)
- TWFE decomposition diagnostics