Before asking a new methodological question, check if it's already answered here. If you don't find your answer, absolutely ask—just reference this thread so we know you've done your homework.
Method Selection
Q: When should I use difference-in-differences vs. synthetic control?
A:
- DID when you have multiple treated units, clear treatment timing, and can plausibly argue parallel trends
- Synthetic control when you have one (or few) treated units, comparison units with different pre-treatment trends, and need to construct an explicit counterfactual

Key trade-off: DID gives you more statistical power but requires stronger assumptions. Synthetic control is more flexible but has wider confidence intervals.
See: 14-synthetic-control-policy-lab.ipynb
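The DID logic is easy to see on simulated data (all numbers below are invented for illustration): the change in the treated group minus the change in the control group cancels any trend the two groups share.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # observations per group and period

# Simulated panel: treated and control groups, before and after the policy.
# Both groups share a common time trend (+1.0); the true effect is +2.0.
control_pre  = rng.normal(10.0, 1.0, n)
control_post = rng.normal(11.0, 1.0, n)               # shared trend only
treated_pre  = rng.normal(12.0, 1.0, n)               # a level difference is fine
treated_post = rng.normal(12.0 + 1.0 + 2.0, 1.0, n)   # trend + treatment effect

# DID: change in treated minus change in control removes the shared trend.
did = (treated_post.mean() - treated_pre.mean()) - (
       control_post.mean() - control_pre.mean())
print(f"DID estimate: {did:.2f}")  # close to the true effect of 2.0
```

Note what the level difference between groups illustrates: DID does not need the groups to look alike in levels, only to move in parallel absent treatment.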
Q: What's the difference between heterogeneous treatment effects and subgroup analysis?
A: Subgroup analysis splits your sample and estimates separate treatment effects—but this can mislead if groups differ on unobservables. Heterogeneous treatment effect methods (like causal forests) use machine learning to discover effect variation while properly controlling for confounding.
Critical distinction: Traditional subgroups require you to choose groupings ex-ante. HTE methods discover them from the data.
See: 11-heterogeneous-treatment-effects.ipynb
Q: How do I know if my regression discontinuity design is valid?
A: Run these diagnostic tests:
- Density of the running variable at the cutoff (McCrary test): bunching suggests manipulation
- Covariate smoothness: pre-determined covariates should not jump at the cutoff
- Placebo cutoffs: no "effects" should appear where the rule doesn't change
- Bandwidth sensitivity: results shouldn't hinge on one bandwidth choice

If any of these fail, your identification is suspect.
See: 15-regression-discontinuity-toolkit.ipynb
Data & Implementation
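As a rough illustration of effect discovery, here is a simple two-model "T-learner" with random forests, used as a stand-in for causal forests; the covariate, the simulated data, and the effect function are all invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(0, 1, (n, 1))          # a covariate that moderates the effect
t = rng.integers(0, 2, n)              # randomized treatment assignment
tau = 1.0 + 2.0 * x[:, 0]              # true effect grows with x
y = x[:, 0] + t * tau + rng.normal(0, 0.3, n)

# T-learner: fit separate outcome models for treated and control units,
# then estimate the conditional effect as the difference in predictions.
m1 = RandomForestRegressor(random_state=0).fit(x[t == 1], y[t == 1])
m0 = RandomForestRegressor(random_state=0).fit(x[t == 0], y[t == 0])
cate = m1.predict(x) - m0.predict(x)

# The estimated effect should be larger where x is larger -- discovered
# from the data, not from an ex-ante subgroup split.
print(cate[x[:, 0] > 0.5].mean(), cate[x[:, 0] < 0.5].mean())
```

Causal forests add honest sample splitting and valid confidence intervals on top of this basic idea; the T-learner above only conveys the intuition.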
Q: Do I need administrative data for these methods?
A: No, but it helps. Administrative data gives you:
- Larger sample sizes → more power
- Longer time series → better pre-treatment validation
- Richer covariates → stronger controls
Survey data can work if your sample size and time coverage are adequate. The key is having good counterfactual observations, not necessarily administrative records.
Q: How do I handle missing data in causal inference?
A: Carefully. Missing data mechanisms interact with causal identification:
- Missing Completely at Random (MCAR): Reduces power but doesn't bias estimates
- Missing at Random (MAR): Missingness depends only on observed variables; complete-case estimates can be biased, but methods that condition on those observables recover validity
- Missing Not at Random (MNAR): Missingness depends on the unobserved values themselves; a serious threat to validity
Solutions depend on the mechanism. Multiple imputation helps with MAR. MNAR often requires sensitivity analyses or additional assumptions.
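For the MAR case, a small multiple-imputation sketch using scikit-learn's IterativeImputer as the imputation engine (the data and the missingness rule are simulated for illustration):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)   # true mean of x2 is 0

# MAR missingness: x2 is missing whenever x1 > 0, i.e. missingness
# depends only on the observed variable x1.
data = np.column_stack([x1, x2])
data[x1 > 0, 1] = np.nan

complete_case = np.nanmean(data[:, 1])  # biased: only low-x1 rows remain

# Multiple imputation: m imputed datasets, pooled by averaging the
# per-dataset estimates (Rubin's rule for the point estimate).
estimates = []
for m in range(5):
    imputed = IterativeImputer(sample_posterior=True,
                               random_state=m).fit_transform(data)
    estimates.append(imputed[:, 1].mean())
pooled = np.mean(estimates)
print(f"complete-case: {complete_case:.2f}, pooled MI: {pooled:.2f}")
```

Because the imputation model conditions on x1 (the variable driving missingness), the pooled estimate lands near the true mean of 0, while the complete-case mean is visibly biased downward.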
Q: What sample size do I need?
A: It depends on:
- Expected effect size (smaller effects need more observations)
- Outcome variance (noisier outcomes need more observations)
- Number of time periods (panel data gives you more effective observations)
- Clustering structure (if applicable)
Rule of thumb: You need enough power to detect the minimum policy-relevant effect size. If you can't detect an effect that would justify the intervention, your study isn't useful even if it's statistically significant.
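The rule of thumb translates into a back-of-the-envelope calculation. A standard two-sample power formula, assuming a two-sided test at α = 0.05 with 80% power (the 0.2-SD effect size is just an example):

```python
import math
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sample comparison of means,
    with effect_size expressed in standard-deviation units."""
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
    z_power = norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "small" effect of 0.2 SD needs roughly 400 observations per group.
print(n_per_group(0.2))  # 393
```

Quadrupling the detectable effect cuts the required sample by a factor of sixteen, which is why tiny policy-relevant effects are so expensive to study.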
Interpretation & Reporting
Q: How do I interpret a treatment effect estimate?
A: Always report:
- Magnitude: Is the effect size policy-relevant?
- Precision: How wide are the confidence intervals?
- Identification: What variation identifies this effect?
- Limitations: What would invalidate this estimate?
Never report just the coefficient and p-value.
Q: What if I don't find a significant effect?
A: Null results are informative if:
You had adequate power to detect policy-relevant effects
Your identification strategy was credible
You can rule out effects above a certain magnitude
Don't torture your data until it confesses. Sometimes policies don't work.
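The third point above can be made concrete with a confidence interval: an insignificant estimate still bounds the plausible effect sizes (the numbers below are invented for illustration):

```python
# Suppose the policy's estimated effect is 0.5 with a standard error of 0.8:
# not statistically significant, but still informative.
estimate, se = 0.5, 0.8
z = 1.96  # 95% confidence level
lo, hi = estimate - z * se, estimate + z * se
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
# Effects larger than about 2.07 are ruled out at the 95% level,
# so "no significant effect" is not the same as "no information".
```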
Q: How do I explain causal inference to non-technical audiences?
A: Focus on the comparison, not the math:
"We compared similar [units] that did and didn't get the policy"
"We checked whether trends were similar before the policy"
"We ruled out alternative explanations like [X, Y, Z]"
Use visualizations. Show the counterfactual explicitly. Avoid jargon.
Contributing to This FAQ
Found a question you keep seeing? Reply to this thread and we'll update the main post regularly.