Commit efdb8a6

Merge pull request #345 from DoubleML/p-edits-cs-did
Edits on CS DID PR
2 parents bbda382 + d3cc9b8 commit efdb8a6

16 files changed: 93 additions and 20 deletions

doubleml/did/datasets/dgp_did_cs_CS2021.py

Lines changed: 4 additions & 3 deletions
@@ -97,8 +97,8 @@ def make_did_cs_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, lambd
 
 P(G_i = g) = \\frac{1}{G} \\text{ for all } g
 
-7. Steps 1-6 generate panel data. To obtain repeated cross-sectional data, the number of generated indivials is increased
-to `n_obs/lambda_t`, where `lambda_t` denotes the pobability to observe a unit at each time period (time constant).
+7. Steps 1-6 generate panel data. To obtain repeated cross-sectional data, the number of generated individuals is increased
+to `n_obs/lambda_t`, where `lambda_t` denotes the probability to observe a unit at each time period (time constant).
 for each
 
 
@@ -133,7 +133,8 @@ def make_did_cs_CS2021(n_obs=1000, dgp_type=1, include_never_treated=True, lambd
 Whether to include units that are never treated.
 
 lambda_t : float, default=0.5
-Probability of observing a unit at each time period.
+Probability of observing a unit at each time period. Note that internally `n_obs/lambda_t` individuals are
+generated of which only a fraction `lambda_t` is observed at each time period (see Step 7 in the DGP description).
 
 time_type : str, default="datetime"
 Type of time variable. Either "datetime" or "float".
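
The `lambda_t` note above can be illustrated with a minimal sketch. It is not the code in `dgp_did_cs_CS2021.py`: the helper name `simulate_repeated_cross_sections`, the `n_periods` argument, and the use of independent Bernoulli draws per unit and period are all assumptions made for illustration. It only shows why generating `n_obs/lambda_t` individuals leaves roughly `n_obs` observed units in each period.

```python
import random


def simulate_repeated_cross_sections(n_obs=1000, lambda_t=0.5, n_periods=4, seed=42):
    """Hypothetical helper: thin a panel into repeated cross sections."""
    rng = random.Random(seed)
    # Generate more individuals than requested so that the expected
    # number of observed units per period is about n_obs.
    n_generated = int(n_obs / lambda_t)
    observed = []
    for unit_id in range(n_generated):
        for t in range(n_periods):
            # Assumption: each unit is observed in period t with
            # probability lambda_t, independently across periods.
            if rng.random() < lambda_t:
                observed.append((unit_id, t))
    return n_generated, observed


n_generated, observed = simulate_repeated_cross_sections()
per_period = [sum(1 for _, t in observed if t == period) for period in range(4)]
print(n_generated, per_period)
```

With the defaults, 2000 individuals are generated and each period keeps about half of them, so the per-period sample size fluctuates around `n_obs` rather than matching it exactly.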

doubleml/did/did.py

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ class DoubleMLDID(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str
@@ -47,7 +47,7 @@ class DoubleMLDID(LinearScoreMixin, DoubleML):
 Default is ``'observational'``.
 
 in_sample_normalization : bool
-Indicates whether to use a sligthly different normalization from Sant'Anna and Zhao (2020).
+Indicates whether to use a slightly different normalization from Sant'Anna and Zhao (2020).
 Default is ``True``.
 
 trimming_rule : str

doubleml/did/did_binary.py

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ class DoubleMLDIDBinary(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str
@@ -80,7 +80,7 @@ class DoubleMLDIDBinary(LinearScoreMixin, DoubleML):
 Default is ``'observational'``.
 
 in_sample_normalization : bool
-Indicates whether to use a sligthly different normalization from Sant'Anna and Zhao (2020).
+Indicates whether to use a slightly different normalization from Sant'Anna and Zhao (2020).
 Default is ``True``.
 
 trimming_rule : str

doubleml/did/did_cs.py

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ class DoubleMLDIDCS(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str
@@ -47,7 +47,7 @@ class DoubleMLDIDCS(LinearScoreMixin, DoubleML):
 Default is ``'observational'``.
 
 in_sample_normalization : bool
-Indicates whether to use a sligthly different normalization from Sant'Anna and Zhao (2020).
+Indicates whether to use a slightly different normalization from Sant'Anna and Zhao (2020).
 Default is ``True``.
 
 trimming_rule : str

doubleml/did/did_cs_binary.py

Lines changed: 72 additions & 0 deletions
@@ -28,6 +28,78 @@
 
 
 class DoubleMLDIDCSBinary(LinearScoreMixin, DoubleML):
+"""Double machine learning for difference-in-differences models with repeated cross sections (binary setting in terms of group and time
+combinations).
+
+Parameters
+----------
+obj_dml_data : :class:`DoubleMLPanelData` object
+The :class:`DoubleMLPanelData` object providing the data and specifying the variables for the causal model.
+
+g_value : int
+The value indicating the treatment group (first period with treatment).
+Default is ``None``. This implements the case for the smallest, non-zero value of G.
+
+t_value_pre : int
+The value indicating the baseline pre-treatment period.
+
+t_value_eval : int
+The value indicating the period for evaluation.
+
+ml_g : estimator implementing ``fit()`` and ``predict()``
+A machine learner implementing ``fit()`` and ``predict()`` methods (e.g.
+:py:class:`sklearn.ensemble.RandomForestRegressor`) for the nuisance function :math:`g_0(d,X) = E[Y_1-Y_0|D=d, X]`.
+For a binary outcome variable :math:`Y` (with values 0 and 1), a classifier implementing ``fit()`` and
+``predict_proba()`` can also be specified. If :py:func:`sklearn.base.is_classifier` returns ``True``,
+``predict_proba()`` is used otherwise ``predict()``.
+
+ml_m : classifier implementing ``fit()`` and ``predict_proba()``
+A machine learner implementing ``fit()`` and ``predict_proba()`` methods (e.g.
+:py:class:`sklearn.ensemble.RandomForestClassifier`) for the nuisance function :math:`m_0(X) = E[D=1|X]`.
+Only relevant for ``score='observational'``.
+
+control_group : str
+Specifies the control group. Either ``'never_treated'`` or ``'not_yet_treated'``.
+Default is ``'never_treated'``.
+
+anticipation_periods : int
+Number of anticipation periods. Default is ``0``.
+
+n_folds : int
+Number of folds.
+Default is ``5``.
+
+n_rep : int
+Number of repetitions for the sample splitting.
+Default is ``1``.
+
+score : str
+A str (``'observational'`` or ``'experimental'``) specifying the score function.
+The ``'experimental'`` scores refers to an A/B setting, where the treatment is independent
+from the pretreatment covariates.
+Default is ``'observational'``.
+
+in_sample_normalization : bool
+Indicates whether to use a slightly different normalization from Sant'Anna and Zhao (2020).
+Default is ``True``.
+
+trimming_rule : str
+A str (``'truncate'`` is the only choice) specifying the trimming approach.
+Default is ``'truncate'``.
+
+trimming_threshold : float
+The threshold used for trimming.
+Default is ``1e-2``.
+
+draw_sample_splitting : bool
+Indicates whether the sample splitting should be drawn during initialization of the object.
+Default is ``True``.
+
+print_periods : bool
+Indicates whether to print information about the evaluated periods.
+Default is ``False``.
+
+"""
 
 def __init__(
 self,
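
The `trimming_rule` and `trimming_threshold` entries in the docstring above can be read as clipping estimated propensity scores away from 0 and 1. The sketch below is an illustration under that assumption, not the library's implementation; `truncate_propensity` is a hypothetical helper name, and the symmetric bounds `[threshold, 1 - threshold]` are assumed rather than taken from the diff.

```python
def truncate_propensity(m_hat, trimming_threshold=1e-2):
    """Hypothetical helper: clip propensity estimates into [eps, 1 - eps]."""
    lower = trimming_threshold
    upper = 1.0 - trimming_threshold
    # Values below the lower bound are raised to it, values above the
    # upper bound are lowered to it, everything else is left unchanged.
    return [min(max(p, lower), upper) for p in m_hat]


trimmed = truncate_propensity([0.0, 0.005, 0.3, 0.999, 1.0])
print(trimmed)
```

Clipping keeps inverse-propensity weights bounded, which is presumably why the threshold only matters for ``score='observational'``, where ``ml_m`` is used.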

doubleml/irm/apo.py

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ class DoubleMLAPO(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str or callable

doubleml/irm/cvar.py

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ class DoubleMLCVAR(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str

doubleml/irm/iivm.py

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ class DoubleMLIIVM(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str or callable

doubleml/irm/irm.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ class DoubleMLIRM(LinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str or callable

doubleml/irm/lpq.py

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ class DoubleMLLPQ(NonLinearScoreMixin, DoubleML):
 Default is ``5``.
 
 n_rep : int
-Number of repetitons for the sample splitting.
+Number of repetitions for the sample splitting.
 Default is ``1``.
 
 score : str
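
The `n_folds` and `n_rep` parameters whose docstrings are corrected throughout this commit describe repeated K-fold sample splitting. As a rough sketch of that idea (the function `draw_sample_splitting` here is a hypothetical standalone helper, not the library's method, and the shuffling scheme is an assumption), each repetition reshuffles the observations and partitions them into `n_folds` train/test pairs:

```python
import random


def draw_sample_splitting(n_obs, n_folds=5, n_rep=1, seed=0):
    """Hypothetical helper: n_rep independent K-fold partitions of range(n_obs)."""
    rng = random.Random(seed)
    all_splits = []
    for _ in range(n_rep):
        indices = list(range(n_obs))
        rng.shuffle(indices)
        # Deal the shuffled indices into n_folds disjoint folds.
        folds = [indices[k::n_folds] for k in range(n_folds)]
        repetition = []
        for k in range(n_folds):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            repetition.append((train_idx, test_idx))
        all_splits.append(repetition)
    return all_splits


splits = draw_sample_splitting(n_obs=20, n_folds=5, n_rep=3)
print(len(splits), len(splits[0]))
```

Each repetition yields its own set of cross-fitted nuisance predictions; how the resulting estimates are aggregated across repetitions is not covered by this diff.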
