@@ -106,7 +106,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's quite amazing. We are not only able to estimate the ATE, but we also get, for free, confidence intervals and P-Values out of it! More than that, we can see that regression is doing exactly what it supposed to do: comparing $E[Y|T=0]$ and $E[Y|T=1]$. The intercept is exactly the sample mean when $T=0$, $E[Y|T=0]$, and the coefficient of the online format is exactly the sample difference in means $E[Y|T=1] - E[Y|T=0]$. Don't trust me? No problem. You can see for yourself:"
"That's quite amazing. We are not only able to estimate the ATE, but we also get, for free, confidence intervals and P-Values out of it! More than that, we can see that regression is doing exactly what it is supposed to do: comparing $E[Y|T=0]$ and $E[Y|T=1]$. The intercept is exactly the sample mean when $T=0$, $E[Y|T=0]$, and the coefficient of the online format is exactly the sample difference in means $E[Y|T=1] - E[Y|T=0]$. Don't trust me? No problem. You can see for yourself:"
]
},
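The identity claimed above, the intercept matching $E[Y|T=0]$ and the `format_ol` coefficient matching the difference in means, is easy to verify numerically. A minimal sketch on simulated stand-in data (the notebook's real dataset isn't reproduced in this diff, so the numbers below are invented):

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the class-format experiment: `format_ol` is a
# binary treatment and `falsexam` the exam-score outcome.
rng = np.random.default_rng(42)
data = pd.DataFrame({"format_ol": rng.integers(0, 2, 500)})
data["falsexam"] = 70 - 5 * data["format_ol"] + rng.normal(0, 10, 500)

# OLS of falsexam on format_ol; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(data["format_ol"], data["falsexam"], 1)

mean_t0 = data.query("format_ol == 0")["falsexam"].mean()
mean_t1 = data.query("format_ol == 1")["falsexam"].mean()

print(intercept, mean_t0)        # intercept equals E[Y|T=0]
print(slope, mean_t1 - mean_t0)  # slope equals the difference in means
```

With a binary regressor the least-squares line passes exactly through the two group means, so the match is exact up to floating-point error.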
{
@@ -225,8 +225,8 @@
}
],
"source": [
"kapa = data[\"falsexam\"].cov(data[\"format_ol\"]) / data[\"format_ol\"].var()\n",
"kapa"
"kappa = data[\"falsexam\"].cov(data[\"format_ol\"]) / data[\"format_ol\"].var()\n",
"kappa"
]
},
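The `kappa` line above is the bivariate OLS slope, $Cov(Y, T)/Var(T)$, and for a binary treatment it reduces to the difference in means. A quick check on simulated data (column names borrowed from the notebook, values invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# hypothetical stand-ins for the notebook's columns
data = pd.DataFrame({"format_ol": rng.integers(0, 2, 1000)})
data["falsexam"] = 75 - 4 * data["format_ol"] + rng.normal(0, 8, 1000)

# pandas uses ddof=1 in both cov and var, so the ratio is the OLS slope
kappa = data["falsexam"].cov(data["format_ol"]) / data["format_ol"].var()
diff_in_means = (data.query("format_ol == 1")["falsexam"].mean()
                 - data.query("format_ol == 0")["falsexam"].mean())
print(kappa, diff_in_means)  # identical up to float error
```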
{
@@ -417,7 +417,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, it is not because we can estimate this simple model that it's correct. Notice how I was carefully with my words saying it **predicts** wage from education. I never said that this prediction was causal. In fact, by now, you probably have very serious reasons to believe this model is biased. Since our data didn't come from a random experiment, we don't know if those that got more education are comparable to those who got less. Going even further, from our understanding of how the world works, we are very certain that they are not comparable. Namely, we can argue that those with more years of education probably have richer parents, and that the increase we are seeing in wages as we increase education is just a reflection of how the family wealth is associated with more years of education. Putting it in math terms, we think that $E[Y_0|T=0] < E[Y_0|T=1]$, that is, those with more education would have higher income anyway, even without so many years of education. If you are really grim about education, you can argue that it can even *reduce* wages by keeping people out of the workforce and lowering their experience.\n",
"Of course, it is not because we can estimate this simple model that it's correct. Notice how I was careful with my words saying it **predicts** wage from education. I never said that this prediction was causal. In fact, by now, you probably have very serious reasons to believe this model is biased. Since our data didn't come from a random experiment, we don't know if those that got more education are comparable to those who got less. Going even further, from our understanding of how the world works, we are very certain that they are not comparable. Namely, we can argue that those with more years of education probably have richer parents, and that the increase we are seeing in wages as we increase education is just a reflection of how the family wealth is associated with more years of education. Putting it in math terms, we think that $E[Y_0|T=0] < E[Y_0|T=1]$, that is, those with more education would have higher income anyway, even without so many years of education. If you are really grim about education, you can argue that it can even *reduce* wages by keeping people out of the workforce and lowering their experience.\n",
"\n",
"Fortunately, in our data, we have access to lots of other variables. We can see the parents' education `meduc`, `feduc`, the `IQ` score for that person, the number of years of experience `exper` and the tenure of the person in his or her current company `tenure`. We even have some dummy variables for marriage and black ethnicity. "
]
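The wealth-confounding story can be made concrete with a toy simulation (every number below is invented): family wealth raises both education and wages, so the short regression of wage on education overstates a causal effect set here to 50, while controlling for wealth recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
wealth = rng.normal(0, 1, n)
educ = 12 + 2 * wealth + rng.normal(0, 1, n)            # wealth -> educ
wage = 500 + 50 * educ + 100 * wealth + rng.normal(0, 20, n)

# short regression: wage ~ educ
X_short = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

# long regression: wage ~ educ + wealth
X_long = np.column_stack([np.ones(n), educ, wealth])
b_long = np.linalg.lstsq(X_long, wage, rcond=None)[0]

print(b_short[1])  # biased upward, around 90
print(b_long[1])   # close to the true 50
```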
@@ -1023,7 +1023,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Regression, on the other hand, does so by comparing the effect of $T$ while maintaining the confounder $W$ set to a fixed level. With regression, it is not the case that W cease to cause T and Y. It is just that it is held fixed, so it can't influence changes on T and Y."
"Regression, on the other hand, does so by comparing the effect of $T$ while maintaining the confounder $W$ set to a fixed level. With regression, it is not the case that W ceases to cause T and Y. It is just that it is held fixed, so it can't influence changes on T and Y."
]
},
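The "held fixed" reading of regression is the partialling-out (Frisch-Waugh-Lovell) result: the multivariate coefficient on $T$ equals the slope of $Y$ on the residual of $T$ after regressing $T$ on $W$. A numeric check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
w = rng.normal(0, 1, n)                      # confounder
t = 0.5 * w + rng.normal(0, 1, n)            # treatment, caused by w
y = 2.0 * t + 3.0 * w + rng.normal(0, 1, n)  # outcome

# multivariate regression: y ~ t + w
X = np.column_stack([np.ones(n), t, w])
beta_t = np.linalg.lstsq(X, y, rcond=None)[0][1]

# partialling out: residualize t on w, then regress y on the residual
Xw = np.column_stack([np.ones(n), w])
t_res = t - Xw @ np.linalg.lstsq(Xw, t, rcond=None)[0]
beta_fwl = (t_res @ y) / (t_res @ t_res)

print(beta_t, beta_fwl)  # numerically identical
```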
{
@@ -1121,7 +1121,7 @@
"\n",
"## Key Ideas\n",
"\n",
"We've covered a lot of ground with regression. We saw how regression can be used to perform A/B testing and how it conveniently gives us confidence intervals. Then, we moved to study how regression solves a prediction problem and it is the best linear approximation to the conditional expectation function (CEF). We've also discussed how, in the bivariate case, the regression treatment coefficient is the covariance between the treatment and the outcome divided by the variance of the treatment. Expanding to the multivariate case, we figured out how regression gives us a partialling out interpretation of the treatment coefficient: it can be interpreted as how the outcome would change with the treatment while keeping all other included variables constant. This is what economists love to refer as *ceteris paribus*.\n",
"We've covered a lot of ground with regression. We saw how regression can be used to perform A/B testing and how it conveniently gives us confidence intervals. Then, we moved to study how regression solves a prediction problem and is the best linear approximation to the conditional expectation function (CEF). We've also discussed how, in the bivariate case, the regression treatment coefficient is the covariance between the treatment and the outcome divided by the variance of the treatment. Expanding to the multivariate case, we figured out how regression gives us a partialling out interpretation of the treatment coefficient: it can be interpreted as how the outcome would change with the treatment while keeping all other included variables constant. This is what economists love to refer to as *ceteris paribus*.\n",
"\n",
"Finally, we took a turn to understanding bias. We saw how `Short equals long plus the effect of omitted times the regression of omitted on included`. This sheds some light on how bias comes to be. We discovered that the source of omitted variable bias is confounding: a variable that affects both the treatment and the outcome. Lastly, we used causal graphs to see how RCTs and regression fix confounding.\n",
"\n",
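The `short equals long plus ...` formula quoted above holds exactly in any sample, not just in expectation. A simulated sketch (variable names and coefficients invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
t = rng.normal(0, 1, n)
w = 0.7 * t + rng.normal(0, 1, n)            # the omitted variable
y = 1.5 * t + 2.0 * w + rng.normal(0, 1, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
short = ols(np.column_stack([ones, t]), y)[1]   # short: y ~ t
long_ = ols(np.column_stack([ones, t, w]), y)   # long: y ~ t + w
delta = ols(np.column_stack([ones, t]), w)[1]   # omitted ~ included

# short = long + (effect of omitted) * (regression of omitted on included)
print(short, long_[1] + long_[2] * delta)
```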
@@ -9,7 +9,7 @@
"\n",
"## Regression With Grouped Data\n",
"\n",
"Not all data points are created equal. If we look again at our ENEM dataset, we trust the scores of big schools much more than the scores from small schools. This is not to say that big schools are better or anything. It is just due to the fact that their big size imply less variance."
"Not all data points are created equal. If we look again at our ENEM dataset, we trust the scores of big schools much more than the scores from small schools. This is not to say that big schools are better or anything. It is just due to the fact that their big size implies less variance."
]
},
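The size-variance point is just the standard error of the mean, $\sigma/\sqrt{n}$: averages from big schools bounce around far less than averages from small ones. A simulated illustration (the ENEM data itself isn't loaded here, numbers invented):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n_small, n_big, n_schools = 10.0, 10, 1000, 500

# average score per school, for 500 small and 500 big schools
small_means = rng.normal(70, sigma, (n_schools, n_small)).mean(axis=1)
big_means = rng.normal(70, sigma, (n_schools, n_big)).mean(axis=1)

print(small_means.std())  # around sigma/sqrt(10),  ~3.2
print(big_means.std())    # around sigma/sqrt(1000), ~0.3
```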
{
@@ -968,7 +968,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First of all, notice how this removes any assumption about the functional form of how education affects wages. We don't need to worry about logs anymore. In essence, this model is completely non-parametric. All it does is compute sample averages of wage for each year of education. This can be seen in the plot above, where the fitted line doesn't have a particular form. Instead, is the interpolation of the sample means for each year of education. We can also see that by reconstructing one parameter, for instance, that of 17 years of education. For this model, it's `9.5905`. Below, we can see how it is just the difference between the baseline years of education (9) and the individuals with 17 years\n",
"First of all, notice how this removes any assumption about the functional form of how education affects wages. We don't need to worry about logs anymore. In essence, this model is completely non-parametric. All it does is compute sample averages of wage for each year of education. This can be seen in the plot above, where the fitted line doesn't have a particular form. Instead, it is the interpolation of the sample means for each year of education. We can also see that by reconstructing one parameter, for instance, that of 17 years of education. For this model, it's `9.5905`. Below, we can see how it is just the difference in average wage between the baseline years of education (9) and the individuals with 17 years\n",
"\n",
"$\n",
"\\beta_{17} = E[Y|T=17]-E[Y|T=9]\n",
@@ -1002,7 +1002,7 @@
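The claim that the saturated dummy regression just interpolates sample means can be checked numerically. A sketch on invented wage data with education levels 9, 12 and 17, with 9 as the baseline:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame({"educ": rng.choice([9, 12, 17], 3000)})
df["wage"] = 100 + 10 * df["educ"] + rng.normal(0, 30, len(df))

# saturated model: wage ~ dummies for educ, baseline educ == 9
dummies = pd.get_dummies(df["educ"], drop_first=True).astype(float)
X = np.column_stack([np.ones(len(df)), dummies])
beta = np.linalg.lstsq(X, df["wage"].to_numpy(), rcond=None)[0]

means = df.groupby("educ")["wage"].mean()
print(beta[0], means[9])               # intercept = E[Y|T=9]
print(beta[-1], means[17] - means[9])  # dummy coef = E[Y|T=17] - E[Y|T=9]
```

Because the model is saturated, the fit reproduces the group means exactly, so both comparisons hold up to float error.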
"cell_type": "markdown",
"metadata": {},
"source": [
"If we include more dummy covariates in the model, the parameter on education become a weighted average of the effect on each dummy group:\n",
"If we include more dummy covariates in the model, the parameter on education becomes a weighted average of the effect on each dummy group:\n",
"\n",
"$\n",
"E\\{ \\ (E[Y_i|T=1, Group_i] - E[Y_i|T=0, Group_i])w(Group_i) \\ \\}\n",
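For a binary treatment with group dummies, the weights $w(Group)$ in the formula above are proportional to group size times the within-group treatment variance, $n_g \, p_g(1-p_g)$. A simulated check of that identity (group sizes, treatment shares and effects all invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
# two groups with different treatment shares and different effects
rows = []
for g, (n, p, effect) in enumerate([(3000, 0.1, 2.0), (3000, 0.5, 6.0)]):
    t = (rng.random(n) < p).astype(float)
    y = 10 + effect * t + rng.normal(0, 1, n)
    rows.append(pd.DataFrame({"g": g, "t": t, "y": y}))
df = pd.concat(rows, ignore_index=True)

# OLS: y ~ t + group dummy
X = np.column_stack([np.ones(len(df)), df["t"], (df["g"] == 1).astype(float)])
beta_t = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)[0][1]

# variance-weighted average of the within-group effects
effects, weights = [], []
for g, d in df.groupby("g"):
    p_g = d["t"].mean()
    diff = d.loc[d["t"] == 1, "y"].mean() - d.loc[d["t"] == 0, "y"].mean()
    effects.append(diff)
    weights.append(len(d) * p_g * (1 - p_g))
weighted = np.dot(effects, weights) / np.sum(weights)

print(beta_t, weighted)  # identical up to float error
```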
22 changes: 11 additions & 11 deletions causal-inference-for-the-brave-and-true/07-Beyond-Confounders.ipynb
@@ -13,7 +13,7 @@
"\n",
"As a motivating example, let's suppose you are a data scientist in the collections team of a fintech. Your next task is to figure out the impact of sending an email asking people to negotiate their debt. Your response variable is the amount of payments from the late customers.\n",
"\n",
"To answer this question, your team selects 5000 random customers from your late customers base to do a random test. For every customer, you flip a coin, if its heads, the customer receives the email; otherwise, it is left as a control. With this test, you hope to find out how much extra money the email generates."
"To answer this question, your team selects 5000 random customers from your late customers base to do a random test. For every customer, you flip a coin, if it's heads, the customer receives the email; otherwise, it is left as a control. With this test, you hope to find out how much extra money the email generates."
]
},
{
@@ -1196,7 +1196,7 @@
"source": [
"## Bad Controls - Selection Bias\n",
"\n",
"Let's go back to the collections email example. Remember that the email was randomly assigned to customers. We've already explained what `credit_limit` and `risk_score` is. Now, let's look at the remaining variables. `opened` is a dummy variable for the customer opening the email or not. `agreement` is another dummy marking if the customers contacted the collections department to negotiate their debt, after having received the email. Which of the following models do you think is more appropriate? The first is a model with the treatment variable plus `credit_limit` and `risk_score`; the second adds `opened` and `agreement` dummies."
"Let's go back to the collections email example. Remember that the email was randomly assigned to customers. We've already explained what `credit_limit` and `risk_score` are. Now, let's look at the remaining variables. `opened` is a dummy variable for the customer opening the email or not. `agreement` is another dummy marking if the customers contacted the collections department to negotiate their debt, after having received the email. Which of the following models do you think is more appropriate? The first is a model with the treatment variable plus `credit_limit` and `risk_score`; the second adds `opened` and `agreement` dummies."
]
},
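The danger of controlling for `agreement` can be seen in a toy simulation of this setup (all numbers invented): the email only moves payments by raising the agreement rate, so conditioning on `agreement` makes the email look useless.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
email = rng.integers(0, 2, n).astype(float)  # randomized treatment
# email raises the agreement rate from 10% to 40% (invented numbers)
agreement = (rng.random(n) < 0.1 + 0.3 * email).astype(float)
payments = 100 + 50 * agreement + rng.normal(0, 10, n)

def coef(X, y, k):
    return np.linalg.lstsq(X, y, rcond=None)[0][k]

ones = np.ones(n)
b_email = coef(np.column_stack([ones, email]), payments, 1)
b_email_ctrl = coef(np.column_stack([ones, email, agreement]), payments, 1)

print(b_email)       # close to the true total effect, 50 * 0.3 = 15
print(b_email_ctrl)  # close to 0: the mediator soaked up the effect
```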
{
@@ -1484,11 +1484,11 @@
"email -> opened -> agreement -> payment \n",
"$\n",
"\n",
"We also think that different levels of risk and line have different propensity of doing an agreement, so we will mark them as also causing agreement. As for email and agreement, we could make an argument that some people just read the subject of the email and that makes them more likely to make an agreement. The point is that email could also cause agreement without passing through open.\n",
"We also think that different levels of risk and limit have different propensities to make an agreement, so we will mark them as also causing agreement. As for email and agreement, we could make an argument that some people just read the subject of the email and that makes them more likely to make an agreement. The point is that email could also cause agreement without passing through `opened`.\n",
Review comment (Contributor Author): Not grammar, but keeping vocabulary consistency
"\n",
"What we notice with this graph is that opened and agreement are both in the causal path from email to payments. So, if we control for them with regression, we would be saying \"this is the effect of email while keeping `opened` and `agreement` fixed\". However, both are part of the causal effect of the email, so we don't want to hold them fixed. Instead, we could argue that email increases payments precisely because it boosts the agreement rate. If we fix those variables, we are removing some of the true effect from the email variable. \n",
"\n",
"With potential outcome notation, we can say that, due to randomization $E[Y_0|T=0] = E[Y_0|T=1]$. However, even with randomization, when we control for agreement, treatment and control are no longer comparable. In fact, with some intuitive thinking, we can even guess how they are different:\n",
"With potential outcomes notation, we can say that, due to randomization $E[Y_0|T=0] = E[Y_0|T=1]$. However, even with randomization, when we control for agreement, treatment and control are no longer comparable. In fact, with some intuitive thinking, we can even guess how they are different:\n",
"\n",
"\n",
"$\n",
@@ -1507,13 +1507,13 @@
"\n",
"![img](./data/img/beyond-conf/selection.png)\n",
"\n",
"Selection bias is so pervasive that not even randomization can fix it. Better yet, it is often introduced by the ill advised, even in random data! Spotting and avoiding selection bias requires more practice than skill. Often, they appear underneath some supposedly clever idea, making it even harder to uncover. Here are some examples of selection biased I've encountered:\n",
"Selection bias is so pervasive that not even randomization can fix it. Worse yet, it is often introduced by the ill-advised, even in random data! Spotting and avoiding selection bias requires more practice than skill. Often, it appears underneath some supposedly clever idea, making it even harder to uncover. Here are some examples of selection biases I've encountered:\n",
"\n",
" 1. Adding a dummy for paying the entire debt when trying to estimate the effect of a collections strategy on payments.\n",
" 2. Controlling for white vs blue collar jobs when trying to estimate the effect of schooling on earnings\n",
" 3. Controlling for conversion when estimating the impact of interest rates on loan duration\n",
" 4. Controlling for marital happiness when estimating the impact of children on extramarital affairs\n",
" 5. Breaking up payments modeling E[Payments] into one binary model that predict if payment will happen and another model that predict how much payment will happen given that some will: E[Payments|Payments>0]*P(Payments>0)\n",
"1. Adding a dummy for paying the entire debt when trying to estimate the effect of a collections strategy on payments.\n",
"2. Controlling for white vs blue collar jobs when trying to estimate the effect of schooling on earnings\n",
"3. Controlling for conversion when estimating the impact of interest rates on loan duration\n",
"4. Controlling for marital happiness when estimating the impact of children on extramarital affairs\n",
"5. Breaking up payments modeling $E[Payments]$ into one binary model that predicts if payment will happen and another model that predicts how much payment will happen given that some will: $E[Payments|Payments>0]*P(Payments>0)$\n",
Review comment on lines +1512 to +1516 (Contributor Author): Current rendering makes this a code block. Removing tabs makes it text. Tested with a local build: jupyter-book build causal-inference-for-the-brave-and-true
" \n",
"What is notable about all these ideas is how reasonable they sound. Selection bias often does. Let this be a warning. As a matter of fact, I myself have fallen into the traps above many many times before I learned how bad they were. One in particular, the last one, deserves further explanation because it looks so clever and catches lots of data scientists off guard. It's so pervasive that it has its own name: **The Bad COP**!\n",
"\n",
@@ -1595,7 +1595,7 @@
"\\end{align*} \n",
"$$\n",
" \n",
"where the second equality comes after we add and subtract $E[Y_{i0}|Y_{i1}>0]$. When we break up the COP effect, we get first the causal effect on the participant subpopulation. In our example, this would be the causal effect on those that decide to spend something. Second, we get a bias term which is the difference in $Y_0$ for those that decide to participate when assigned to the treatment ($E[Y_{i0}|Y_{i1}>0]$) and those that that participate even without the treatment ($E[Y_{i0}|Y_{i0}>0]$). In our case, this bias is probably negative, since those that spend when assigned to the treatment, had they not received the treatment, would probably spend less than those that spend even without the treatment $E[Y_{i0}|Y_{i1}>0] < E[Y_{i0}|Y_{i0}>0]$.\n",
"where the second equality comes after we add and subtract $E[Y_{i0}|Y_{i1}>0]$. When we break up the COP effect, we get first the causal effect on the participant subpopulation. In our example, this would be the causal effect on those that decide to spend something. Second, we get a bias term which is the difference in $Y_0$ for those that decide to participate when assigned to the treatment ($E[Y_{i0}|Y_{i1}>0]$) and those that would participate even without the treatment ($E[Y_{i0}|Y_{i0}>0]$). In our case, this bias is probably negative, since those that spend when assigned to the treatment, had they not received the treatment, would probably spend less than those that spend even without the treatment $E[Y_{i0}|Y_{i1}>0] < E[Y_{i0}|Y_{i0}>0]$.\n",
" \n",
"![img](./data/img/beyond-conf/cop.png)\n",
" \n",
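The Bad COP bias shows up even in a simulation where the treatment adds exactly 10 to everyone's spending (all numbers invented): comparing only those with positive spend understates the effect.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
y0 = rng.normal(0, 5, n)       # potential spend without the treatment
y1 = y0 + 10                   # treatment adds exactly 10 for everyone
t = rng.integers(0, 2, n)      # randomized assignment
y = np.where(t == 1, y1, y0)   # observed spend

ate = y[t == 1].mean() - y[t == 0].mean()
cop = y[(t == 1) & (y > 0)].mean() - y[(t == 0) & (y > 0)].mean()

print(ate)  # close to the true effect, 10
print(cop)  # well below 10: selecting on y > 0 biases it down
```

The gap is exactly the participation-bias term in the decomposition above: treated participants include marginal spenders whose $Y_0$ is lower than that of those who would spend anyway.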