-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathlogistic_regression_example.Rmd
More file actions
77 lines (54 loc) · 1.97 KB
/
logistic_regression_example.Rmd
File metadata and controls
77 lines (54 loc) · 1.97 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
---
title: "logistic regression demo"
author: "Clay Ford"
date: "6/9/2021"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Generate data
Let's generate fake data for two groups A and B. If you're in group A, chance of hemetoma is 30%. If you're in group B, chance of hematoma is 50%.
```{r}
grp <- rep(c("A", "B"), each = 50)
set.seed(1)
hematoma <- rbinom(n = 100, size = 1,
prob = rep(c(0.3, 0.5), each = 50))
```
## Examine data
Create a cross tabulation. Hematoma = 1 means the subject experienced hematoma.
```{r}
table(grp, hematoma)
```
Get proportions. It appears group B has a higher chance of hematoma (0.42 vs 0.32).
```{r}
prop.table(table(grp, hematoma), margin = 1)
```
Calculating odds ratio can be done "by hand"
```{r}
# save table of proportions
tab <- prop.table(table(grp, hematoma), margin = 1)
grpA_odds <- tab[1,2]/(1 - tab[1,2])
grpB_odds <- tab[2,2]/(1 - tab[2,2])
# odds ratio
grpB_odds/grpA_odds
```
Appears group B odds for hematoma are about 54% higher than the odds of hematoma for group A.
## Logistic regression for CI and p-value
We can use logistic regression to get odds ratio, confidence interval, and the p-value testing null if odds ratio is 1. To do this we use the `glm` function with `family = binomial`.
```{r}
m <- glm(hematoma ~ grp, family = binomial)
```
Odds ratio is same as we got by hand. Need to exponentiate the coefficient. Some statistics programs return this automatically.
```{r}
exp(coef(m)["grpB"])
```
Confidence interval on odds ratio:
```{r}
exp(confint(m))["grpB",]
```
The p-value:
```{r}
coef(summary(m))["grpB","Pr(>|z|)"]
```
It appears the supposition of no difference in the odds cannot be rejected based on such a large p-value. We generated the data and know there is a real difference, but our analysis of the sample missed it. This is a Type II error: failed to reject null when there was indeed an effect. In real life we don't know if we make these errors.