-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdatastructure.Rmd
162 lines (138 loc) · 4.91 KB
/
datastructure.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
---
title: "Data structure and Characteristics"
author: "Matteo Delucchi"
output:
rmarkdown::html_document:
toc: yes
toc_float:
collapsed: true
smooth_scroll: false
toc_depth: 3
rmarkdown::html_vignette: default
vignette: >
%\VignetteIndexEntry{Data structure and Characteristics}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
# Clear working environment
rm(list=ls())
# Load libraries
library(bnaiaR)
library(dplyr)
library(ggplot2)
library(kableExtra)
# Save plots as files?
# Warning: Bad layout with knitr. Works well if chunks are run without rendering.
SAVEPLOTS <- FALSE
PLOTPATH <- Sys.getenv("PLOTPATH")
```
# Load Preprocessed Data
```{r}
str(adb.raw)
```
# Characteristics of whole, harmonized data cohort
no. of patients
```{r}
nrow(adb.raw)
```
how many had a form of stroke as reason for diagnosis?
```{r}
table(adb.raw$Basis.of.recruitment_CaseControl, useNA = "always")
```
median age at diagnosis?
```{r}
summary(adb.raw$age.at.time.of.diagnosis)
```
median age at rupture?
```{r}
summary(adb.raw$age.at.time.of.diagnosis[which(adb.raw$Ruptured_IA=="Yes")])
```
Compare to Morel et al. 2021:
Median age at aneurysm rupture was 52 years (range 10-92),
25% of patients were <44 and 75% were <61 years of age.
# Variable Description
```{r}
abndata <- exp11_dat$abndata %>%
# Clean up node labels
rename("Sex" = "Gender",
"Age at Diagnosis" = "AgeDiag.group",
# "Age at Diagnosis" = "AgeDiag",
"Pos. Fam. History" = "Positive.famillial.history",
"Smoking Status" = "Smoking_Current_Former_No",
# "Smoking Status=Former" = "Smoking_Current_Former_No=Former",
# "Smoking Status=No" = "Smoking_Current_Former_No=No",
"IA Location" = "location.grouped",
# "IA Location=Low risk" = "location.grouped=Low",
# "IA Location=Medium risk" = "location.grouped=Medium",
# "IA Location=High risk" = "location.grouped=High",
"Multiple IAs" = "Multiple.IAs",
# "IA Size" = "IAsize_log",
"IA Size" = "IAsize.groups.merged",
"Ruptured IA" = "Ruptured_IA")
str(abndata)
```
```{r}
text_tbl <- data.frame(
variables = c(
"Sex",
"Age",
"",
"Family History",
"Hypertension",
"Smoking",
"Multiplicity",
"Location",
"Size",
"",
"Ruptured"),
modvars = c(
"discrete",
"discrete",
"continuous",
"discrete",
"discrete",
"discrete",
"discrete",
"discrete",
"continuous",
"discrete",
"discrete"
),
descr = c(
"Biological, self-reported sex as \textit{female} or \textit{male}.",
"Age in years at the time of IA diagnosis. Equal to age at time of rupture for ruptured IA.",
"Age discretized in groups of A=20-39, B=40-44, C=45-49, D=50-54, E=55-59, F=60-64, G=65-93.",
"Positive familial history of intracranial aneurysm as \textit{yes} if one or more first degree relative(s) were diagnosed with IA and \textit{no} otherwise.",
"\textit{No} hypertension awareness if patient is unaware of increased blood pressure (>120/80mmHg). \textit{Yes} if patient is aware of increased blood pressure. Including untreated and treated BP >140/90mmHg, with the later including both, cases of uncontrolled and controlled BP where the BP is above or in normal range respectively.",
"Patient's smoking behaviour classified as \textit{current} smoker if smoked (> 300 cigarettes) and continues current smoking. \textit{Former} smoker, if smoked (>300 cigarettes) and stopped at least 6 months pre-diagnosis. \textit{Never} smoked if < 300 cigarettes smoked ever.",
"\textit{Yes} if >1 IA was detected.",
"IA Location classified by rupture risk in groups of \textit{high} risk (Acom, Pcom, V-B, A2, PC), \textit{medium} risk (MCA, ICA, Basilar, Other, A1 segment ant) and \textit{low} risk (CavICA, OphtICA).",
"Natural logarithm of IA max diameter [mm] reported at time of diagnosis.",
"IA max diameter [mm] grouped in in \textit{A}≤ 7mm, \textit{B}=7-12mm, \textit{C}=13-25mm, \textit{D}≥25mm.",
"\textit{Yes} if IA was ruptured at time of diagnosis and \textit{no} otherwise.")
)
```
```{r}
text_tbl %>%
rename("Factor" = "variables",
"Type" = "modvars",
"Description" = "descr") %>%
kbl(format = "latex", booktabs = T) %>%
# kable_classic(full_width = F) %>%
kable_styling(latex_options=c("striped", "hold_position"),
stripe_index = c(1,4,6,8,11)) %>%
column_spec(3, width = "10cm") %>%
# kableExtra::landscape() %>%
cat(., file = paste0(PLOTPATH, "/vardesc_table.tex"))
kableExtra::pack_rows("Age", start_row = 4, end_row = 5, bold = F) %>%
kableExtra::pack_rows("Size", start_row = 10, end_row = 11, bold = F)
column_spec(1, bold = T, border_right = T) %>%
column_spec(2, width = "30em", background = "yellow")
```