# main module & framework for bootstrap & crossval
## get all the data
Note: the data has been amended so that the class labels are M = 1 and B = 0.
```{r}
data = read.csv("complete_dataset.csv")
trash = data[, 1]         # ID column, not used for modelling
X = data[, 3:ncol(data)]  # predictor columns
y = data[, 2]             # class label (M = 1, B = 0)
```
## split into test and train
```{r}
p = 0.85
# one uniform draw per row, so roughly a fraction p of rows land in train
keep = runif(nrow(data)) < p
X_train = X[keep, ]
X_test = X[!keep, ]
y_train = y[keep]
y_test = y[!keep]
```
## bootstrapping framework
```{r}
bootstrap_framework = function(X_train, y_train) {
  # constants
  iterations = 1000
  models = 10
  results = matrix(0, models, iterations)
  k_fold = 5
  train_length = nrow(X_train)
  fold_size = as.integer(train_length / k_fold)
  for (i in 1:iterations) {
    # generate bootstrapped sample (rows drawn with replacement)
    bootstrap_ind = sample(train_length, train_length, replace = TRUE)
    X_boot = X_train[bootstrap_ind, ]
    y_boot = y_train[bootstrap_ind]
    # TODO:
    # store whatever values we need here
    # both for *saving* results, but also different *values of params*
    for (k in 1:k_fold) {
      ## split data into train (X_tr, y_tr) and validation (X_val, y_val)
      val_i = ((k - 1) * fold_size + 1):(k * fold_size)
      keep = rep(TRUE, nrow(X_boot))
      keep[val_i] = FALSE
      X_tr = X_boot[keep, ]
      y_tr = y_boot[keep]
      X_val = X_boot[!keep, ]
      y_val = y_boot[!keep]
      ## TODO: add models to test below
      ## TODO: store results into result matrix
    }
  }
  ## TODO: generate charts, save results, return results or print results
  ## TODO: determine best model
}
```
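As a sketch of what the inner-loop TODOs could look like, here is one way a single candidate model might be fitted on a fold and scored by validation accuracy. Logistic regression via `glm` is chosen purely as an example, and the synthetic `X_tr`/`y_tr`/`X_val`/`y_val` below are stand-ins for the fold data built inside the loop; none of this is part of the original design.
```{r}
## hedged example: scoring one candidate model on one fold
set.seed(1)
X_tr  = data.frame(a = rnorm(80), b = rnorm(80))   # stand-in fold training data
y_tr  = as.integer(X_tr$a + rnorm(80, sd = 0.5) > 0)
X_val = data.frame(a = rnorm(20), b = rnorm(20))   # stand-in validation data
y_val = as.integer(X_val$a + rnorm(20, sd = 0.5) > 0)

fit  = glm(y ~ ., data = cbind(y = y_tr, X_tr), family = binomial)
prob = predict(fit, newdata = X_val, type = "response")
acc  = mean((prob > 0.5) == (y_val == 1))
acc
# e.g. results[model_index, i] could accumulate acc / k_fold
```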
## run bootstrap_framework on unpreprocessed data
```{r}
bootstrap_framework(X_train, y_train)
```
## run bootstrap_framework on lasso predictors
```{r}
bootstrap_framework(X_train, y_train)
```
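The "lasso predictors" variant presumably means keeping only the columns with nonzero lasso coefficients before calling `bootstrap_framework`. A hedged sketch of that selection step, assuming the `glmnet` package is available; the synthetic `X_demo`/`y_demo` stand in for `X_train`/`y_train`:
```{r}
## hedged sketch: select predictors with nonzero lasso coefficients
library(glmnet)
set.seed(1)
X_demo = matrix(rnorm(100 * 8), 100, 8)                 # stand-in for X_train
y_demo = as.integer(X_demo[, 1] + rnorm(100) > 0)       # stand-in for y_train

cv = cv.glmnet(X_demo, y_demo, family = "binomial", alpha = 1)
coefs = as.matrix(coef(cv, s = "lambda.min"))
selected = setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
selected
# the reduced matrix X_train[, selected] would then feed bootstrap_framework
```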
## run bootstrap_framework on pca (variance = 90)
```{r}
bootstrap_framework(X_train, y_train)
```
## run bootstrap_framework on pca (variance = 95)
```{r}
bootstrap_framework(X_train, y_train)
```
## run bootstrap_framework on pca (variance = 99)
```{r}
bootstrap_framework(X_train, y_train)
```
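For the PCA variants, the three variance levels (90 / 95 / 99) presumably mean keeping the smallest number of principal components whose cumulative explained variance reaches that threshold. A hedged sketch with base R's `prcomp`, using a synthetic matrix in place of `X_train`:
```{r}
## hedged sketch: PCA-reduced predictors at a target explained-variance level
set.seed(1)
X_demo = matrix(rnorm(200 * 10), 200, 10)          # stand-in for X_train
pca = prcomp(X_demo, center = TRUE, scale. = TRUE)
cum_var = cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_comp = which(cum_var >= 0.90)[1]                 # first count reaching 90%
X_pca = pca$x[, 1:n_comp, drop = FALSE]
dim(X_pca)
# swapping 0.90 for 0.95 or 0.99 gives the other two variants
```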
## extra
## figure out semi-supervised learning