      Character prediction mode rather than numeric feature mode. This will create test cases by iterating through the data skipSize items at a time, giving the previous `sequenceLength` items higher weights based on their closeness to the item being predicted.
  -data string
      Training data input file
  -folds int
      How many subdivisions of the dataset to make for cross-validation (default 5)
  -m int
      Override calculation for feature split size (little m)
  -max int
      Stop predicting after this many rounds (-pred only)
  -model string
      Load a pretrained model for prediction
  -pred
      Make a prediction
  -profile string
      [cpu|mem] enable profiling
  -save string
      Where to save the model after training
  -seed string
      Predict based on this string of data
  -seqlen int
      Normally equal to the number of variables during -charmode; override to use fewer look-behind (memory) variables in each input test case
  -skipsize int
      During -charmode, how many items to skip before making another training case (default 3)
  -subsetpct float
      Percent of the dataset which should be used to train a tree (always minus 1 fold for cross-validation) (default 0.6)
  -tojson
      Convert a model to json
  -train
      Train a model
  -trees int
      How many decision trees to make per fold of the dataset (default 1)
```

## experimental character mode

There is an experimental `-charmode` flag that attempts to encode strings of text and make predictions on them, like you would with a neural network.
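The README does not show the encoding itself, but the description of `-charmode` above (step through the data `skipsize` items at a time; weight the previous `seqlen` characters by closeness to the character being predicted) can be sketched roughly as below. The function name `makeCharCases` and the linear weighting scheme are illustrative assumptions, not the tool's exact implementation:

```go
package main

import "fmt"

// charCase is one training example: the previous seqLen characters as
// float32 codes, a weight per character, and the character to predict.
type charCase struct {
	features []float32 // previous seqLen character codes
	weights  []float32 // higher weight for characters closer to the target
	label    float32   // the character being predicted
}

// makeCharCases walks the input skipSize characters at a time and emits one
// case per step, weighting the look-behind window linearly by closeness
// (oldest character gets 1/seqLen, newest gets 1.0 — an assumed scheme).
func makeCharCases(text string, seqLen, skipSize int) []charCase {
	var cases []charCase
	runes := []rune(text)
	for i := seqLen; i < len(runes); i += skipSize {
		c := charCase{label: float32(runes[i])}
		for j := i - seqLen; j < i; j++ {
			c.features = append(c.features, float32(runes[j]))
			c.weights = append(c.weights, float32(j-(i-seqLen)+1)/float32(seqLen))
		}
		cases = append(cases, c)
	}
	return cases
}

func main() {
	cases := makeCharCases("hello world", 4, 3)
	fmt.Println(len(cases), string(rune(cases[0].label)))
}
```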

## how it works

Given a data set: rows of input features x, where the last column is the expected category y. Often these are encoded in CSV format. The data should be encoded as float32-parseable values.
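The layout described above (float32 feature columns, category in the last column) can be read with Go's standard library. This is a minimal sketch; the helper name `loadCSV` is an assumption for illustration:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// loadCSV parses rows of float32 values; the last column of each row is
// the expected category y, and the remaining columns are the features x.
func loadCSV(data string) (xs [][]float32, ys []float32, err error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	for _, rec := range records {
		row := make([]float32, 0, len(rec)-1)
		for _, field := range rec[:len(rec)-1] {
			v, err := strconv.ParseFloat(field, 32)
			if err != nil {
				return nil, nil, err
			}
			row = append(row, float32(v))
		}
		y, err := strconv.ParseFloat(rec[len(rec)-1], 32)
		if err != nil {
			return nil, nil, err
		}
		xs = append(xs, row)
		ys = append(ys, float32(y))
	}
	return xs, ys, nil
}

func main() {
	xs, ys, _ := loadCSV("5.1,3.5,1.4,0\n4.9,3.0,1.3,1\n")
	fmt.Println(len(xs), len(xs[0]), ys[1])
}
```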

To do it, start by splitting the whole dataset into equal bags (or folds) without replacement.
For example, say there are 20 samples and we want 4 folds. Each fold will have 5 samples, and none of the 20 samples will be repeated across all the folds. However, they need to be put randomly into the folds (random without replacement).
Next, loop through all the folds. The fold in the current iteration will be the test set, so reserve it for later. Use all the other folds to train a set of decision trees. In our example above, that means on the first iteration we would use the last 3 folds for training; on the second, the first fold and the last two; and so on. For every training set, construct the decision trees that best predict it.
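The hold-one-fold-out loop above can be sketched as follows; the `trainTrees` callback stands in for the actual tree construction, which the text describes separately:

```go
package main

import "fmt"

// crossValidate holds out each fold in turn as the test set and hands the
// union of the remaining folds to trainTrees, collecting one score per fold.
// trainTrees is a placeholder for real decision-tree training.
func crossValidate(folds [][]int, trainTrees func(train []int) float32) []float32 {
	var scores []float32
	for i := range folds {
		var train []int
		for j, fold := range folds {
			if j != i {
				train = append(train, fold...)
			}
		}
		// folds[i] is reserved as the test set for this iteration.
		scores = append(scores, trainTrees(train))
	}
	return scores
}

func main() {
	folds := [][]int{{0, 1}, {2, 3}, {4, 5}, {6, 7}}
	scores := crossValidate(folds, func(train []int) float32 {
		return float32(len(train)) // each training set holds 6 of the 8 samples
	})
	fmt.Println(scores)
}
```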