README.md: 11 additions & 3 deletions
@@ -45,7 +45,13 @@ DeepXML supports multiple feature architectures such as Bag-of-embedding/Astec,
  ```txt
  * Download the (zipped file) BoW features from XML repository.
  * Extract the zipped file into data directory.
- * The following files should be available in <work_dir>/data/<dataset>
+ * The following files should be available in <work_dir>/data/<dataset> for new datasets (ignore the next step)
+ - trn_X_Xf.txt
+ - trn_X_Y.txt
+ - tst_X_Xf.txt
+ - tst_X_Y.txt
+ - fasttextB_embeddings_300d.npy or fasttextB_embeddings_512d.npy
+ * The following files should be available in <work_dir>/data/<dataset> if the dataset is in old format (please refer to next step to convert the data to new format)
  - train.txt
  - test.txt
  - fasttextB_embeddings_300d.npy or fasttextB_embeddings_512d.npy
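Review note: the two file layouts listed in this hunk can be told apart mechanically before the conversion step is run. The snippet below is a minimal sketch of that check, assuming the file names are exactly as listed here. `detect_dataset_format` and `has_token_embeddings` are hypothetical helpers written for illustration and are not part of DeepXML.

```python
from pathlib import Path

# File names copied from the README lines in this hunk.
NEW_FORMAT_FILES = ("trn_X_Xf.txt", "trn_X_Y.txt", "tst_X_Xf.txt", "tst_X_Y.txt")
OLD_FORMAT_FILES = ("train.txt", "test.txt")
EMBEDDING_FILES = ("fasttextB_embeddings_300d.npy", "fasttextB_embeddings_512d.npy")


def _file_names(dataset_dir):
    """Return the names of regular files directly inside dataset_dir."""
    root = Path(dataset_dir)
    if not root.is_dir():
        return set()
    return {p.name for p in root.iterdir() if p.is_file()}


def detect_dataset_format(dataset_dir):
    """Return 'new', 'old', or None for a <work_dir>/data/<dataset> directory.

    Hypothetical helper: 'new' means all four trn_/tst_ files are present,
    'old' means train.txt and test.txt are present, None means neither set
    is complete. If both sets are present, 'new' wins.
    """
    present = _file_names(dataset_dir)
    if all(name in present for name in NEW_FORMAT_FILES):
        return "new"
    if all(name in present for name in OLD_FORMAT_FILES):
        return "old"
    return None


def has_token_embeddings(dataset_dir):
    """Return True if either listed FastText embedding file is present."""
    present = _file_names(dataset_dir)
    return any(name in present for name in EMBEDDING_FILES)
```

A result of `"old"` would indicate that the conversion step mentioned in the README still has to be run, and `None` that the extracted archive does not match either listed layout.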
@@ -89,8 +95,8 @@ An ensemble can be trained as follows. A json file is used to specify architectu

  * framework
  - DeepXML: Divides the XML problems in 4 modules as proposed in the paper.
- - DeepXML-OVA: Train the method in 1-vs-all fashion [4][5], i.e., loss is computed for each label in each iteration.
- - DeepXML-ANNS: Train the method using a label shortlist. Support is available for a fixed graph or periodic training of the ANNS graph.
+ - DeepXML-OVA: Train the architecture in 1-vs-all fashion [4][5], i.e., loss is computed for each label in each iteration.
+ - DeepXML-ANNS: Train the architecture using a label shortlist. Support is available for a fixed graph or periodic training of the ANNS graph.

  * dataset
  - Name of the dataset.
@@ -117,6 +123,8 @@ An ensemble can be trained as follows. A json file is used to specify architectu
  * Other file formats such as npy, npz, pickle are also supported.
  * Initializing with token embeddings (computed from FastText) leads to a noticeable accuracy gain in Astec. Please ensure that the token embedding file is available in the data directory if 'init=token_embeddings'; otherwise it'll throw an error.
  * Config files are made available in deepxml/configs/<framework>/<method> for datasets in XC repository. You can use them when trying out Astec/DeepXML on new datasets.
+ * We conducted our experiments on a 24-core Intel Xeon 2.6 GHz machine with 440GB RAM and a single Nvidia P40 GPU. 128GB memory should suffice for most datasets.
+ * Astec makes use of CPU (mainly for nmslib) as well as GPU.