This repository contains input dataset files that can be used with MurTree and pymurtree.
The datasets are benchmarks from other papers, binarised with the continuousToCategorical.r R script included in this repository. Each subdirectory contains data from one paper. The papers are as follows:
- DL - Aglin, Gaël; Siegfried Nijssen; Pierre Schaus. "Learning optimal decision trees using caching branch-and-bound search." (AAAI'20).
- NL - Verwer, Sicco; Yingqian Zhang. "Learning optimal classification trees using a binary linear program formulation." (AAAI'19).
- Nina - Narodytska, Nina; Ignatiev, Alexey; Pereira, Filipe; Marques-Silva, Joao. "Learning Optimal Decision Trees with SAT." (IJCAI'18).
- Hu - Hu, Xiyang; Cynthia Rudin; Margo Seltzer. "Optimal sparse decision trees." (NeurIPS'19).
To use these datasets, clone this repository or download the desired files, and specify the file paths as input when running MurTree.
For example, to run MurTree using the dataset in anneal.txt:
git clone git@github.com:MurTree/murtree-data.git
./murtree -file murtree-data/DL/anneal.txt -max-depth 4 -max-num-nodes 15 -time 600
- The data files are in plain text format.
- Each line is one instance. The first number is the label and the remaining numbers are the features.
- All features are binary (0 or 1). Labels are integers starting from zero. Non-binary data must be converted into binary form first, e.g., using the binarisation script continuousToCategorical.r included in this repository.
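The format described above can be read and checked with a few lines of Python. The following is a minimal sketch, not part of MurTree or pymurtree: the function names `parse_instances` and `load_dataset` are hypothetical, and it assumes values on a line are separated by whitespace, which this README does not state explicitly.

```python
from typing import List, Tuple


def parse_instances(text: str) -> Tuple[List[int], List[List[int]]]:
    """Parse dataset text into (labels, features).

    Assumes whitespace-separated integers, one instance per line,
    with the label first and the binary features after it.
    """
    labels: List[int] = []
    features: List[List[int]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        values = [int(tok) for tok in tokens]
        label, row = values[0], values[1:]
        if label < 0:
            raise ValueError(f"line {line_no}: label must be >= 0")
        if any(v not in (0, 1) for v in row):
            raise ValueError(f"line {line_no}: features must be 0 or 1")
        if features and len(row) != len(features[0]):
            raise ValueError(f"line {line_no}: inconsistent feature count")
        labels.append(label)
        features.append(row)
    return labels, features


def load_dataset(path: str) -> Tuple[List[int], List[List[int]]]:
    """Read a dataset file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_instances(handle.read())
```

For example, `load_dataset("murtree-data/DL/anneal.txt")` would return the labels and the feature rows of that file, and raise a `ValueError` if a line contains a non-binary feature.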
This repository is licensed under the MIT License. See the LICENSE file for details.