Could you please add the script/code used for the PCA and K-means clustering that splits each dataset into training and test sets?
From the paper: "For all public datasets, compounds were clustered into five clusters using K-means based on a PCA-reduced 2048-bit Morgan circular fingerprint (radius 2), and one of the clusters is selected as a test set with the remaining four used as a training set."
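I just want to make sure the process is as reproducible as possible. The quoted passage does not state the number of PCA components, the K-means random seed, or which cluster was held out as the test set, so the split cannot be reproduced exactly from the description alone.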
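
For reference, here is a minimal sketch of how I currently read that description, assuming RDKit for the fingerprints and scikit-learn for PCA and K-means. The function names (`morgan_fingerprints`, `cluster_split`) and the defaults `n_components=50`, `test_cluster=0`, and `seed=0` are my own guesses, not values from the paper, so this is unlikely to reproduce the published split exactly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def morgan_fingerprints(smiles_list, radius=2, n_bits=2048):
    """Return an (n_compounds, n_bits) 0/1 matrix of Morgan fingerprints."""
    # RDKit is imported lazily so cluster_split() can be used without it.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    rows = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"Could not parse SMILES: {smiles!r}")
        rows.append(generator.GetFingerprintAsNumPy(mol))
    return np.asarray(rows, dtype=np.float64)


def cluster_split(fingerprints, n_clusters=5, n_components=50, test_cluster=0, seed=0):
    """Cluster PCA-reduced fingerprints and hold out one cluster as the test set.

    n_components, test_cluster, and seed are assumptions; the quoted passage
    does not specify them.
    """
    fingerprints = np.asarray(fingerprints, dtype=np.float64)
    # PCA cannot keep more components than min(n_samples, n_features).
    n_components = min(n_components, *fingerprints.shape)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(fingerprints)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reduced)
    test_idx = np.flatnonzero(labels == test_cluster)
    train_idx = np.flatnonzero(labels != test_cluster)
    return train_idx, test_idx, labels
```

With a list of SMILES strings in a hypothetical variable `smiles`, usage would be `train_idx, test_idx, labels = cluster_split(morgan_fingerprints(smiles))`. If the actual script differs in any of these choices (PCA dimensionality, seed, held-out cluster, K-means initialization), those are the details I would most like to see.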