Graph_pKa is a collection of graph-based models that predict the pKa values of four ionizable protein residues: Asp, Lys, Glu, and His. The models are trained on a custom dataset generated from high-throughput molecular dynamics simulations using the polarizable AMOEBA force field.
This repository contains the complete implementation of our paper:
Graph-Based Deep Learning Models for Predicting pKa Values of Protein-Ionizable Residues via Physically Inspired Feature Engineering, https://pubs.acs.org/doi/10.1021/acs.jcim.5c01681
conda env create -f environment.yml
conda activate pKaor
pip install -r requirements.txt

The processed dataset of protein residues from PKAD-2, generated during the simulations, is provided in /PKAD_Data/
All models for the three architectures (GCN, GIN, and GAT) obtained during the hyperparameter grid search and trained on five datasets (with different radii) are provided in this repository.
You can also run the grid search training from scratch.
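As a rough illustration of what a grid search over the three architectures and five dataset radii enumerates, here is a minimal sketch. The radius values and the hyperparameter names and values are hypothetical placeholders; the actual search space is defined by the scripts in GNN_Grid_Search.

```python
from itertools import product

# Architectures named in this README.
ARCHITECTURES = ("GCN", "GIN", "GAT")

# Hypothetical values for illustration only; the real radii and
# hyperparameters are defined in the GNN_Grid_Search scripts.
RADII = (1, 2, 3, 4, 5)  # placeholders for the five dataset radii
HYPERPARAMS = {
    "hidden_dim": (64, 128),
    "learning_rate": (1e-3, 1e-4),
}


def build_grid(architectures, radii, hyperparams):
    """Return one config dict per combination of architecture, radius, and hyperparameters."""
    names = list(hyperparams)
    value_lists = [hyperparams[name] for name in names]
    grid = []
    for arch, radius, *values in product(architectures, radii, *value_lists):
        config = {"architecture": arch, "radius": radius}
        config.update(zip(names, values))
        grid.append(config)
    return grid


if __name__ == "__main__":
    grid = build_grid(ARCHITECTURES, RADII, HYPERPARAMS)
    print(len(grid), "configurations")
    print(grid[0])
```

Each resulting dictionary describes one training run, so the grid size grows multiplicatively with every added hyperparameter.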
cd Net
python create_data.py
cd GNN_Grid_Search
python GAT.py

Note: Although Conda provides a Tinker 8.11.3 package installation via:
conda install bioconda::tinker

the packaged version contains known source-code issues that result in invalid .uind files (induced dipole moment files), which are required for model inference.
In addition, Tinker 8.11.3 is no longer available. Therefore, a newer Tinker release (Tinker_EM.py has been updated for Tinker 25.5.3) must be downloaded and compiled from TinkerTools.
Detailed compilation instructions are available here.
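After compiling Tinker, it may help to confirm that its executables can be found before running the inference scripts. The sketch below is only an assumption-laden check: `pdbxyz`, `minimize`, and `analyze` are standard Tinker program names, but this README does not state which programs Tinker_EM.py actually calls, and a local build may use different names or suffixes.

```python
import shutil

# Assumed program names; adjust to match what your Tinker build installs
# and what Tinker_EM.py invokes (not specified in this README).
TINKER_PROGRAMS = ("pdbxyz", "minimize", "analyze")


def missing_programs(programs, search_path=None):
    """Return the program names that cannot be found as executables.

    search_path is an os.pathsep-separated string of directories;
    None means the current PATH environment variable is used.
    """
    return [name for name in programs if shutil.which(name, path=search_path) is None]


if __name__ == "__main__":
    missing = missing_programs(TINKER_PROGRAMS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed Tinker programs were found.")
```

If anything is reported missing, adding the Tinker binary directory to PATH (or passing it as `search_path`) is the usual remedy.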
Place the raw PDB files in ../Graph_pKa/Data/0_Raw_PDB, then run:

python Tinker_EM.py
python Tinker_Output_Processing.py
python Predict.py

For any questions regarding the code or the paper, please feel free to contact songziyu0220@gmail.com or zuyi.huang@villanova.edu