The multitask deep learning architecture of the MT-TopLap model is shown in the figure below.
- fair-esm 2.0.0
- numpy 1.23.5
- scipy 1.11.3
- torch 2.1.1
- pytorch-cuda 11.7
- scikit-learn 1.3.2
- python 3.10.12
- Software to be installed in the ./bin folder (see the README in ./bin)
Dataset | No. of samples | PDB ID
---|---|---
RBD-ACE2-1 | 3669 | 6M0J |
RBD-ACE2-2 | 1539 | 6M0J |
RBD-ACE2-3 | 2223 | 6M0J |
RBD-CTC-455.2-1 | 1539 | 7KL9 |
RBD-CTC-455.2-2 | 2831 | 7KL9 |
BA.1-RBD-ACE2 | 3800 | 7T9L |
BA.2-RBD-ACE2 | 3686 | 7XB0 |
CAPRI | 1862 | 3R2X;4EEF |
Dataset | No. of samples | No. of PPIs | PDB source
---|---|---|---
Original | 8338 | 319 | Downloaded from RCSB
AlphaFold3 | 8330 | 317 | Downloaded from AlphaFold Server
Dataset | No. of samples | PDB ID
---|---|---
RBD-hACE2 | 3649 | 6M0J |
RBD-cACE2 | 3625 | 7C8D |
RBD-bACE2 | 3646 | 7XA7 |
RBD-dACE2 | 3487 | 8HG0 |
BA.2-RBD-hACE2 | 3668 | 7ZF7 |
BA.2-RBD-haACE2 | 3724 | 7YV8 |
The PDB files, mutation locations and mutation-induced binding free energy changes can be found in ./downstream/.
# Generate PSSM scoring matrix (Requires BLAST+ 2.10.1 and GCC 9.3.0)
python PPIprepare.py <PDB ID> <Partner A chains> <Partner B chains> <Mutation chain> <Wild Residue> <Residue ID> <Mutant Residue> <pH>
# examples
python PPIprepare.py 1A4Y A B A D 435 A 7.0
Refer to https://weilab.math.msu.edu/MIBPB/
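The positional arguments above must be passed in a fixed order. As a minimal sketch, the helper below (an illustrative name, not part of the repository) assembles the documented argument order into a command line, which is handy when preparing many mutations in a batch:

```python
# Hypothetical batch helper: the function name and record layout are
# illustrative; only the argument order mirrors the CLI documented above.
import shlex

def prepare_command(pdb_id, chains_a, chains_b, mut_chain,
                    wild_res, res_id, mut_res, ph=7.0):
    """Build the PPIprepare.py command line in the documented argument order."""
    args = ["python", "PPIprepare.py", pdb_id, chains_a, chains_b,
            mut_chain, wild_res, str(res_id), mut_res, str(ph)]
    return shlex.join(args)

# Example from above: complex 1A4Y, mutation D435A on chain A at pH 7.0.
cmd = prepare_command("1A4Y", "A", "B", "A", "D", 435, "A")
print(cmd)  # python PPIprepare.py 1A4Y A B A D 435 A 7.0
```

The same ordering applies to the feature-generation scripts below, so one helper can drive the whole pipeline over a mutation list.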
# Generate persistent homology and auxiliary features
python PPIfeature.py <PDB ID> <Partner A chains> <Partner B chains> <Mutation chain> <Wild Residue> <Residue ID> <Mutant Residue> <pH>
python PPIfeature_Lap.py <PDB ID> <Partner A chains> <Partner B chains> <Mutation chain> <Wild Residue> <Residue ID> <Mutant Residue> <pH>
# examples
python PPIfeature.py 1A4Y A B A D 435 A 7.0
python PPIfeature_Lap.py 1A4Y A B A D 435 A 7.0
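To give a flavor of what the persistent-homology features capture, the toy sketch below tracks the 0-dimensional invariant (number of connected components) of a point set as the distance filtration radius grows, using a union-find over pairwise distances. The coordinates and radii are made up; the real scripts extract far richer topological and Laplacian-based invariants from the PDB structures.

```python
# Toy 0-dimensional persistence sketch: count connected components (Betti-0)
# of a point cloud at increasing filtration radii. Illustrative only.
import math
from itertools import combinations

def betti0_curve(points, radii):
    """Number of connected components at each filtration radius."""
    parent = {}

    def find(i):
        # Union-find with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    curve = []
    for r in radii:
        parent = {i: i for i in range(len(points))}
        for i, j in combinations(range(len(points)), 2):
            if math.dist(points[i], points[j]) <= r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
        curve.append(len({find(i) for i in parent}))
    return curve

# Three "atoms": two close together, one far away.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (8.0, 0.0, 0.0)]
print(betti0_curve(atoms, [1.0, 2.0, 10.0]))  # [3, 2, 1]
```

Components that persist over a long range of radii correspond to long bars in the barcode; the feature scripts vectorize such information over atom subsets near the mutation site.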
# Generate transformer features
python PPIfeature_seq.py <PDB ID> <Partner A chains> <Partner B chains> <Mutation chain> <Wild Residue> <Residue ID> <Mutant Residue> <pH>
# examples
python PPIfeature_seq.py 1A4Y A B A D 435 A 7.0
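Before a protein language model (here, ESM-2 via fair-esm) can embed the mutant, the wild-type sequence has to be edited at the mutated position. The sketch below shows only that sequence-editing step with an illustrative function name and a toy sequence; the actual embedding requires the fair-esm package and is handled inside PPIfeature_seq.py.

```python
# Sequence-side preprocessing sketch (illustrative names, toy sequence):
# replace the wild-type residue at the mutated position, checking that the
# residue found there matches the one given on the command line.
def apply_mutation(sequence, wild_res, res_id, mut_res, offset=1):
    """Replace wild_res at 1-based position res_id with mut_res."""
    idx = res_id - offset
    if sequence[idx] != wild_res:
        raise ValueError(
            f"expected {wild_res} at position {res_id}, found {sequence[idx]}")
    return sequence[:idx] + mut_res + sequence[idx + 1:]

# Toy example: mutate D at position 3 to A.
print(apply_mutation("MKDL", "D", 3, "A"))  # MKAL
```

The residue check mirrors a common failure mode: PDB residue numbering does not always start at 1, so an offset between structure numbering and sequence index may be needed.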
The jobs folder contains the scripts used to run the feature generation process step by step.
Method | R_p (Pearson correlation) | Fine-tuning strategy | PDB source
---|---|---|---
MT-TopLap | 0.88 | Freeze Last 3 Layers | RCSB |
MT-TopLapE | 0.88 | Freeze Even Layers | RCSB |
MT-TopLapO | 0.88 | Freeze Odd Layers | RCSB |
MT-TopLapAF3 | 0.86 | Freeze Last 3 Layers | AlphaFold3 |
MT-TopLapF | 0.85 | Freeze First 3 Layers | RCSB |
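The variants in the table differ only in which hidden layers are frozen during fine-tuning. As a minimal stdlib sketch (the function name and 0-based indexing are illustrative, not the repository's API), each strategy maps to a set of layer indices in a network with n hidden layers:

```python
# Illustrative mapping from the table's freezing strategies to the layer
# indices held fixed during fine-tuning, for a network with n_layers hidden
# layers indexed from 0. Names are assumptions, not the repo's API.
def frozen_layers(strategy, n_layers):
    if strategy == "last3":
        return list(range(max(0, n_layers - 3), n_layers))
    if strategy == "first3":
        return list(range(min(3, n_layers)))
    if strategy == "even":
        return [i for i in range(n_layers) if i % 2 == 0]
    if strategy == "odd":
        return [i for i in range(n_layers) if i % 2 == 1]
    raise ValueError(f"unknown strategy: {strategy}")

print(frozen_layers("last3", 7))  # [4, 5, 6]
print(frozen_layers("even", 7))   # [0, 2, 4, 6]
```

In PyTorch, freezing would then amount to setting `requires_grad = False` on the parameters of the selected layers before fine-tuning.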
usage: PPI_multitask_DMS_mega.py [-h] [--batch_size BATCH_SIZE] [--epochs EPOCHS] [--lr LR] [--momentum MOMENTUM]
[--weight_decay WEIGHT_DECAY] [--no_cuda] [--seed SEED] [--log_interval LOG_INTERVAL]
[--layers LAYERS] [--continue_train CONTINUE_TRAIN] [--prediction PREDICTION]
[--pred PRED] [--cv CV] [--cv_type CV_TYPE] [--model MODEL]
[--normalizer1_name NORMALIZER1_NAME] [--normalizer2_name NORMALIZER2_NAME]
[--freeze FREEZE] [--skempi_pretrain SKEMPI_PRETRAIN] [--ft FT] [--finetune FINETUNE]
[--debug DEBUG] [--DMS_data DMS_DATA] [--pred_DMS PRED_DMS]
options:
-h, --help show this help message and exit
--batch_size BATCH_SIZE
input batch size for training (default: 50)
--epochs EPOCHS number of epochs to train (default: 500)
--lr LR learning rate (default: 0.001)
--momentum MOMENTUM SGD momentum (default: 0.9)
--weight_decay WEIGHT_DECAY
SGD weight decay (default: 0)
--no_cuda disables CUDA training
--seed SEED random seed (default: 1)
--log_interval LOG_INTERVAL
how many batches to wait before logging training status
--layers LAYERS neural network layers and neuron counts per layer
--continue_train CONTINUE_TRAIN
run training
--prediction PREDICTION
prediction
--pred PRED
--cv CV cv (launch cross-validation test)
--cv_type CV_TYPE skempi2 or alphafold (select original PDB or AF3 validation)
--model MODEL prediction model
--normalizer1_name NORMALIZER1_NAME
mega dataset
--normalizer2_name NORMALIZER2_NAME
Lap and ESM dataset
--freeze FREEZE freeze weights and biases of hidden layers
--skempi_pretrain SKEMPI_PRETRAIN
finetune DMS with SKEMPI2
--ft FT finetune pretrained model with DMS
--finetune FINETUNE finetune model with DMS data
--debug DEBUG debugging channel
--DMS_data DMS_DATA predict full DMS data
--pred_DMS PRED_DMS predict full DMS data
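The defaults stated in the help text map directly onto argparse. The fragment below reconstructs a few of the options as a sketch only; the real PPI_multitask_DMS_mega.py defines many more flags:

```python
# Minimal reconstruction of a subset of the options above (a sketch; the
# real script defines the full flag set).
import argparse

parser = argparse.ArgumentParser(prog="PPI_multitask_DMS_mega.py")
parser.add_argument("--batch_size", type=int, default=50,
                    help="input batch size for training (default: 50)")
parser.add_argument("--epochs", type=int, default=500,
                    help="number of epochs to train (default: 500)")
parser.add_argument("--lr", type=float, default=0.001,
                    help="learning rate (default: 0.001)")
parser.add_argument("--no_cuda", action="store_true",
                    help="disables CUDA training")

# Unsupplied flags fall back to the documented defaults.
args = parser.parse_args(["--lr", "0.01", "--no_cuda"])
print(args.batch_size, args.lr, args.no_cuda)  # 50 0.01 True
```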
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your work, please cite the following papers:
- JunJie Wee, Guo-Wei Wei. Evaluation of AlphaFold 3’s Protein–Protein Complexes for Predicting Binding Free Energy Changes upon Mutation. Journal of Chemical Information and Modeling (2024). DOI: 10.1021/acs.jcim.4c00976
- JunJie Wee, Jiahui Chen, Guo-Wei Wei. Preventing future zoonosis: SARS-CoV-2 mutations enhance human–animal cross-transmission. Computers in Biology and Medicine 182 (2024). 109101. DOI: 10.1016/j.compbiomed.2024.109101