
DRIFT: Drug-Drug Interaction via Foundation Transformers

Lightweight Multi-View Molecular Representations for Generalizable DDI Prediction



Key Results

DRIFT combines frozen ChemBERTa-2 embeddings (384d) with Morgan fingerprints (2048-bit) through a simple MLP. No knowledge graphs, no GNNs, no expensive pretraining. Trains in 11 minutes on a single GPU.
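The fingerprint half of the per-drug features can be sketched with RDKit (listed in the install step below). This is a minimal illustration, not the repository's `embed.py`; radius 2 is an assumed default, and the frozen 384-d ChemBERTa-2 embedding is extracted separately:

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles: str, n_bits: int = 2048, radius: int = 2) -> np.ndarray:
    """2048-bit binary Morgan fingerprint for one drug SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    bv = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.float32)
    DataStructs.ConvertToNumpyArray(bv, arr)
    return arr

fp = morgan_fp("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
# concatenated with the frozen 384-d ChemBERTa-2 embedding -> 2432-d per drug
```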

DDI-Bench: Unseen Drug Generalization (New SOTA)

| Model | Type | S1 F1 | S2 F1 |
|-------|------|-------|-------|
| DRIFT | Frozen Embeddings + MLP | 62.1 ± 0.9 | 30.1 ± 1.9 |
| DDI-GPT | LLM | 57.3 ± 1.4 | 18.8 ± 2.2 |
| EmerGNN | GNN | 56.9 ± 1.7 | 22.5 ± 1.2 |
| TextDDI | LLM | 56.8 ± 0.8 | 18.2 ± 0.2 |

+4.8pp over DDI-GPT on S1, +7.6pp over EmerGNN on S2.

Standard DrugBank Benchmark (New SOTA, same SumGNN/KnowDDI split)

| Model | ACC (%) | F1 macro (%) |
|-------|---------|--------------|
| DRIFT | 96.25 ± 0.06 | 93.14 ± 1.09 |
| KnowDDI | 93.17 ± 0.09 | 91.53 ± 0.24 |
| SumGNN | 90.34 ± 0.28 | 86.88 ± 0.63 |

+3.1pp ACC over KnowDDI, using only SMILES-derived features.

vs. MMFF-DDI (IEEE TCBB 2026, same 5-fold CV splits)

DRIFT matches MMFF-DDI's multi-modal framework (EGNN + MolFormer + attention autoencoder + contrastive learning) on S1 generalization with just a frozen-embedding MLP.


Architecture

Drug A SMILES --> ChemBERTa-2 (frozen, 384d) --> emb_A --+
                 Morgan FP (2048-bit binary)  --> fp_A  --+-> h_A = [emb_A || fp_A]
                                                          |
Drug B SMILES --> ChemBERTa-2 (frozen, 384d) --> emb_B --+
                 Morgan FP (2048-bit binary)  --> fp_B  --+-> h_B = [emb_B || fp_B]
                                                          |
           Pair: [h_A || h_B || |h_A - h_B| || h_A * h_B] (9728d)
                                    |
                       MLP: 9728 -> 512 -> 256 -> 86
                       (BatchNorm + ReLU + Dropout 0.2)
  • 5.1M trainable parameters (MLP head only)
  • ~11 min training time on NVIDIA A6000
  • Frozen embeddings -- no backprop through ChemBERTa
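
The pair featurization and MLP head in the diagram can be sketched in PyTorch. This is an illustrative reconstruction from the dimensions above (`DriftMLP` and `pair_features` are hypothetical names, not the repository's `model.py`); note the layer sizes reproduce the stated ~5.1M trainable parameters:

```python
import torch
import torch.nn as nn

def pair_features(h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
    # [h_A || h_B || |h_A - h_B| || h_A * h_B] -> 4 x 2432 = 9728 dims
    return torch.cat([h_a, h_b, (h_a - h_b).abs(), h_a * h_b], dim=-1)

class DriftMLP(nn.Module):
    """MLP head: 9728 -> 512 -> 256 -> 86, BatchNorm + ReLU + Dropout 0.2."""
    def __init__(self, in_dim=9728, hidden=(512, 256), n_classes=86, p=0.2):
        super().__init__()
        layers, d = [], in_dim
        for h in hidden:
            layers += [nn.Linear(d, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(p)]
            d = h
        layers.append(nn.Linear(d, n_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# per-drug feature: frozen ChemBERTa-2 embedding (384d) || Morgan FP (2048-bit)
h_a, h_b = torch.randn(4, 2432), torch.randn(4, 2432)
logits = DriftMLP()(pair_features(h_a, h_b))  # shape (4, 86)
```

Because the embeddings are precomputed and frozen, only this head receives gradients, which is what keeps training to minutes rather than hours.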

Quick Start

Install

pip install torch transformers rdkit scikit-learn tqdm

Extract Embeddings (one-time, ~7 seconds)

cd src
python embed.py --dataset drugbank

Train

# DDI-Bench (S0/S1/S2 inductive splits)
python train.py --dataset drugbank --split_type random \
    --feature_type hybrid_2048 --pair_mode full \
    --hidden_dims 512 256 --dropout 0.2 --lr 3e-4 \
    --batch_size 256 --epochs 200 --patience 20

# Standard DrugBank (KnowDDI/SumGNN split)
python train_standard.py --benchmark drugbank_knowddi \
    --feature_type hybrid_2048 --pair_mode full \
    --hidden_dims 512 256 --dropout 0.2 --lr 3e-4 \
    --batch_size 256 --epochs 200 --patience 20

Project Structure

src/
  train.py           # DDI-Bench training (S0/S1/S2 inductive splits)
  train_standard.py  # Standard benchmark training (70/10/20 splits)
  model.py           # DRIFT_MLP architecture
  dataset.py         # Data loading with pair feature construction
  embed.py           # ChemBERTa + Morgan FP extraction
  config.py          # Hyperparameters

data/
  DDI-Bench/         # Official DDI-Bench splits (clone separately)
  drugbank_knowddi/  # SumGNN/KnowDDI canonical split
  drugbank_standard/ # Standard 70/10/20 split
  dengs/             # Deng et al. dataset

paper/
  drift_paper.pdf    # Full paper
  drift_paper.tex    # LaTeX source

DDI-Bench Data

Clone the official DDI-Bench repository:

cd data
git clone https://github.com/LARS-research/DDI-Bench.git

Citation

@article{ramrakhiani2026drift,
  title={DRIFT: Drug-Drug Interaction via Foundation Transformers},
  author={Ramrakhiani, Ishan},
  year={2026},
  note={Shepherd Healthcare}
}

License

MIT
