# DRIFT: Lightweight Multi-View Molecular Representations for Generalizable DDI Prediction
DRIFT combines frozen ChemBERTa-2 embeddings (384d) with Morgan fingerprints (2048-bit) through a simple MLP. No knowledge graphs, no GNNs, no expensive pretraining. Trains in 11 minutes on a single GPU.
## Results

**DDI-Bench (S1/S2 inductive splits):**

| Model | Type | S1 F1 | S2 F1 |
|---|---|---|---|
| DRIFT | Frozen Embeddings + MLP | 62.1 +/- 0.9 | 30.1 +/- 1.9 |
| DDI-GPT | LLM | 57.3 +/- 1.4 | 18.8 +/- 2.2 |
| EmerGNN | GNN | 56.9 +/- 1.7 | 22.5 +/- 1.2 |
| TextDDI | LLM | 56.8 +/- 0.8 | 18.2 +/- 0.2 |
+4.8pp over DDI-GPT on S1, +7.6pp over EmerGNN on S2.
**Standard DrugBank (KnowDDI/SumGNN split):**

| Model | ACC | F1 (macro) |
|---|---|---|
| DRIFT | 96.25 +/- 0.06% | 93.14 +/- 1.09% |
| KnowDDI | 93.17 +/- 0.09% | 91.53 +/- 0.24% |
| SumGNN | 90.34 +/- 0.28% | 86.88 +/- 0.63% |
+3.1pp ACC over KnowDDI, using only SMILES-derived features.
DRIFT matches MMFF-DDI's multi-modal framework (EGNN + MolFormer + attention autoencoder + contrastive learning) on S1 generalization with just a frozen-embedding MLP.
## Architecture

```
Drug A SMILES --> ChemBERTa-2 (frozen, 384d) --> emb_A --+
                  Morgan FP (2048-bit binary) --> fp_A --+--> h_A = [emb_A || fp_A]

Drug B SMILES --> ChemBERTa-2 (frozen, 384d) --> emb_B --+
                  Morgan FP (2048-bit binary) --> fp_B --+--> h_B = [emb_B || fp_B]

Pair: [h_A || h_B || |h_A - h_B| || h_A * h_B]   (4 x 2432 = 9728d)
                       |
MLP:  9728 -> 512 -> 256 -> 86
      (BatchNorm + ReLU + Dropout 0.2)
```
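The pair featurization above can be sketched in a few lines of NumPy. This is an illustration of the dimensions only, not the repo's `dataset.py` (function name and inputs are made up for the example):

```python
import numpy as np

def pair_features(h_a: np.ndarray, h_b: np.ndarray) -> np.ndarray:
    """Pair encoding: [h_A || h_B || |h_A - h_B| || h_A * h_B]."""
    return np.concatenate([h_a, h_b, np.abs(h_a - h_b), h_a * h_b])

rng = np.random.default_rng(0)
h_a = rng.random(384 + 2048)  # per-drug: ChemBERTa-2 embedding + Morgan FP = 2432d
h_b = rng.random(384 + 2048)

x = pair_features(h_a, h_b)
print(x.shape)  # (9728,) = 4 x 2432
```

The absolute-difference and product terms make the pair representation insensitive to which drug is labeled A or B up to the ordering of the two raw concatenation slots.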
- 5.1M trainable parameters (MLP head only)
- ~11 min training time on NVIDIA A6000
- Frozen embeddings -- no backprop through ChemBERTa
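The MLP head described above can be sketched in PyTorch. This is a minimal sketch consistent with the stated dimensions, assuming BatchNorm/ReLU/Dropout after each hidden layer; the class name and layer ordering are assumptions, not the repo's `model.py`:

```python
import torch
import torch.nn as nn

class DriftMLP(nn.Module):
    """Sketch of a 9728 -> 512 -> 256 -> 86 classification head."""
    def __init__(self, in_dim=9728, hidden_dims=(512, 256), n_classes=86, dropout=0.2):
        super().__init__()
        layers, d = [], in_dim
        for h in hidden_dims:
            layers += [nn.Linear(d, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(dropout)]
            d = h
        layers.append(nn.Linear(d, n_classes))  # 86 interaction classes
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = DriftMLP()
print(sum(p.numel() for p in model.parameters()))  # 5136214, i.e. the ~5.1M reported
```

Since the ChemBERTa-2 encoder is frozen, only this head receives gradients, which is what keeps training to minutes rather than hours.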
## Quick Start

```bash
pip install torch transformers rdkit scikit-learn tqdm
cd src

# Precompute ChemBERTa-2 embeddings and Morgan fingerprints
python embed.py --dataset drugbank

# DDI-Bench (S0/S1/S2 inductive splits)
python train.py --dataset drugbank --split_type random \
    --feature_type hybrid_2048 --pair_mode full \
    --hidden_dims 512 256 --dropout 0.2 --lr 3e-4 \
    --batch_size 256 --epochs 200 --patience 20

# Standard DrugBank (KnowDDI/SumGNN split)
python train_standard.py --benchmark drugbank_knowddi \
    --feature_type hybrid_2048 --pair_mode full \
    --hidden_dims 512 256 --dropout 0.2 --lr 3e-4 \
    --batch_size 256 --epochs 200 --patience 20
```

## Repository Structure

```
src/
  train.py           # DDI-Bench training (S0/S1/S2 inductive splits)
  train_standard.py  # Standard benchmark training (70/10/20 splits)
  model.py           # DRIFT_MLP architecture
  dataset.py         # Data loading with pair feature construction
  embed.py           # ChemBERTa + Morgan FP extraction
  config.py          # Hyperparameters
data/
  DDI-Bench/         # Official DDI-Bench splits (clone separately)
  drugbank_knowddi/  # SumGNN/KnowDDI canonical split
  drugbank_standard/ # Standard 70/10/20 split
  dengs/             # Deng et al. dataset
paper/
  drift_paper.pdf    # Full paper
  drift_paper.tex    # LaTeX source
```
## Data

Clone the official DDI-Bench repository:

```bash
cd data
git clone https://github.com/LARS-research/DDI-Bench.git
```

## Citation

```bibtex
@article{ramrakhiani2026drift,
  title={DRIFT: Drug-Drug Interaction via Foundation Transformers},
  author={Ramrakhiani, Ishan},
  year={2026},
  note={Shepherd Healthcare}
}
```

## License

MIT