Chemistry with TensorFlow (and OpenVINO)

Using TensorFlow to model chemistry problems.

An example of predicting lipophilicity from the molecular structure (SMILES).

This notebook is based on the excellent Kaggle tutorial from Vlad Kisin. In this example, you'll learn how to read a chemistry data file and create predictive models of lipophilicity.

Lipophilicity is the ability of a chemical compound to dissolve in non-polar (fatty or oily) solvents. In simple terms, if you had a glass of oil and water (which separate into two layers, one on top of the other, as in the figure above), lipophilicity describes how a chemical divides itself between the oil layer and the water layer. It is quantified by the partition coefficient P, conventionally the ratio of the chemical's concentration in the oil phase to its concentration in the water phase. For example, if 3 molecules dissolve in the oil for every 1 molecule in the water, P is 3 and log P is $\log_{10}{3} \approx 0.477$.
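
The log P arithmetic can be checked in a few lines of Python. This is a minimal sketch that assumes the conventional definition of the partition coefficient (concentration in the oil phase divided by concentration in the water phase). The function name `log_p` is hypothetical and is not part of the notebook.

```python
import math

def log_p(conc_oil: float, conc_water: float) -> float:
    """Return log10 of the partition coefficient P = conc_oil / conc_water."""
    if conc_oil <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_oil / conc_water)

# A 3:1 oil-to-water ratio gives P = 3 and a log P of about 0.477.
print(round(log_p(3.0, 1.0), 3))

# Reversing the ratio flips the sign: P = 1/3 and a log P of about -0.477.
print(round(log_p(1.0, 3.0), 3))
```

A positive log P means the compound prefers the oil phase, and a negative value means it prefers water.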

Lipophilicity affects the absorption, distribution, metabolism, excretion, and toxicity of a pharmaceutical, as well as a drug's potency and selectivity.

I'll demonstrate how to load the raw data from a CSV file and use the RDKit and Mol2Vec packages to create features based on the chemical structure of a molecule.

Installation

I tested this on Ubuntu 18.04 with the Anaconda Python distribution. To set up the conda environment (which I named chem):

conda create -n chem python=3.8 pip jupyter matplotlib seaborn
conda activate chem
conda install -c conda-forge rdkit
pip install git+https://github.com/samoturk/mol2vec
wget https://raw.githubusercontent.com/tonyreina/mol2vec/master/mol2vec/features.py -O ~/anaconda3/envs/chem/lib/python3.8/site-packages/mol2vec/features.py
wget https://github.com/samoturk/mol2vec_notebooks/raw/master/Notebooks/model_300dim.pkl
pip install -U tensorflow==2.4.1
pip install openvino-tensorflow==0.5.0
conda install scikit-learn
pip install py3Dmol

Run

Run the Jupyter notebook chemistry_predict_logP_tensorflow.ipynb.

Dataset

The lipophilicity dataset is available on Kaggle and released into the public domain (CC0). The raw data is in a CSV file with the SMILES notation of the chemical in the first column and the lipophilicity (logP) in the second column.
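
As a rough illustration of that two-column layout, the sketch below parses (SMILES, logP) pairs with Python's standard `csv` module. The helper name `read_logp_rows` and the `skip_header` flag are hypothetical. Whether the real file starts with a header row is an assumption to check before use, and the sample rows are placeholders rather than values from the dataset.

```python
import csv
import io

def read_logp_rows(handle, skip_header=False):
    """Parse (smiles, logP) pairs from a two-column CSV file object."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)  # assumption: drop one header row if the file has one
    rows = []
    for record in reader:
        if len(record) < 2:
            continue  # ignore blank or malformed lines
        rows.append((record[0].strip(), float(record[1])))
    return rows

# Placeholder rows for illustration only; these values are NOT from the dataset.
sample = io.StringIO("CCO,-0.3\nc1ccccc1,2.1\n")
data = read_logp_rows(sample)
print(data)
```

With the real data, the same function could be called on `open("logP_dataset.csv", newline="")` in place of the in-memory sample.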
