A scalable Flask application that predicts molecular properties from SMILES strings using Random Forest regression, with robust SMILES validation and molecular structure visualization.
- Robust SMILES Validation: Multi-level validation using RDKit built-in functions
  - Parse validation
  - Atom count check
  - Valence validation
  - Radical electron detection
- Molecular Structure Visualization: 2D structure rendering with heavy atoms only (hydrogens removed)
- Property Prediction: Random Forest models for:
  - Molecular Weight
  - LogP (lipophilicity)
  - TPSA (Topological Polar Surface Area)
molecular_predictor/
├── app.py                  # Main Flask application
├── config.py               # Configuration settings
├── models/
│   └── predictor.py        # ML model implementation
├── utils/
│   └── validators.py       # SMILES validation & image generation
├── static/
│   ├── css/
│   │   └── style.css       # Stylesheets
│   └── js/
│       └── main.js         # Frontend JavaScript
├── templates/
│   └── index.html          # HTML template
├── requirements.txt        # Python dependencies
└── README.md               # This file
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py
- Visit http://localhost:5000 in your browser.
The app validates each SMILES string in four stages:
- Format Check: Ensures the SMILES input is a non-empty string
- Parsing: Verifies that the molecular structure can be parsed
- Sanitization: Checks for valid valence states
- Quality Checks: Detects radicals and unusual structures
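The validation stages listed above could be sketched roughly as follows. This is a minimal illustration and not the actual contents of utils/validators.py: the function names `check_format` and `validate_smiles`, the error messages, and the `(is_valid, message)` return shape are all assumptions. The RDKit calls are imported inside the function so that the format check can run on its own.

```python
from typing import Optional, Tuple


def check_format(smiles: object) -> Optional[str]:
    """Format check: return an error message, or None if the input looks usable."""
    if not isinstance(smiles, str):
        return "SMILES must be a string"
    if not smiles.strip():
        return "SMILES must not be empty"
    return None


def validate_smiles(smiles: object) -> Tuple[bool, str]:
    """Return (is_valid, message); on success the message is the canonical SMILES.

    Hypothetical helper: names and messages are illustrative assumptions.
    """
    error = check_format(smiles)
    if error is not None:
        return False, error

    # Imported here so the format check above does not require RDKit.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    # Parsing: sanitization is deferred so that parse errors and
    # valence errors can be reported separately.
    mol = Chem.MolFromSmiles(smiles.strip(), sanitize=False)
    if mol is None:
        return False, "SMILES could not be parsed"
    if mol.GetNumAtoms() == 0:
        return False, "Molecule contains no atoms"

    # Sanitization: raises on impossible valences, failed kekulization, etc.
    try:
        Chem.SanitizeMol(mol)
    except Exception as exc:
        return False, f"Sanitization failed: {exc}"

    # Quality check: reject structures carrying radical (unpaired) electrons.
    if Descriptors.NumRadicalElectrons(mol) > 0:
        return False, "Molecule contains radical electrons"

    return True, Chem.MolToSmiles(mol)
```

With RDKit installed, `validate_smiles("CCO")` should return `(True, "CCO")`, while an input such as `"C["` should fail at the parsing stage.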
- GET / - Main web interface
- POST /predict - Predict molecular properties
  - Request: {"smiles": "CCO"}
  - Response: {"smiles": "CCO", "canonical_smiles": "CCO", "predicted_properties": {...}, "molecule_image": "base64..."}
- GET /health - Health check endpoint
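As an illustration of calling the prediction endpoint from Python, here is a small client sketch using only the standard library. The helper names `build_predict_request` and `predict` are hypothetical, and the base URL assumes the default local address shown above; the request and response fields follow the shapes listed for POST /predict.

```python
import json
import urllib.request


def build_predict_request(smiles, base_url="http://localhost:5000"):
    """Build a JSON POST request for the /predict endpoint (hypothetical helper)."""
    payload = json.dumps({"smiles": smiles}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(smiles, base_url="http://localhost:5000", timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(smiles, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict("CCO")` would return the parsed response, from which `canonical_smiles` and `predicted_properties` can be read. How the server reports invalid SMILES (status code and error body) is not specified here, so error handling is left out of the sketch.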
Valid:
- CCO - ethanol
- c1ccccc1 - benzene
- CC(=O)O - acetic acid
- CC(C)(C)C - neopentane
Invalid (will show specific error):
- XYZ - Invalid characters
- C[ - Incomplete structure
- C(C)(C)(C)(C)C - Valence issues (carbon with five bonds)
Edit config.py to modify:
- Model parameters (n_estimators, fingerprint_bits)
- Server settings (host, port, debug mode)
- Environment-specific configurations
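One common way to organize such settings in a Flask project is a base class with per-environment subclasses. The sketch below is only an assumption about how config.py might be laid out: the class names, the `APP_ENV` environment variable, the `get_config` helper, and every default value are placeholders, not the project's actual settings.

```python
import os


class Config:
    """Base settings; all values below are illustrative placeholders."""
    N_ESTIMATORS = 100        # number of trees in each Random Forest model
    FINGERPRINT_BITS = 2048   # length of the molecular fingerprint vector
    HOST = "0.0.0.0"
    PORT = 5000
    DEBUG = False


class DevelopmentConfig(Config):
    """Local development: debug mode on, bound to localhost only."""
    DEBUG = True
    HOST = "127.0.0.1"


class ProductionConfig(Config):
    """Production: inherits the base values with debug mode off."""
    DEBUG = False


CONFIGS = {
    "development": DevelopmentConfig,
    "production": ProductionConfig,
}


def get_config(env=None):
    """Pick a config class by name, falling back to the APP_ENV variable."""
    env = env or os.environ.get("APP_ENV", "development")
    return CONFIGS.get(env, DevelopmentConfig)
```

Under this layout, the application could load its settings with something like `app.config.from_object(get_config())`.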
- Add model training endpoint
- Implement batch predictions
- Add user authentication
- Deploy with Docker
- Add database for storing predictions
- Support 3D structure visualization