Chemical Hierarchy Approximation for secondary Metabolism clusters Obtained In Silico.
CHAMOIS is a fast method for predicting chemical features of natural products produced by Biosynthetic Gene Clusters (BGCs) using only their genomic sequence. It can be used to get chemical features from BGCs predicted in silico with tools such as GECCO or antiSMASH.
This section shows only the basic commands for installing and running CHAMOIS. The online documentation contains a more detailed installation guide, examples, an API reference, and a CLI reference.
CHAMOIS is implemented in Python, and supports all versions from Python 3.7 onwards. It requires additional libraries that can be installed directly from PyPI, the Python Package Index.
$ pip install chamois-tool
Installing the package itself takes only a moment, but it also requires downloading an extra 44 MiB of data (profile HMMs) from GitHub, so the total install time depends on the speed of your Internet connection.
Note that CHAMOIS uses HMMER3, which can only run on PowerPC, x86-64 and Aarch64 machines running a POSIX operating system. Therefore, CHAMOIS will work on Linux and macOS, but not on Windows.
Once CHAMOIS is installed, you can run it from the terminal by providing it with one or more GenBank files containing the genomic records of the BGCs to analyze, and an output path where the results will be written in HDF5 format. For instance, to predict the classes for BGC0000703, a kanamycin-producing BGC from MIBiG:
$ chamois predict -i tests/data/BGC0000703.4.gbk -o tests/data/BGC0000703.4.hdf5
This takes about 3 seconds and 600 MiB of RAM on a higher-end laptop (Linux 6.13.8, i7-1255U @ 4.70 GHz). The runtime and memory usage scale linearly with the number of BGCs to process.
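To process a whole directory of BGCs, the command above can be scripted. The following is a minimal sketch, not part of CHAMOIS itself: the helper names `build_predict_command` and `predict_all` are hypothetical, and it assumes the `chamois` executable is on your `PATH` and that inputs use the `.gbk` extension. It relies only on the `-i` and `-o` flags shown above and starts one process per file.

```python
import subprocess
from pathlib import Path


def build_predict_command(genbank_path, output_dir):
    """Build the `chamois predict` argument list for one GenBank file.

    Hypothetical helper: the output file is named after the input file,
    with the final extension replaced by `.hdf5`.
    """
    genbank_path = Path(genbank_path)
    output_path = Path(output_dir) / (genbank_path.stem + ".hdf5")
    return ["chamois", "predict", "-i", str(genbank_path), "-o", str(output_path)]


def predict_all(input_dir, output_dir):
    """Run one `chamois predict` call per `.gbk` file found in `input_dir`."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for genbank_path in sorted(Path(input_dir).glob("*.gbk")):
        # check=True raises CalledProcessError if CHAMOIS exits with an error.
        subprocess.run(build_predict_command(genbank_path, output_dir), check=True)
```

Calling `predict_all("tests/data", "results")` would write one HDF5 file per input. Since CHAMOIS accepts more than one GenBank file per run, a single invocation is likely more efficient than one process per file; see the CLI reference in the online documentation for the exact syntax.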
Additional examples for running CHAMOIS can be found in the online documentation.
The output file can be loaded with the anndata package, and corresponds to a probability matrix where rows are the input BGCs and columns are the ChemOnt classes.
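As an illustration, the sketch below loads a prediction file and lists the classes predicted above a chosen probability for each BGC. It is written under several assumptions: that the file can be read with `anndata.read_h5ad`, that the row and column labels (`obs_names`, `var_names`) hold the BGC and class identifiers, and that a threshold of 0.9 is a sensible cut-off. The function names `confident_classes` and `report` are hypothetical.

```python
def confident_classes(probabilities, class_ids, threshold=0.9):
    """Return (class_id, probability) pairs at or above `threshold`, highest first."""
    pairs = [
        (class_id, float(p))
        for class_id, p in zip(class_ids, probabilities)
        if p >= threshold
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def report(path, threshold=0.9):
    """Print the confident classes for every BGC stored in a CHAMOIS output file."""
    import anndata  # third-party package, imported here so the helper above stays standalone

    data = anndata.read_h5ad(path)
    # Assumption: X is the probability matrix; convert it if stored as a sparse matrix.
    matrix = data.X.toarray() if hasattr(data.X, "toarray") else data.X
    for bgc_id, row in zip(data.obs_names, matrix):
        print(bgc_id)
        for class_id, p in confident_classes(row, data.var_names, threshold):
            print(f"  {class_id}: {p:.3f}")
```

For example, `report("tests/data/BGC0000703.4.hdf5")` would print one block per BGC in the file produced by the `predict` command.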
To get a summary for each predicted BGC, use the render command:
$ chamois render -i tests/data/BGC0000703.4.hdf5
Predictions for each BGC will be shown as a tree with their computed probabilities:
CHEMONTID:0000002 (Organoheterocyclic compounds): 0.996
├── CHEMONTID:0002012 (Oxanes): 0.996
└── CHEMONTID:0004140 (Oxacyclic compounds): 0.976
CHEMONTID:0004150 (Hydrocarbon derivatives): 0.999
CHEMONTID:0004557 (Organopnictogen compounds): 0.948
CHEMONTID:0004603 (Organic oxygen compounds): 1.000
└── CHEMONTID:0000323 (Organooxygen compounds): 1.000
    ├── CHEMONTID:0000011 (Carbohydrates and carbohydrate conjugates): 0.996
    │   ├── CHEMONTID:0001540 (Monosaccharides): 0.996
    │   ├── CHEMONTID:0002105 (Glycosyl compounds): 0.977
    │   │   └── CHEMONTID:0002207 (O-glycosyl compounds): 0.977
    │   └── CHEMONTID:0003305 (Aminosaccharides): 0.995
    │       └── CHEMONTID:0000282 (Aminoglycosides): 0.995
    │           └── CHEMONTID:0001675 (Aminocyclitol glycosides): 0.995
    │               └── CHEMONTID:0003575 (2-deoxystreptamine aminoglycosides): 0.961
    ├── CHEMONTID:0000129 (Alcohols and polyols): 1.000
    │   ├── CHEMONTID:0000286 (Primary alcohols): 0.891
    │   ├── CHEMONTID:0001292 (Cyclic alcohols and derivatives): 0.998
    │   │   └── CHEMONTID:0002509 (Cyclitols and derivatives): 0.996
    │   │       └── CHEMONTID:0002510 (Aminocyclitols and derivatives): 0.987
    │   ├── CHEMONTID:0001661 (Secondary alcohols): 0.999
    │   │   └── CHEMONTID:0002647 (Cyclohexanols): 0.995
    │   └── CHEMONTID:0002286 (Polyols): 0.972
    └── CHEMONTID:0000254 (Ethers): 0.959
        └── CHEMONTID:0001656 (Acetals): 0.959
CHEMONTID:0004707 (Organic nitrogen compounds): 0.999
└── CHEMONTID:0000278 (Organonitrogen compounds): 0.999
    ├── CHEMONTID:0002449 (Amines): 0.999
    │   ├── CHEMONTID:0002450 (Primary amines): 0.989
    │   │   └── CHEMONTID:0000469 (Monoalkylamines): 0.989
    │   └── CHEMONTID:0002460 (Alkanolamines): 0.999
    │       └── CHEMONTID:0001897 (1,2-aminoalcohols): 0.992
    └── CHEMONTID:0002674 (Cyclohexylamines): 0.987
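If you need these values in a script, each rendered line can be split into an identifier, a class name and a probability. The sketch below assumes the text layout matches the sample output above (`CHEMONTID:<digits> (<name>): <probability>`); the function name `parse_render_line` is hypothetical. For anything beyond quick inspection, reading the HDF5 file directly is likely more robust than parsing terminal output.

```python
import re

# Matches e.g. "CHEMONTID:0002012 (Oxanes): 0.996", ignoring any tree-drawing prefix.
LINE_PATTERN = re.compile(r"(CHEMONTID:\d+) \((.+)\): (\d\.\d+)")


def parse_render_line(line):
    """Return (class_id, class_name, probability), or None if the line does not match."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None
    class_id, name, probability = match.groups()
    return class_id, name, float(probability)
```

For instance, `parse_render_line("CHEMONTID:0004150 (Hydrocarbon derivatives): 0.999")` returns the identifier, the name `Hydrocarbon derivatives` and the probability as a float.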
Training CHAMOIS is also done with the CLI, provided you have training data available. You can use the CHAMOIS datasets released on Zenodo to reproduce our results.
For instance, to train on the MIBiG 3.1 BGCs, the dataset used to train the CHAMOIS classifier distributed with the code, run the following command:
$ chamois train -f data/datasets/mibig3.1/features.hdf5 -c data/datasets/mibig3.1/classes.hdf5 -o model.json
This takes about 12 seconds and 600 MiB of RAM on a higher-end laptop (Linux 6.13.8, i7-1255U @ 4.70 GHz).
CHAMOIS is a pure-Python package but requires HMMER, which only runs on PowerPC, x86-64 and Aarch64 systems, and only on POSIX operating systems (Linux, macOS, BSD). Windows is not supported by HMMER.
CHAMOIS is tested on Linux (Ubuntu 22.04) using the GitHub Actions continuous integration platform.
CHAMOIS supports (and is tested on) all Python versions from Python 3.7 onwards. It requires the following Python packages:
CHAMOIS can be cited using the following preprint:
Machine learning inference of natural product chemistry across biosynthetic gene cluster types. Martin Larralde, Georg Zeller. bioRxiv 2025.03.13.642868; doi:10.1101/2025.03.13.642868
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
Contributions are more than welcome! See CONTRIBUTING.md for more details.
This software is provided under the GNU General Public License v3.0 or later. CHAMOIS is developed by the Zeller Lab at the European Molecular Biology Laboratory in Heidelberg and the Leiden University Medical Center in Leiden.