HydraGNN

Distributed PyTorch implementation of multi-headed graph convolutional neural networks

Capabilities

Multi-headed Prediction for graph and node-level properties
Distributed Data Parallelism at supercomputing level
Convolutional Layers as a hyperparameter
Geometric Equivariance in convolution and prediction
Global Attention

Dependencies

To install the required packages for basic capability only (torch, torch_geometric, and related packages), and to serialize and store the processed data for later sessions (pickle5):

pip install -r requirements.txt
pip install -r requirements-torch.txt
pip install -r requirements-pyg.txt

If you plan to modify the code, include packages for formatting (black) and testing (pytest) the code:

pip install -r requirements-dev.txt

Detailed dependency installation instructions are available on the Wiki.

Installation

After checking out HydraGNN, we recommend installing it in developer (editable) mode so that you can use the files in your current location and update them if needed:

python -m pip install -e .

Or, simply type the following in the HydraGNN directory:

export PYTHONPATH=$PWD:$PYTHONPATH

Alternatively, if you do not plan to modify the code, you can install HydraGNN into your Python environment as a static (non-editable) package:

python setup.py install
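Whichever installation route you choose, a quick way to confirm that Python can locate HydraGNN and its basic dependencies is a standard-library lookup. This is a generic sketch, not a HydraGNN tool; it assumes the import names torch, torch_geometric and hydragnn for the packages discussed above.

```python
import importlib.util
from typing import Dict, Iterable


def check_importable(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it.

    importlib.util.find_spec only searches for the module; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Assumed import names for the packages installed above.
status = check_importable(["torch", "torch_geometric", "hydragnn"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A MISSING entry for hydragnn usually means neither the editable install nor the PYTHONPATH export is active in the current shell.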

Running the code

Below are the four main functionalities for running the code.

Training a model, including continuing from a previously trained model using configuration options:

import hydragnn
hydragnn.run_training("examples/configuration.json")

Saving a model state:

import hydragnn

# model and optimizer are assumed to be an existing HydraGNN model and its torch optimizer
model_name = "model_checkpoint.pk"
hydragnn.save_model(model, optimizer, model_name, path="./logs/")

Loading a model state:

import hydragnn

# model is assumed to be an already constructed HydraGNN model with the same architecture
model_name = "model_checkpoint.pk"
hydragnn.load_existing_model(model, model_name, path="./logs/")

Making predictions from a previously trained model:

import hydragnn

# model is assumed to be a previously trained (or loaded) HydraGNN model
hydragnn.run_prediction("examples/configuration.json", model)

The run_training and run_prediction functions are convenience routines that encapsulate all the steps of the training process (data generation, data pre-processing, training of HydraGNN models, and use of trained HydraGNN models for inference) on toy problems, which are included in the CI test workflows. Both run_training and run_prediction require a JSON input file for configurable options. The save_model and load_existing_model functions store and retrieve model checkpoints for continued training and subsequent inference. Ad-hoc example scripts, in which data pre-processing, training, and inference are done for specific datasets, are provided in the examples folder.
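Because run_training and run_prediction are driven by a JSON file, one convenient pattern is to load an existing configuration, override a few options, and write the result to a new file. The sketch below uses only the Python standard library; the helper name override_config and the override values are hypothetical, and only the ["NeuralNetwork"]["Training"] key path documented under "Configurable settings" is assumed.

```python
import copy
import json
import tempfile
from pathlib import Path


def override_config(base: dict, overrides: dict) -> dict:
    """Return a deep copy of `base` with nested keys from `overrides` merged in.

    Hypothetical helper: nested dicts are merged recursively; any other
    value replaces the existing one.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for an existing configuration; in practice this would come from
# json.load() on a file such as examples/configuration.json.
base_config = {
    "Verbosity": {"level": 2},
    "NeuralNetwork": {
        "Training": {"num_epoch": 100, "batch_size": 32},
    },
}

# Shorten training for a quick smoke test; all other options are kept.
quick_config = override_config(
    base_config, {"NeuralNetwork": {"Training": {"num_epoch": 5}}}
)

# Write the modified configuration to a new file (in a temporary directory here).
out_dir = Path(tempfile.mkdtemp())
config_path = out_dir / "quick_configuration.json"
with open(config_path, "w") as f:
    json.dump(quick_config, f, indent=2)
```

The resulting path could then be passed in place of the example file, e.g. hydragnn.run_training(str(config_path)), following the training call in "Running the code".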

Datasets

Built-in examples are provided for testing purposes only. One source of data for building HydraGNN surrogate models is DFT output available on the OLCF Constellation: https://doi.ccs.ornl.gov/

Detailed instructions are available on the Wiki.

Configurable settings

HydraGNN uses a JSON configuration file (examples in examples/):

There are many options for HydraGNN; the dataset and model type are particularly important:

["Verbosity"]["level"]: 0, 1, 2, 3, 4 (int)
["Dataset"]["name"]: CuAu_32atoms, FePt_32atoms, FeSi_1024atoms (str)
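The bracketed paths above correspond to nested JSON objects. The sketch below parses a hypothetical minimal fragment containing only these two options (a real configuration file has more sections, such as "NeuralNetwork") and checks the documented verbosity range; it is an illustration, not part of HydraGNN.

```python
import json

# Hypothetical minimal fragment with only the two options listed above.
config_text = """
{
  "Verbosity": {"level": 2},
  "Dataset": {"name": "CuAu_32atoms"}
}
"""

config = json.loads(config_text)

# ["Verbosity"]["level"] is read by indexing the nested objects in order.
level = config["Verbosity"]["level"]
dataset_name = config["Dataset"]["name"]

# Documented values for the verbosity level are the integers 0 to 4.
if not (isinstance(level, int) and 0 <= level <= 4):
    raise ValueError(f"Verbosity level must be an int in 0..4, got {level!r}")
```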

Additionally, many important arguments fall within the ["NeuralNetwork"] section:

["NeuralNetwork"]
- ["Architecture"]
  - ["mpnn_type"]
    Accepted types: CGCNN, DimeNet, EGNN, GAT, GIN, MACE, MFC, PAINN, PNAEq, PNAPlus, PNA, SAGE, SchNet (str)
  - ["num_conv_layers"]
    Examples: 1, 2, 3, 4 ... (int)
  - ["output_heads"]
    Task types: node, graph (int)
  - ["global_attn_engine"]
    Accepted types: GPS, None
  - ["global_attn_type"]
    Accepted types: multihead
  - ["pe_dim"]
    Dimension of positional encodings (int)
  - ["global_attn_heads"]
    Examples: 1, 2, 3, 4 ... (int)
  - ["hidden_dim"]
    Dimension of node embeddings during convolution (int) - must be a multiple of "global_attn_heads" if "global_attn_engine" is not "None"
- ["Variables of Interest"]
  - ["input_node_features"]
    Indices from nodal data used as inputs (int)
  - ["output_index"]
    Indices from data used as targets (int)
  - ["type"]
    Either node or graph (str)
  - ["output_dim"]
    Dimensions of prediction tasks (list)
- ["Training"]
  - ["num_epoch"]
    Examples: 75, 100, 250 (int)
  - ["batch_size"]
    Examples: 16, 32, 64 (int)
  - ["Optimizer"]["learning_rate"]
    Examples: 2e-3, 0.005 (float)
  - ["compute_grad_energy"]
    Use the gradient of energy to predict forces (bool)
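Several of the options above constrain each other, most notably hidden_dim, which must be a multiple of global_attn_heads when global attention is enabled. The sketch below checks a ["NeuralNetwork"] section before launching a run. It is a hypothetical helper, not HydraGNN's own validation; it assumes that global attention is disabled by either JSON null or the string "None", and that output_index, type and output_dim are parallel lists with one entry per prediction head. The example values are illustrative only.

```python
from typing import Any, Dict, List

# Accepted values copied from the option list above.
ACCEPTED_MPNN_TYPES = {
    "CGCNN", "DimeNet", "EGNN", "GAT", "GIN", "MACE", "MFC",
    "PAINN", "PNAEq", "PNAPlus", "PNA", "SAGE", "SchNet",
}


def check_neural_network_section(nn: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a ["NeuralNetwork"] dict (hypothetical helper)."""
    problems: List[str] = []
    arch = nn["Architecture"]

    if arch["mpnn_type"] not in ACCEPTED_MPNN_TYPES:
        problems.append(f"unknown mpnn_type {arch['mpnn_type']!r}")
    if not (isinstance(arch["num_conv_layers"], int) and arch["num_conv_layers"] >= 1):
        problems.append("num_conv_layers must be a positive int")

    # Assumption: global attention is off when the engine is null or "None".
    engine = arch.get("global_attn_engine")
    if engine not in (None, "None"):
        if engine != "GPS":
            problems.append(f"unknown global_attn_engine {engine!r}")
        heads = arch.get("global_attn_heads", 1)
        if arch["hidden_dim"] % heads != 0:
            problems.append(
                f"hidden_dim={arch['hidden_dim']} is not a multiple of "
                f"global_attn_heads={heads}"
            )

    # Assumption: one entry per prediction head in each of these lists.
    voi = nn["Variables of Interest"]
    if not (len(voi["output_index"]) == len(voi["type"]) == len(voi["output_dim"])):
        problems.append("output_index, type and output_dim must have equal length")
    for task_type in voi["type"]:
        if task_type not in ("node", "graph"):
            problems.append(f"type entries must be 'node' or 'graph', got {task_type!r}")

    training = nn["Training"]
    for key in ("num_epoch", "batch_size"):
        if not (isinstance(training[key], int) and training[key] > 0):
            problems.append(f"{key} must be a positive int")
    if training["Optimizer"]["learning_rate"] <= 0:
        problems.append("learning_rate must be positive")

    return problems


example_nn = {
    "Architecture": {
        "mpnn_type": "PNA",
        "num_conv_layers": 4,
        "global_attn_engine": "GPS",
        "global_attn_type": "multihead",
        "global_attn_heads": 4,
        "hidden_dim": 48,
    },
    "Variables of Interest": {
        "input_node_features": [0],
        "output_index": [0, 2],
        "type": ["graph", "node"],
        "output_dim": [1, 1],
    },
    "Training": {
        "num_epoch": 100,
        "batch_size": 32,
        "Optimizer": {"learning_rate": 2e-3},
    },
}

print(check_neural_network_section(example_nn))  # []
```

Changing hidden_dim to 50 in this example would produce one problem, because 50 is not a multiple of 4; setting global_attn_engine to None would skip that check entirely.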

Citations

"HydraGNN: Distributed PyTorch implementation of multi-headed graph convolutional neural networks", Copyright ID#: 81929619 https://doi.org/10.11578/dc.20211019.2

Contributing

We encourage you to contribute to HydraGNN! Please see CONTRIBUTING.md for guidelines on how to do so.
