
Releases: dmlc/dgl

v0.4.3

02 Apr 03:43

We are thrilled to announce DGL v0.4.3, which provides many new features that enhance usability and efficiency.

Major Features and Improvements

  • The TensorFlow backend is now an official feature. A new environment variable USE_OFFICIAL_TFDLPACK switches to the official TensorFlow DLPack support from tensorflow/tensorflow#36862.
  • New graph sampling APIs compatible with DGLHeteroGraph (see the sketch after this list):
    • Redesigned dgl.random_walk with support for metapaths.
    • A new API dgl.random_walk_with_restart for random walk with restart probability.
    • A new API dgl.sample_neighbors for sampling among neighbors with or without replacement.
    • A new API dgl.sample_neighbors_topk for picking K neighbors with the largest weight.
    • A new API dgl.in_subgraph for extracting the subgraph containing all the in-edges of the given nodes.
    • A new API dgl.out_subgraph for extracting the subgraph containing all the out-edges of the given nodes.
  • Accompanying utilities for graph sampling:
    • A new API dgl.create_from_paths for creating a graph from sampled random walk traces.
    • A new API dgl.compact_graphs for converting a sampled subgraph to a smaller graph with no isolated nodes.
    • A new API dgl.to_block for converting a sampled subgraph to a bipartite graph suitable for computation.
    • Reworked dgl.to_simple_graph to support heterogeneous graphs.
    • Reworked dgl.remove_edges to support heterogeneous graphs.
  • When a DGLHeteroGraph is uni-directional bipartite (i.e., it has two node types and one edge type, and all edges go from one node type to the other), the following APIs are enabled:
    • A new API DGLHeteroGraph.is_unibipartite
    • New APIs DGLHeteroGraph.num_src_nodes and DGLHeteroGraph.num_dst_nodes for getting the number of source and destination nodes, respectively.
    • New APIs DGLHeteroGraph.srcnodes and DGLHeteroGraph.dstnodes for getting a view of source and destination nodes, respectively.
    • New APIs DGLHeteroGraph.srcdata and DGLHeteroGraph.dstdata for getting the data of source and destination nodes, respectively.
  • NN module changes:
    • Users can now directly use dgl.nn.SomeModule instead of dgl.nn.<backend>.SomeModule.
    • Extended dgl.nn.GraphConv to support an asymmetric normalizer. It can now also accept an external weight matrix instead of creating its own.
    • Extended all NN modules to support bipartite graph inputs, which enables sampling-based GNN training. The input node feature argument can now be a pair of tensors.
    • A new wrapper module dgl.nn.HeteroGraphConv for leveraging DGL NN modules on heterogeneous graphs.
  • Model examples using the new sampling APIs
    • Train the GraphSAGE model by neighbor sampling and scale it to multiple GPUs (link).
    • Train the Relational GCN model on heterogeneous graphs by sampling for both node classification and link prediction (link).
    • Train the PinSAGE model by random walk sampling for item recommendation (link).
    • Train the GCMC model by sampling for MovieLens rating prediction (link).
    • Implement the variance reduction technique for neighbor sampling (link) proposed by Chen et al.
  • DGL-KE and DGL-LifeSci
    • Spun off into standalone packages.
    • See their project pages for more details.
  • A new example for scene graph extraction: https://github.com/dmlc/dgl/tree/master/examples/mxnet/scenegraph
  • A new API dgl.metis_partition for partitioning a DGLGraph with the METIS algorithm.
  • New APIs dgl.as_immutable_graph and dgl.as_heterograph for casting between DGLGraph and DGLHeteroGraph efficiently.
  • A new API dgl.rand_graph for constructing a random graph with a specified number of nodes and edges.
  • A new API dgl.random.choice for more efficient non-uniform random choice.
  • Replaced DGLHeteroGraph.__setitem__ and DGLHeteroGraph.__getitem__ with a more efficient implementation.
  • dgl.data.save_graphs and dgl.data.load_graphs now support heterogeneous graphs.
  • User-defined functions (UDFs) now have access to node types and edge types.
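
As a rough illustration, here is a minimal sketch of how the new sampling and block utilities fit together. It assumes the sampling functions are exposed under the dgl.sampling namespace and uses a toy graph, so treat the exact signatures as illustrative rather than authoritative:

import dgl
import torch as th

# A toy directed graph on 4 nodes.
g = dgl.graph(([0, 0, 1, 1, 2, 3], [1, 2, 2, 3, 3, 0]))

# Sample up to 2 in-neighbors per seed node, without replacement.
seeds = th.tensor([2, 3])
frontier = dgl.sampling.sample_neighbors(g, seeds, 2, replace=False)

# Compact the sampled frontier into a bipartite block for computation;
# the block exposes the uni-bipartite views listed above.
block = dgl.to_block(frontier, seeds)
print(block.num_src_nodes(), block.num_dst_nodes())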

API Breaking Changes

  • The type of the norm argument in dgl.nn.GraphConv has changed from bool to string: "none" means no normalization, "both" means the original symmetric normalizer proposed by GCN, and "right" means normalizing by in-degrees (see the sketch after this list).
  • The DGLSubGraph and BatchedDGLGraph classes have been removed and merged into DGLGraph. All their methods have been ported to DGLGraph as well, so typical usage is not affected by this change.
  • The multigraph flag in dgl.DGLGraph is deprecated and will be removed in the future.
  • The card argument in dgl.graph and dgl.bipartite has been renamed to num_nodes.
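
For illustration, a minimal sketch of the new norm argument on a toy graph (the feature sizes and module defaults are assumptions):

import torch as th
import dgl
from dgl.nn import GraphConv

g = dgl.graph(([0, 1, 2], [1, 2, 0]))
feat = th.randn(3, 5)

conv_sym = GraphConv(5, 7, norm='both')     # symmetric normalizer, previously norm=True
out_sym = conv_sym(g, feat)

conv_right = GraphConv(5, 7, norm='right')  # normalize by in-degrees only
out_right = conv_right(g, feat)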

Bug Fixes and Others

  • Fix a bug in remove_edges when the graph has no edges.
  • Fix a bug in creating a DGLGraph from a SciPy COO matrix with duplicate entries.
  • Improve the speed of sorting a COO-format graph.
  • Improve the speed of dgl.to_bidirected.
  • Fix a bug in building DGL on macOS using clang.
  • Fix a bug in NodeFlow when apply_edges is called.
  • Fix a bug in the stack cross-type reducer of DGLHeteroGraph.multi_update_all, DGLHeteroGraph.multi_pull, and DGLHeteroGraph.multi_recv so that the stacking order is consistent and a redundant dimension is removed.
  • Fix a bug in the loss function of the RGCN example.
  • Fix a bug in the MXNet backend when using the new Deep NumPy feature.
  • Fix a memory leak in the PyTorch backend when retain_graph is True.

v0.4.2

23 Jan 04:32

[Experimental] TensorFlow support

TensorFlow support is finally live as an experimental feature. To get started, please read the instructions about switching to the TensorFlow backend. Currently, we have released 13 common NN modules and 4 models.
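
As a minimal sketch of switching backends from Python (this assumes the DGLBACKEND environment variable is read before the first import of dgl; the persistent config-file route described in the instructions also works):

import os
os.environ['DGLBACKEND'] = 'tensorflow'  # must be set before importing dgl

import dgl
import tensorflow as tf

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1], [1, 2])
g.ndata['h'] = tf.zeros((3, 4))  # node features are now TensorFlow tensors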

Many bug fixes and minor improvements.

v0.4.1

05 Nov 03:15

0.4.1 is released today!

This minor update includes:

CUDA 10.1 support (#950)

Conda and pip users can install it with:

conda install -c dglteam dgl-cuda10.1
pip install dgl-cu101

MXNet NN module support and examples (PR #890, @yzh119)

  • GATConv
  • EdgeConv
  • SAGEConv
  • SGConv
  • APPNPConv
  • GINConv
  • GatedGraphConv
  • GMMConv
  • ChebConv
  • AGNNConv
  • NNConv
  • DenseGraphConv
  • DenseSAGEConv
  • DenseChebConv

Miscellaneous

v0.4.0

07 Oct 09:45

We are thrilled to announce the 0.4 release! This release extends DGL by (i) including support for heterogeneous graphs and (ii) providing a sub-package to efficiently compute embeddings of large knowledge graphs. In addition, it includes numerous performance improvements, contributed models and datasets, and bugfixes.

Support for Heterogeneous Graphs

What is a heterogeneous graph?

Much real-world data concerns relations between different types of entities. For instance, e-commerce data may involve three types of entities: customers, items, and vendors. Customers and items can interact in different ways, such as clicks or purchases. Customers can also follow each other. Entities and relations may each carry their own set of features.

A heterogeneous graph, whose nodes and edges are typed, can model such a scenario accurately.
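
As a concrete illustration, here is a minimal sketch of the e-commerce scenario built with dgl.heterograph; the type names and edges are invented for the example:

import dgl

g = dgl.heterograph({
    ('customer', 'clicks', 'item'):      ([0, 0, 1], [0, 1, 1]),
    ('customer', 'purchases', 'item'):   ([0], [1]),
    ('customer', 'follows', 'customer'): ([1], [0]),
    ('vendor', 'sells', 'item'):         ([0, 1], [0, 1]),
})
print(g.ntypes)  # e.g. ['customer', 'item', 'vendor']
print(g.etypes)  # one entry per relation type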


Models that work on heterogeneous graphs

We provide a few models to demonstrate the use cases of heterogeneous graphs and the corresponding DGL APIs.

  • Graph Convolutional Matrix Completion [Code in MXNet]
    • On an EC2 p3.2xlarge instance, we obtained a 5x speedup on MovieLens-100K and a 22x speedup on MovieLens-1M compared with the official implementation. We are also able to train on the entire MovieLens-10M graph without minibatching (the official implementation runs out of memory).
  • R-GCN [Code in PyTorch]
    • The new code can train on the AM dataset (>5M edges) using one GPU, while the original implementation consumes 32GB of memory, cannot fit on a single GPU, and can only run on CPU.
    • The original implementation takes 51.88s per training epoch on CPU. The new heterograph-based R-GCN takes only 0.1781s per epoch on a V100 GPU (291x faster!).
  • Heterogeneous Attention Networks [Code in PyTorch]
    • We provide dgl.transform.metapath_reachable_graph, which transforms a heterogeneous graph into a new graph where two nodes are connected if the source node can reach the destination node via the given metapath (see the sketch after this list).
  • Metapath2vec [Code in PyTorch]
    • We implemented the metapath sampler in C++, making it twice as fast as the original implementation.
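
A minimal sketch of metapath_reachable_graph on a toy user-game graph (the type names and the metapath are illustrative):

import dgl

hg = dgl.heterograph({
    ('user', 'plays', 'game'):     ([0, 1, 1], [0, 0, 1]),
    ('game', 'played-by', 'user'): ([0, 0, 1], [0, 1, 1]),
})
# Connect two users when one reaches the other via user -> game -> user.
gg = dgl.transform.metapath_reachable_graph(hg, ['plays', 'played-by'])
print(gg.edges())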

Check out our heterograph tutorial: Working with Heterogeneous Graphs in DGL

Check out the full API reference.

DGL-KE: A DGL-based Sub-package for Computing Embeddings of Large Knowledge Graphs

Knowledge graph (KG) embedding embeds the entities and relations of a KG into continuous vector spaces. The embeddings preserve the inherent structure of the KG and benefit many downstream tasks, such as KG completion, relation extraction, and recommendation.

We release DGL-KE, which computes embeddings of large KGs efficiently. The package is adapted from the KnowledgeGraphEmbedding package, which we extend by leveraging DGL's core to achieve high efficiency and scalability. Using a single NVIDIA V100 GPU, DGL-KE can train TransE on FB15k in 6.85 minutes, substantially outperforming existing tools such as GraphVite. For graphs with hundreds of millions of edges (such as the full Freebase graph), it takes a couple of hours on one large EC2 CPU instance such as m5.24xlarge or x1.32xlarge.

Currently, the following models are supported:

  • TransE
  • DistMult
  • ComplEx

More models (RESCAL, RotatE, pRotatE, TransH, TransR, TransD, etc.) are under development and will be released in the future.

DGL-KE supports various training methods:

  • CPU training: Graph Embeddings are stored in CPU memory and mini-batches are trained on CPU.

  • GPU training: Graph Embeddings are stored in GPU memory and mini-batches are trained on GPU.

  • Joint CPU & GPU training: Graph Embeddings are stored in CPU memory but mini-batches are trained on GPU. This is designed for training KGE models on large knowledge graphs that cannot fit in GPU memory.

  • Multiprocessing training on CPUs: Each CPU process trains mini-batches independently and uses shared memory for inter-process communication. This is designed for training KGE models on large knowledge graphs with many CPU cores.

Multi-GPU training and distributed training will be released in the future.

For more information, please refer to this directory.

Miscellaneous

v0.3.1

28 Aug 05:46

We have received many requests from our community for more GNN layers, models, and examples, and this release responds to them. It enriches DGL with a large collection of common GNN modules whose correctness we have verified on popular datasets, so feel free to try them out. Another direction we are working on is building more domain-friendly packages on top of DGL. As a first step, we released several pretrained GNN models for molecular property prediction and molecule generation (currently grouped under the dgl.model_zoo namespace). We will continue to explore this idea and release more domain-specific models and packages.

New APIs

New NN Modules

New global pooling module

Please refer to the API document for more details.

New graph transformation routines

  • dgl.transform.khop_adj
  • dgl.transform.khop_graph
  • dgl.transform.laplacian_lambda_max
  • dgl.transform.knn_graph
  • dgl.transform.segmented_knn_graph

Please refer to the API document for more details.

Model zoo for chemistry and molecule applications

To make things easy for domain scientists, we are releasing a model zoo for chemistry, with training scripts and pre-trained models, focused on two particular tasks: property prediction and targeted molecular generation/optimization.

Credit: Shout out to @geekinglcq from Tencent Quantum Lab for contributing three models (MGCN, SchNet, and MPNN). We also thank the WuXi AppTec CADD team for their critical feedback on usability.

Property prediction

In practice, molecular properties are mostly determined via wet-lab experiments. We can instead cast the prediction problem as regression or classification.

Featurization is the first step of prediction. Traditionally, chemists developed pre-defined rules to convert molecular graphs into binary strings, where each bit indicates the presence or absence of a particular substructure.

Graph neural networks enable a data-driven representation of molecules built from the atoms, bonds, and molecular graph topology, which may be viewed as a learned fingerprint. The message passing mechanism allows the model to learn the interactions between atoms in a molecule.

The following script is self-explanatory:

from dgl.data import Tox21
from dgl import model_zoo

dataset = Tox21()
model = model_zoo.chem.load_pretrained('GCN_Tox21') # Pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles)                   # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0]) # Mask non-existing labels
# tensor([[-0.7956,  0.4054,  0.4288, -0.5565, -0.0911,  
# 0.9981, -0.1663,  0.2311, -0.2376,  0.9196]])

Supported Models

  • Graph Convolution
  • Graph Attention Networks
  • SchNet
  • Multilevel Graph Convolutional Neural Network
  • Message Passing Neural Networks

Generative Models

Targeted molecular generation refers to finding new molecules with desired properties. This gives rise to the need for generative models for two purposes:

  • Distribution Learning: Given a collection of molecules, we want to model their distribution and generate new molecules consistent with the distribution.
  • Goal-directed Optimization: Find molecules with desired properties.

For this model zoo, we provide only graph-based generative models. There are other generative models that work with alternative representations such as SMILES.

Example with Pre-trained Models

# We recommend running the code below with Jupyter notebooks
from IPython.display import SVG
from rdkit import Chem
from rdkit.Chem import Draw

from dgl import model_zoo

model = model_zoo.chem.load_pretrained('DGMG_ZINC_canonical')
model.eval()
mols = []
for i in range(4):
    SMILES = model(rdkit_mol=True)
    mols.append(Chem.MolFromSmiles(SMILES))
# Generating 4 molecules takes less than a second.

SVG(Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(180, 150), useSVG=True))

Supported Models

  • Learning Deep Generative Models of Graphs
  • Junction Tree Variational Autoencoder for Molecular Graph Generation

API break

We refactored the nn package to make all APIs more consistent. The following changes break previous behavior:

  • Changed the argument order of dgl.nn.pytorch.GraphConv and dgl.nn.mxnet.GraphConv. The order is now graph first and then feat, following the convention of all the other new modules (see the sketch below).
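
A minimal sketch of the new call convention on a toy graph (the feature sizes are arbitrary):

import torch as th
import dgl
from dgl.nn.pytorch import GraphConv

g = dgl.DGLGraph([(0, 1), (1, 2), (2, 0)])
conv = GraphConv(4, 2)
out = conv(g, th.randn(3, 4))  # graph first, then feat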

New model example

Recurrent Relational Networks in PyTorch (credit: @HuXiangkun)

There are also many bug fixes and minor changes. We will list them in the next 0.4 major release.

0.3

13 Jun 09:46

DGL v0.3 Release Note

The v0.3 release includes many crucial updates:

  • Fused message passing kernels that greatly speed up GNN training on large graphs. Please refer to our blog post for more details.
  • A demonstration of how to train GNNs on giant graphs via graph sampling.
  • New models and NN modules.
  • Many other bug fixes and enhancements.

As a result, please be aware of the following changes:

Installation

Previous installation methods with pip and conda, i.e.:

pip install dgl
conda install -c dglteam dgl

now install only the CPU builds (works on Linux/macOS/Windows).

July 2nd update

We found that the Windows build of DGL v0.3 on PyPI is currently inconsistent with the 0.3.x branch. Windows pip users should instead install the wheel matching their Python version:

pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/dgl-0.3-cp35-cp35m-win_amd64.whl
pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/dgl-0.3-cp36-cp36m-win_amd64.whl
pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/dgl-0.3-cp37-cp37m-win_amd64.whl

Installing CUDA builds with pip

Pip users can install the DGL CUDA builds with:

pip install <package-url>

where <package-url> is one of the following:

  • Linux + Py35
    • CUDA 9.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.0/dgl-0.3-cp35-cp35m-manylinux1_x86_64.whl
    • CUDA 10.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda10.0/dgl-0.3-cp35-cp35m-manylinux1_x86_64.whl
  • Linux + Py36
    • CUDA 9.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.0/dgl-0.3-cp36-cp36m-manylinux1_x86_64.whl
    • CUDA 10.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda10.0/dgl-0.3-cp36-cp36m-manylinux1_x86_64.whl
  • Linux + Py37
    • CUDA 9.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.0/dgl-0.3-cp37-cp37m-manylinux1_x86_64.whl
    • CUDA 10.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda10.0/dgl-0.3-cp37-cp37m-manylinux1_x86_64.whl
  • Win + Py35
    • CUDA 9.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.0/dgl-0.3-cp35-cp35m-win_amd64.whl
    • CUDA 10.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda10.0/dgl-0.3-cp35-cp35m-win_amd64.whl
  • Win + Py36
    • CUDA 9.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.0/dgl-0.3-cp36-cp36m-win_amd64.whl
    • CUDA 10.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda10.0/dgl-0.3-cp36-cp36m-win_amd64.whl
  • Win + Py37
    • CUDA 9.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda9.0/dgl-0.3-cp37-cp37m-win_amd64.whl
    • CUDA 10.0: pip install https://s3.us-east-2.amazonaws.com/dgl.ai/wheels/cuda10.0/dgl-0.3-cp37-cp37m-win_amd64.whl
  • macOS: no CUDA builds available (N/A)

Installing CUDA builds with conda

Conda users can install the CUDA builds with:

conda install -c dglteam dgl-cuda9.0   # For CUDA 9.0
conda install -c dglteam dgl-cuda10.0  # For CUDA 10.0

DGL currently supports CUDA 9.0 (dgl-cuda9.0) and 10.0 (dgl-cuda10.0). To find your CUDA version, use nvcc --version. To install from source, check out our installation guide.

New built-in message and reduce functions

We have expanded the list of built-in message and reduce functions to cover more use cases. Previously, DGL had only copy_src, copy_edge, and src_mul_edge. With the v0.3 release, we support many more combinations. Here is a demonstration of some of the new built-in functions:

import dgl
import dgl.function as fn
import torch as th

# Build a small example graph (any DGLGraph works here).
g = dgl.DGLGraph([(0, 1), (1, 2), (2, 3), (3, 0)])
g.ndata['h'] = th.randn((g.number_of_nodes(), 10)) # each node has feature size 10
g.edata['w'] = th.randn((g.number_of_edges(), 1))  # each edge has feature size 1
# collect features from source nodes and aggregate them in destination nodes
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_sum'))
# multiply source node features with edge weights and aggregate them in destination nodes
g.update_all(fn.u_mul_e('h', 'w', 'm'), fn.max('m', 'h_max'))
# compute edge embedding by multiplying source and destination node embeddings
g.apply_edges(fn.u_mul_v('h', 'h', 'w_new'))

As you can see, the syntax is straightforward: u_mul_e multiplies the source node data with the edge data, u_mul_v multiplies the source node data with the destination node data, and so on. Each built-in combination is mapped to a CPU/CUDA kernel, and broadcasting and gradient computation are supported as well. Check out our documentation for more details.

Tutorials for training on giant graphs

Two new tutorials are now live:

  • Train GNNs by neighbor sampling and its variants (link).
  • Scale the sampler-trainer architecture to giant graphs using distributed graph store (link).

We also provide scripts for setting up such a distributed environment (link).

Enhancements and bug fixes

  • NN modules
    • dgl.nn.[mxnet|pytorch].edge_softmax now directly returns the normalized scores on edges (see the sketch after this list).
    • Fix a memory leak when a graph is passed as the input.
  • Graph
    • DGLGraph now supports direct conversion from a SciPy CSR matrix, without first converting it to a COO matrix.
    • Read-only graphs can now be batched via dgl.batch.
    • DGLGraph now supports node/edge removal via DGLGraph.remove_nodes and DGLGraph.remove_edges (doc).
    • A new API DGLGraph.to(device) moves all node/edge data to the given device.
    • A new API dgl.to_simple converts a graph to a simple graph with no multi-edges.
    • A new API dgl.to_bidirected converts a graph to a bidirectional graph.
    • A new API dgl.contrib.sampling.random_walk generates random walks on a graph.
    • DGLGraph can now be constructed from another DGLGraph.
  • New model examples
    • APPNP
    • GIN
    • PinSage (slow version)
    • DGI
  • Bugfix
    • Fix a bug when a NumPy integer is passed as an argument.
    • Fix a bug when constructing from a networkx graph that has no edges.
    • Fix a bug in NodeFlow where IDs were sometimes not converted correctly.
    • Fix a bug in the MiniGC dataset where the number of nodes was inconsistent.
    • Fix a bug in the RGCN example when bfs_level=0.
    • Fix a bug where DLContext was not correctly exposed in CFFI.
    • Fix a crash during the Cython build.
    • Fix a bug in send when the given message function is a builtin.
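
A minimal sketch of the new edge_softmax behavior on a toy graph (the PyTorch import path is assumed):

import torch as th
import dgl
from dgl.nn.pytorch import edge_softmax

g = dgl.DGLGraph([(0, 2), (1, 2), (2, 2)])
scores = th.randn(g.number_of_edges(), 1)
attn = edge_softmax(g, scores)  # normalized over each node's incoming edges
print(attn.sum())               # all edges point to node 2, so this sums to 1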

0.2

08 Mar 23:01

A major release that includes many new features, bug fixes, and performance improvements. The GCN model on the Pubmed dataset is now 4.32x faster, and the RGCN model on the Mutag dataset is 3.59x faster. An important new feature: graph sampling APIs.

Update details:

Model examples

Core system improvement

Tutorial/Blog

Project improvement

0.1.3

11 Dec 22:30

Patch release, mainly for the PyTorch v1.0 update.

  • Bug fix for compatibility with PyTorch v1.0.
  • Bug fix in networkx graph conversion.

First open release

07 Dec 07:50

The first open release includes basically everything in the repository.

  • Basic DGL APIs and systems.
  • Backend support for PyTorch and MXNet.
  • 10 GNN model examples and tutorials.