GraB

Abstract

We introduce GraB, a new class of datasets for community detection. Unlike available datasets, our graphs are simultaneously heterogeneous (they include different types of nodes), overlapping (nodes can belong to multiple communities), and attributed (nodes carry attributes). We show that state-of-the-art methods struggle to find communities in such graphs, suggesting a gap in the field.
The GraB datasets are real-world heterogeneous graphs sourced from the movie production domain, containing vertices of type movie, actor, writer, producer, editor, and director. The three variations (full/min/top 3) differ in how labels (genres) are assigned to person vertices. Details can be found in https://openreview.net/pdf?id=qsGpFRlK0a.

Data format

The data is saved as sparse matrices in npz files and can be loaded as follows:

import numpy as np
import scipy.sparse as sp

# pathToNpz: path to a GraB .npz file (e.g. one of the files in the Datasets folder)
data = np.load(pathToNpz)
loader = dict(data)

# adjacency matrix
adjacency = sp.csr_matrix((loader['adj_matrix.data'], loader['adj_matrix.indices'], loader['adj_matrix.indptr']), shape=loader['adj_matrix.shape'])

# attributes matrix
attributes = sp.csr_matrix((loader['attr_matrix.data'], loader['attr_matrix.indices'], loader['attr_matrix.indptr']), shape=loader['attr_matrix.shape'])

# label matrix
labels = sp.csr_matrix((loader['labels.data'], loader['labels.indices'], loader['labels.indptr']), shape=loader['labels.shape'])
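
The three matrices share the same key layout, so the loading steps can be wrapped in a small helper. The sketch below is one possible way to do this and is not part of the GraB repository: the function names `load_csr`, `load_grab`, and `communities_per_node` are hypothetical. The last function also assumes that rows of the label matrix correspond to nodes and columns to communities, which should be checked against the paper.

```python
import numpy as np
import scipy.sparse as sp


def load_csr(loader, name):
    """Rebuild the CSR matrix stored under the `<name>.data/.indices/.indptr/.shape` keys."""
    return sp.csr_matrix(
        (loader[f"{name}.data"], loader[f"{name}.indices"], loader[f"{name}.indptr"]),
        shape=tuple(loader[f"{name}.shape"]),
    )


def load_grab(path_to_npz):
    """Return (adjacency, attributes, labels) as scipy CSR matrices."""
    with np.load(path_to_npz) as data:
        loader = dict(data)  # reads every array before the file is closed
    return tuple(load_csr(loader, name) for name in ("adj_matrix", "attr_matrix", "labels"))


def communities_per_node(labels):
    """Count stored label entries per row.

    Assumption: rows are nodes and columns are communities.
    """
    return labels.getnnz(axis=1)
```

With these helpers, `adjacency, attributes, labels = load_grab(pathToNpz)` yields the same three matrices as the snippet above, and `communities_per_node(labels)` gives a quick look at how many labels each node carries, values above 1 indicating overlapping membership under the stated assumption.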

Citing

If you find the GraB benchmark useful in your research, we ask that you cite the following paper:

@inproceedings{GraBBenchmark,
     author = {Knudsen, Malik S. and Brodal, Laurits A. and Peczalski, Peter K. and Moradan, Atefeh and Mottin, Davide and Assent, Ira},
     title = {GraB: Graph Benchmark for Heterogeneous Graph Clustering},
     abstract = {We introduce GraB, a benchmark for graph clustering that exposes unique characteristics. As opposed to available datasets, our graphs are at the same time heterogeneous, i.e., include different types of nodes and node attributes, and comprise overlapping clusters, i.e., each node belongs to multiple clusters. We empirically show the arduous characteristics of the datasets; the GraB datasets are available at https://anonymous.4open.science/r/GraB-benchmarks/.},
     year = {2022},
    }
