VoID generation with SPARQL updates #30

I have never managed to run the Java void-generator on my two main GraphDB repositories, primarily because they contain too many properties and contexts (AKA named graphs). The generator issues too many queries, _e.g._ at least half a million on one of the repositories, and it cannot finish overnight, which precludes routine use.

I would really like to use the [SIB SPARQL editor](https://github.com/sib-swiss/sparql-editor). It retrieves some information from the VoID description with the following query:

```sparql
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>

SELECT DISTINCT ?subjectClass ?prop ?objectClass ?objectDatatype
WHERE {
    {
        ?cp void:class ?subjectClass ;
            void:propertyPartition ?pp .
        ?pp void:property ?prop .
        OPTIONAL {
            {
                ?pp  void:classPartition [ void:class ?objectClass ] .
            } UNION {
                ?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
            }
        }
    } UNION {
        ?linkset void:subjectsTarget ?subjectClass ;
            void:linkPredicate ?prop ;
            void:objectsTarget ?objectClass .
    }
}
```
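For reference, the first branch of this query matches a shape like the one sketched below. This is only an illustration: the `ex:` IRIs and the `<http://example.org/void-demo>` graph are hypothetical and do not come from any real dataset.

```sparql
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.org/>

INSERT DATA {
    # Hypothetical demo graph, not the real destination context
    GRAPH <http://example.org/void-demo> {
        # Class partition for ex:Person with two property partitions
        ex:cp_person void:class ex:Person ;
            void:propertyPartition ex:pp_person_name , ex:pp_person_knows .
        # ex:Person --ex:name--> literal of datatype xsd:string
        ex:pp_person_name void:property ex:name ;
            void-ext:datatypePartition [ void-ext:datatype xsd:string ] .
        # ex:Person --ex:knows--> instance of ex:Person
        ex:pp_person_knows void:property ex:knows ;
            void:classPartition [ void:class ex:Person ] .
    }
}
```

On a store where the default graph is the union of all named graphs (assumed here), the query above should then return one row for `ex:name` with `xsd:string` and one row for `ex:knows` with `ex:Person`.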

I have created an _ad hoc_ VoID description with the following SPARQL update statements:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>

# SILENT avoids a failure on stores that report an error when the graph does not exist yet
DROP SILENT GRAPH <http://example.org/.well-known/void>
;
INSERT{
    GRAPH <http://example.org/.well-known/void> {
        ?cp void:class    ?class ;
            void:entities ?count .
    }
}
WHERE{
    {
        SELECT ?class ( COUNT( * ) AS ?count ) 
        WHERE{ 
            [] a ?class 
        }
        GROUP BY ?class
    }
    BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?class )))) AS ?cp )
}
;
INSERT{
    GRAPH <http://example.org/.well-known/void> {
        ?cps void:propertyPartition ?pp .
        ?pp void:property       ?prop  ;
            void:triples        ?count ;  
            void:classPartition ?cpo   .
    }
}
WHERE{
    {
        SELECT ?c_s ?prop ?c_o ( COUNT( * ) AS ?count )
        WHERE{
            ?s ?prop ?o .
            ?s a ?c_s .
            ?o a ?c_o .
        }
        GROUP BY ?prop ?c_s ?c_o
    }
    BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?c_s )))) AS ?cps )
    BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?c_o )))) AS ?cpo )
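    # Key the property partition on subject class and property, so that classes sharing a property do not share a partition
    BIND( IRI( CONCAT( "urn:property_partition_", MD5( CONCAT( STR( ?c_s ), " ", STR( ?prop ))))) AS ?pp )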
}
;
INSERT{
    GRAPH <http://example.org/.well-known/void> {
        ?cps void:propertyPartition ?pp .
        ?pp void:property       ?prop  ;
            void:triples        ?count ;  
            void-ext:datatypePartition ?dtp .
        ?dtp void-ext:datatype ?dt .
    }
}
WHERE{
    {
        SELECT ?c_s ?prop ?dt ( COUNT( * ) AS ?count )
        WHERE{
            ?s ?prop ?o .
            ?s a ?c_s .
            FILTER( isLITERAL( ?o ))
            BIND( DATATYPE( ?o ) AS ?dt )
        }
        GROUP BY ?prop ?c_s ?dt
    }
    BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?c_s )))) AS ?cps )
    BIND( IRI( CONCAT( "urn:datatype_partition_", MD5( STR( ?dt )))) AS ?dtp )
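    # Key the property partition on subject class and property, so that classes sharing a property do not share a partition
    BIND( IRI( CONCAT( "urn:property_partition_", MD5( CONCAT( STR( ?c_s ), " ", STR( ?prop ))))) AS ?pp )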
}
```

The update takes 2 min to complete on a repository with 77 million triples, 14 contexts and 296 properties.
It takes about an hour to complete on a repository with 325 million triples, 21 contexts and 95 properties, but that one sits on another server that I still need to fine-tune.

This solution behaves as expected with the above SPARQL query and the SIB SPARQL editor.
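A quick way to inspect what the update produced is to list the class partitions with their entity counts and the number of attached property partitions. The query below is only a sketch and assumes the destination graph IRI used in the update; the `LIMIT` is arbitrary.

```sparql
PREFIX void: <http://rdfs.org/ns/void#>

# Largest classes first, with the number of property partitions attached to each
SELECT ?class ?entities ( COUNT( DISTINCT ?pp ) AS ?propertyPartitions )
WHERE{
    GRAPH <http://example.org/.well-known/void> {
        ?cp void:class    ?class ;
            void:entities ?entities .
        OPTIONAL { ?cp void:propertyPartition ?pp }
    }
}
GROUP BY ?class ?entities
ORDER BY DESC( ?entities )
LIMIT 20
```

Classes that show zero property partitions have instances but never appear as the subject of a typed-object or literal-valued triple matched by the update.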

_Nota Bene_:

* The IRI of the destination context should be updated as required.

* I am not interested in distinguishing individual contexts when querying the repository.

* Counts are certainly incomplete, and the VoID schema remains a little bit cryptic to me.
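One likely contributor to the count issue: the updates group by more variables than the partition IRIs encode (for instance by object class or by datatype), so a single property partition can end up carrying several `void:triples` values. A minimal check, assuming the destination graph IRI used above:

```sparql
PREFIX void: <http://rdfs.org/ns/void#>

# Property partitions that carry more than one distinct void:triples value
SELECT ?pp ?prop ( COUNT( DISTINCT ?n ) AS ?values )
WHERE{
    GRAPH <http://example.org/.well-known/void> {
        ?pp void:property ?prop ;
            void:triples  ?n .
    }
}
GROUP BY ?pp ?prop
HAVING( COUNT( DISTINCT ?n ) > 1 )
ORDER BY DESC( ?values )
```

Any row returned here marks a partition whose triple count is ambiguous rather than a single total.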

Hope this helps,

  Marco
