I have never succeeded in running the Java void-generator on my two main GraphDB repositories, primarily because they contain too many properties and contexts (AKA named graphs). The generator issues too many queries, e.g. at least half a million on one of the repositories, and cannot finish overnight, which precludes routine use.
I am really keen to use the SIB SPARQL editor. It retrieves some information from the VoID description with the following query:
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
SELECT DISTINCT ?subjectClass ?prop ?objectClass ?objectDatatype
WHERE {
  {
    ?cp void:class ?subjectClass ;
        void:propertyPartition ?pp .
    ?pp void:property ?prop .
    OPTIONAL {
      {
        ?pp void:classPartition [ void:class ?objectClass ] .
      } UNION {
        ?pp void-ext:datatypePartition [ void-ext:datatype ?objectDatatype ] .
      }
    }
  } UNION {
    ?linkset void:subjectsTarget ?subjectClass ;
             void:linkPredicate ?prop ;
             void:objectsTarget ?objectClass .
  }
}
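As far as I understand the query, the minimal VoID shape matched by its first branch can be sketched as follows. The `http://example.org/` IRIs, the `urn:demo_*` partition names and the scratch graph are all made up for illustration, and the sketch assumes that the default graph is the union of all named graphs, as in GraphDB by default.

```sparql
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Hypothetical data: a class ex:Person whose instances carry ex:name literals
# typed xsd:string. Every IRI below is a placeholder.
INSERT DATA {
  GRAPH <http://example.org/void-demo> {
    # Class partition: names the subject class and points to its properties.
    <urn:demo_cp_person> void:class <http://example.org/Person> ;
        void:propertyPartition <urn:demo_pp_person_name> .
    # Property partition: names the property and points to what its objects are.
    <urn:demo_pp_person_name> void:property <http://example.org/name> ;
        void-ext:datatypePartition <urn:demo_dtp_string> .
    # Datatype partition: names the datatype of the literal objects.
    <urn:demo_dtp_string> void-ext:datatype xsd:string .
  }
}
```

With these triples in place, the editor's query should bind ?subjectClass to the Person class, ?prop to the name property and ?objectDatatype to xsd:string. Dropping the scratch graph removes the demo data again.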
I have created an ad hoc VoID description with the following SPARQL update statements:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX void-ext: <http://ldf.fi/void-ext#>
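# Start from an empty destination graph. SILENT keeps the request from failing
# on stores that report an error when the graph does not exist yet.
DROP SILENT GRAPH <http://example.org/.well-known/void>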
;
# Class partitions: one per class, counting its rdf:type statements.
INSERT {
  GRAPH <http://example.org/.well-known/void> {
    ?cp void:class ?class ;
        void:entities ?count .
  }
}
WHERE {
  {
    SELECT ?class ( COUNT( * ) AS ?count )
    WHERE {
      [] a ?class
    }
    GROUP BY ?class
  }
  BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?class )))) AS ?cp )
}
;
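# Properties with typed resources as objects: each property partition is
# linked to the class partition of every object class.
INSERT {
  GRAPH <http://example.org/.well-known/void> {
    ?cps void:propertyPartition ?pp .
    ?pp void:property ?prop ;
        void:triples ?count ;
        void:classPartition ?cpo .
  }
}
WHERE {
  {
    SELECT ?c_s ?prop ?c_o ( COUNT( * ) AS ?count )
    WHERE {
      ?s ?prop ?o .
      ?s a ?c_s .
      ?o a ?c_o .
    }
    GROUP BY ?prop ?c_s ?c_o
  }
  BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?c_s )))) AS ?cps )
  BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?c_o )))) AS ?cpo )
  # Key the property partition on the subject class as well as the property.
  # Keyed on the property alone, every subject class using ?prop would share
  # one partition and appear linked to every object class of that property.
  BIND( IRI( CONCAT( "urn:property_partition_", MD5( CONCAT( STR( ?c_s ), " ", STR( ?prop ))))) AS ?pp )
}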
;
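# Properties with literal objects: each property partition is linked to one
# datatype partition per literal datatype.
INSERT {
  GRAPH <http://example.org/.well-known/void> {
    ?cps void:propertyPartition ?pp .
    ?pp void:property ?prop ;
        void:triples ?count ;
        void-ext:datatypePartition ?dtp .
    ?dtp void-ext:datatype ?dt .
  }
}
WHERE {
  {
    SELECT ?c_s ?prop ?dt ( COUNT( * ) AS ?count )
    WHERE {
      ?s ?prop ?o .
      ?s a ?c_s .
      FILTER( isLITERAL( ?o ))
      BIND( DATATYPE( ?o ) AS ?dt )
    }
    GROUP BY ?prop ?c_s ?dt
  }
  BIND( IRI( CONCAT( "urn:class_partition_", MD5( STR( ?c_s )))) AS ?cps )
  BIND( IRI( CONCAT( "urn:datatype_partition_", MD5( STR( ?dt )))) AS ?dtp )
  # Key the property partition on the subject class as well as the property,
  # so that subject classes using the same ?prop do not share one partition
  # (and, with it, each other's datatypes).
  BIND( IRI( CONCAT( "urn:property_partition_", MD5( CONCAT( STR( ?c_s ), " ", STR( ?prop ))))) AS ?pp )
}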
It takes 2 min to complete on a repository with 77 million triples, 14 contexts and 296 properties.
It takes about an hour to complete on a repository with 325 million triples, 21 contexts and 95 properties, but that one runs on another server that I should fine-tune.
This solution behaves as expected with the above SPARQL query and the SIB SPARQL editor.
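A quick way to check what the update wrote is to list the largest class partitions from the destination graph. This is only a sketch: the LIMIT is arbitrary, and the graph IRI has to match the one used in the update statements.

```sparql
PREFIX void: <http://rdfs.org/ns/void#>
# List the class partitions stored in the destination graph, largest first.
SELECT ?class ?entities
WHERE {
  GRAPH <http://example.org/.well-known/void> {
    ?cp void:class ?class ;
        void:entities ?entities .
  }
}
ORDER BY DESC( ?entities )
LIMIT 20
```

An empty result would suggest that the first INSERT matched nothing or that the graph IRI differs.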
Nota Bene:
- The IRI of the destination context should be updated as required.
- I am not interested in considering individual contexts at the time of querying the repository.
- Counts are certainly incomplete, and the VoID schema remains a little bit cryptic to me.
Hope this helps,
Marco