@hannahbast
Collaborator

So far, triples from `rdf_unlinked_shared_data` were deleted along with the triples from `rdf_deleted_data`. This was a mistake because the triples from `rdf_unlinked_shared_data` might still be relevant for other entities. We now keep all of them, even if they may end up as orphaned triples at some point. The documentation of the Wikidata update stream explicitly allows (and even encourages) this behavior. Fixes ad-freiburg/qlever#2670
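
To illustrate the change, here is a minimal sketch of how the set of triples to delete could be computed for one update event. It is not the code from this PR: the function name `triples_to_delete`, the flag `keep_unlinked_shared`, and the simplified event layout (a dict mapping field names to lists of triple strings) are assumptions made for illustration only.

```python
def triples_to_delete(event, keep_unlinked_shared=True):
    """Return the set of triples to delete for one update event.

    `event` is a hypothetical, simplified dict that maps field names
    to lists of triple strings (not the real message format).
    """
    to_delete = set(event.get("rdf_deleted_data", []))
    if not keep_unlinked_shared:
        # Old behavior: also delete the unlinked shared triples, although
        # other entities might still reference them.
        to_delete |= set(event.get("rdf_unlinked_shared_data", []))
    return to_delete


# Hypothetical event; the triple strings are placeholders.
event = {
    "rdf_deleted_data": ["<entity1> <prop> <statement1>"],
    "rdf_unlinked_shared_data": ["<sharedValue1> <prop> <value1>"],
}

new_behavior = triples_to_delete(event)
old_behavior = triples_to_delete(event, keep_unlinked_shared=False)
```

With the new behavior, the shared triple stays in the index and may become orphaned; with the old behavior, it would be deleted even if another entity still referenced it.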

On the side, this PR no longer inserts `wikibase:Dump schema:dateModified DATE` triples. They were confusing because two different semantics were mixed: for the triples in the dump (produced by the Wikidata dump process), the minimum had to be taken to get the date until which all updates were considered, whereas for the triples that used to be inserted during the live update (by the `qlever update-wikidata` command), the maximum had to be taken. Instead, there are now only the `wikibase:Dump wikibase:updatesCompleteUntil DATE` and `wikibase:Dump wikibase:updateStreamNextOffset OFFSET` triples, whose semantics are clear from the predicate names.
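
The ambiguity of the old `schema:dateModified` triples can be sketched as follows. This is an illustration under stated assumptions, not code from this PR: the function `complete_until_old` and the example dates are hypothetical, and the sketch assumes that a reader could tell dump triples and live-update triples apart, which the stored triples did not make explicit.

```python
from datetime import date


def complete_until_old(dump_dates, live_dates):
    """Date until which all updates are considered, under the old semantics.

    Dump triples: take the minimum. Live-update triples: take the maximum.
    Both argument names are hypothetical.
    """
    if live_dates:
        return max(live_dates)
    return min(dump_dates)


# Hypothetical example dates.
dump_dates = [date(2026, 1, 10), date(2026, 1, 12)]
live_dates = [date(2026, 1, 20), date(2026, 1, 25)]

# Once both kinds of triples share one predicate, no single aggregate
# over all dates works in every case.
all_dates = dump_dates + live_dates
stale = min(all_dates)        # ignores the live updates
overstated = max(dump_dates)  # wrong when there were no live updates
```

A single `wikibase:updatesCompleteUntil` value avoids this case distinction, because it can be read directly without aggregation.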

So far, triples from `rdf_unlinked_shared_data` were deleted along with
the triples from `rdf_deleted_data`. This was a mistake because the
triples from `rdf_unlinked_shared_data` might still be relevant for
other entities. We now simply keep them, even if that means that they
may end up as orphaned triples at some point. The documentation of the
Wikidata update stream explicitly allows (and even encourages) this
behavior. Fixes ad-freiburg/qlever#2670
@hannahbast hannahbast merged commit e69994e into main Jan 31, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

psv: and pqv: nodes are randomly dropped