Add Wikidata SPARQL metadata provider #89

Merged
luandev merged 9 commits into main from copilot/add-wikidata-metadata-provider
Jan 9, 2026

Copilot AI commented Jan 8, 2026

Summary

  • Implements a free, API-key-less metadata provider using Wikidata's SPARQL endpoint with intelligent name normalization, result ranking, and aggressive caching (21-day TTL)

Implementation

Core Provider (apps/server/src/providers/wikidata/)

  • provider.ts - MetadataProvider interface implementation with dual-layer caching
  • client.ts - Rate-limited HTTP client (1 req/sec default, configurable)
  • queryBuilder.ts - SPARQL query templates for search and QID lookup
  • mapper.ts - SPARQL JSON → GameMetadata transformation
  • normalizer.ts - ROM name normalization (strips region tags, revisions, disc numbers) + ranking (EXACT > PREFIX > CONTAINS with platform boost)
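
The 1 req/sec limit in client.ts could be enforced along the following lines. This is a minimal sketch, not the PR's actual code: `MinIntervalLimiter`, `reserve`, and `throttled` are hypothetical names, and the slot-reservation approach is an assumption about one reasonable way to space requests.

```typescript
// Hypothetical helper (not the PR's client.ts): spaces calls at least
// `minIntervalMs` apart by reserving the next free time slot.
class MinIntervalLimiter {
  private nextFreeAt = 0;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number = 1000) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns how long the caller should wait (in ms) before sending,
  // and reserves that slot so later callers queue up behind it.
  reserve(now: number = Date.now()): number {
    const startAt = Math.max(now, this.nextFreeAt);
    this.nextFreeAt = startAt + this.minIntervalMs;
    return startAt - now;
  }
}

// Wraps any async task so it only starts once its reserved slot arrives.
async function throttled<T>(
  limiter: MinIntervalLimiter,
  task: () => Promise<T>
): Promise<T> {
  const waitMs = limiter.reserve();
  if (waitMs > 0) {
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
  return task();
}
```

Passing the clock value into `reserve` keeps the spacing logic testable without real timers.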

Type System (packages/shared/src/)

  • metadata-types.ts - Generic MetadataProvider interface and GameMetadata type
  • wikidata-types.ts - Wikidata SPARQL response types

Database (apps/server/src/db.ts)

  • Added wikidata_cache_search and wikidata_cache_game tables
  • Cache get/set functions with 21-day TTL

Usage

import { WikidataProvider } from './providers/wikidata';

const provider = new WikidataProvider();

// Search with platform filter
const games = await provider.searchGames("super mario", {
  platform: "nes",
  limit: 10
});

// Get by QID
const game = await provider.getGameById("Q12345");

Testing

  • Added 66 tests across 5 test files (queryBuilder: 12, mapper: 14, normalizer: 22, client: 9, provider: 9)
  • All 148 tests passing
  • Type checking and linting pass
  • CodeQL security scan: 0 vulnerabilities

Checklist

  • Added a semantic version comment to this PR using /semver: patch, /semver: minor, or /semver: major. (See template for examples)
  • Confirmed workflows and automation updates (if any) have appropriate permissions.
Original prompt

This section details the original issue you should resolve

<issue_title>Wikidata (SPARQL) metadata provider</issue_title>
<issue_description>

Why

  • Free, open, no API key
  • Rich metadata (title, platforms, genres, release date, publisher, series)
  • Suitable fallback / primary metadata source when APIs are unavailable

Scope (MVP)

  • Search games by normalized name
  • Fetch core metadata
  • Rank + match to local ROMs
  • Cache aggressively

Architecture

Provider interface

Implement MetadataProvider:

  • searchGames(query, opts)
  • getGameByQid(qid)
  • healthCheck()

Provider module

src/providers/wikidata/

  • queryBuilder.ts — SPARQL templates
  • client.ts — HTTP + rate-limit handling
  • mapper.ts — SPARQL → GameMetadata
  • provider.ts — interface implementation

Data source

  • Endpoint: https://query.wikidata.org/sparql
  • Type filter: instance of video game (Q7889)
  • Labels: English only

Queries

Search by name

  • CONTAINS(LCASE(?label), <normalized title>)
  • Limit 25
  • Fields:
    • QID
    • label
    • release date
    • platforms
    • genres
    • publishers
    • series (optional)

Fetch by QID

  • Exact QID lookup for full metadata
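
The two queries above might be templated roughly as follows. This is a sketch under assumptions: the function names are hypothetical, only the label and release date (P577) are selected to keep it short, and it relies on the `wd:`, `wdt:`, and `rdfs:` prefixes that the Wikidata Query Service predefines. User input is escaped before it is placed in a string literal, and QIDs are validated before interpolation.

```typescript
// Escapes a value for use inside a double-quoted SPARQL string literal.
function escapeSparqlString(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Hypothetical search template: English labels of video games (Q7889)
// whose lowercased label contains the normalized title.
function buildSearchQuery(normalizedTitle: string, limit: number = 25): string {
  const needle = escapeSparqlString(normalizedTitle.toLowerCase());
  const safeLimit = Number.isFinite(limit) ? Math.max(1, Math.floor(limit)) : 25;
  return [
    "SELECT ?game ?gameLabel ?releaseDate WHERE {",
    "  ?game wdt:P31 wd:Q7889 ;",
    "        rdfs:label ?gameLabel .",
    '  FILTER(LANG(?gameLabel) = "en")',
    `  FILTER(CONTAINS(LCASE(?gameLabel), "${needle}"))`,
    "  OPTIONAL { ?game wdt:P577 ?releaseDate . }",
    "}",
    `LIMIT ${safeLimit}`,
  ].join("\n");
}

// Hypothetical exact lookup: rejects anything that is not a plain QID.
function buildQidQuery(qid: string): string {
  if (!/^Q[1-9]\d*$/.test(qid)) {
    throw new Error(`Invalid QID: ${qid}`);
  }
  return [
    "SELECT ?game ?gameLabel ?releaseDate WHERE {",
    `  VALUES ?game { wd:${qid} }`,
    "  ?game rdfs:label ?gameLabel .",
    '  FILTER(LANG(?gameLabel) = "en")',
    "  OPTIONAL { ?game wdt:P577 ?releaseDate . }",
    "}",
  ].join("\n");
}
```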

Normalization & matching

  • Strip region/revision tags: (USA), [Rev A], Disc 1
  • Lowercase, ASCII fold
  • Ranking:
    1. Exact label match
    2. Prefix match
    3. Contains match
    4. Platform boost
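
A minimal sketch of these normalization and ranking rules follows. The function names, the `Candidate` shape, and the numeric scores are assumptions chosen for illustration; the boost is deliberately smaller than the gap between tiers, so it only reorders candidates within the same tier.

```typescript
// Hypothetical normalizer: strips (USA) / [Rev A] style tags and
// "Disc N" markers, folds accents to ASCII, lowercases, trims.
function normalizeRomName(name: string): string {
  return name
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .replace(/\s*[(\[][^)\]]*[)\]]/g, "") // drop (...) and [...] tags
    .replace(/\b(?:disc|disk)\s*\d+\b/gi, "") // drop disc numbers
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

interface Candidate {
  label: string;
  platforms?: string[];
}

// Assumed scoring: EXACT 300 > PREFIX 200 > CONTAINS 100, plus 50 when
// the requested platform matches one of the candidate's platforms.
function scoreCandidate(query: string, candidate: Candidate, platform?: string): number {
  const label = normalizeRomName(candidate.label);
  let score: number;
  if (label === query) score = 300;
  else if (label.startsWith(query)) score = 200;
  else if (label.includes(query)) score = 100;
  else return 0;

  const wanted = platform?.toLowerCase();
  if (wanted && candidate.platforms?.some((p) => p.toLowerCase() === wanted)) {
    score += 50;
  }
  return score;
}

// Drops non-matching candidates and sorts the rest best-first.
function rankCandidates(romName: string, candidates: Candidate[], platform?: string): Candidate[] {
  const query = normalizeRomName(romName);
  return candidates
    .map((candidate) => ({ candidate, score: scoreCandidate(query, candidate, platform) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.candidate);
}
```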

Caching

  • Keys:
    • wikidata:search:{query}
    • wikidata:game:{qid}
  • TTL: 14–30 days
  • Always serve cache first
  • Hard rate limit (e.g. 1 req/sec)
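
The cache-first behaviour could look roughly like this. It is a sketch only: an in-memory `Map` stands in for the database tables, and `TtlCache`, `cachedFetch`, and the key helpers are hypothetical names. The 21-day TTL is one value inside the 14–30 day range given above.

```typescript
const TTL_MS = 21 * 24 * 60 * 60 * 1000; // 21 days

interface CacheEntry<T> {
  value: T;
  storedAt: number; // epoch milliseconds
}

// In-memory stand-in for a persistent cache table (assumption).
class TtlCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = TTL_MS) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined if missing or older than the TTL.
  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now - entry.storedAt > this.ttlMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number = Date.now()): void {
    this.entries.set(key, { value, storedAt: now });
  }
}

// Key formats taken from the list above.
const searchKey = (query: string): string => `wikidata:search:${query}`;
const gameKey = (qid: string): string => `wikidata:game:${qid}`;

// Serves from cache first; only calls `fetcher` on a miss or expiry.
async function cachedFetch<T>(
  cache: TtlCache<T>,
  key: string,
  fetcher: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const fresh = await fetcher();
  cache.set(key, fresh);
  return fresh;
}
```

This sketch evicts expired entries. If offline use matters more than freshness, an alternative is to keep them and fall back to the stale value when the fetch fails.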

Data model (internal)

GameMetadata:

  • source: "wikidata"
  • sourceId: QID
  • name
  • releaseDate?
  • platforms?
  • genres?
  • publishers?
  • series?
  • raw?
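
Expressed as a TypeScript type, the model might look like the sketch below. The field names come from the list above; the concrete types (string dates, string arrays, `unknown` for `raw`) are assumptions, and the example values are placeholders.

```typescript
// Field names follow the issue; the types are assumptions.
interface GameMetadata {
  source: "wikidata";
  sourceId: string; // Wikidata QID, e.g. "Q12345"
  name: string;
  releaseDate?: string; // assumed to be an ISO 8601 date string
  platforms?: string[];
  genres?: string[];
  publishers?: string[];
  series?: string;
  raw?: unknown; // assumed to hold the untouched SPARQL bindings
}

// Only the first three fields are required; the rest may be absent
// when Wikidata has no statement for them.
const exampleGame: GameMetadata = {
  source: "wikidata",
  sourceId: "Q12345",
  name: "Example Game",
  platforms: ["Example Platform"],
};
```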

Testing

  • Snapshot tests for query builder
  • Mapper tests with recorded SPARQL responses
  • Matching/ranking unit tests

Risks & mitigations

  • Loose matching → ranking + manual override
  • Rate limits → cache + throttle
  • Incomplete data → allow multiple providers later

Done when

  • Name search returns ranked candidates
  • Selected match enriches local library
  • Works offline from cache
  • Zero secrets / API keys required
</issue_description>

<agent_instructions>Implement this using TDD to the highest level of code quality and production level code</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>


Copilot AI and others added 4 commits January 8, 2026 18:33
…zer with tests

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>
Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>
Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>
… checks

Co-authored-by: luandev <6452989+luandev@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Wikidata SPARQL metadata provider implementation" to "Add Wikidata SPARQL metadata provider" on Jan 8, 2026
Copilot AI requested a review from luandev January 8, 2026 18:43
@luandev luandev marked this pull request as ready for review January 8, 2026 18:46
Copilot AI review requested due to automatic review settings January 8, 2026 18:46
Copilot AI left a comment

Pull request overview

This PR implements a free, API-key-less metadata provider using Wikidata's SPARQL endpoint. The implementation follows a clean architecture with comprehensive test coverage (66 tests), intelligent name normalization, result ranking with platform boost, and aggressive caching (21-day TTL).

Key Changes:

  • New metadata provider system with generic interfaces for supporting multiple metadata sources
  • Wikidata-specific SPARQL client with rate limiting (1 req/sec default)
  • Intelligent game name normalization and result ranking
  • Database tables for dual-layer caching (search and individual game lookups)
  • Comprehensive test suite covering all modules

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/shared/src/metadata-types.ts | Generic MetadataProvider interface and GameMetadata type for cross-provider compatibility |
| packages/shared/src/wikidata-types.ts | Wikidata-specific types for SPARQL responses and result ranking |
| packages/shared/src/index.ts | Export new metadata and Wikidata types |
| apps/server/src/providers/wikidata/queryBuilder.ts | SPARQL query templates for search and QID lookup with proper escaping |
| apps/server/src/providers/wikidata/client.ts | HTTP client with rate limiting and health check functionality |
| apps/server/src/providers/wikidata/mapper.ts | Transforms SPARQL JSON responses to structured GameMetadata |
| apps/server/src/providers/wikidata/normalizer.ts | ROM name normalization and result ranking with platform matching |
| apps/server/src/providers/wikidata/provider.ts | Main provider implementation with dual-layer caching |
| apps/server/src/providers/wikidata/index.ts | Module exports for clean API surface |
| apps/server/src/providers/wikidata/README.md | Comprehensive documentation with examples and architecture details |
| apps/server/src/providers/wikidata/tests/*.test.ts | 66 tests across 5 test files covering all functionality |
| apps/server/src/db.ts | Added wikidata_cache_search and wikidata_cache_game tables with get/set functions |

vi.mock("../client");
vi.mock("../../../db");

import { WikidataClient } from "../client";
Copilot AI Jan 8, 2026

Unused import WikidataClient.

Suggested change
import { WikidataClient } from "../client";

luandev and others added 4 commits January 8, 2026 20:23
…gation

- Renamed `mapSparqlResultToGame` to `mapSparqlResultToPartialGame` for clarity, indicating it returns a partial result.
- Changed mapping logic to handle single values for platforms, genres, and publishers instead of pipe-separated strings.
- Introduced a new `mapSparqlResultToGame` function for backward compatibility, which aggregates results from the partial mapping.
- Optimized SPARQL queries by removing `GROUP_CONCAT` and simplifying label retrieval, enhancing performance and reducing timeout issues.
- Updated aggregation logic in `aggregateSparqlResults` to utilize Sets for unique values and ensure proper handling of optional fields.
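
The row-per-value aggregation described in this commit can be sketched as follows. This is not the PR's `aggregateSparqlResults`: `aggregateRows`, `SparqlRow`, and `AggregatedGame` are hypothetical names, and the row shape assumes the query already returns plain strings. Without `GROUP_CONCAT`, a game with two platforms arrives as two rows, so a `Set` per field removes the duplicates.

```typescript
// One flattened SPARQL result row (assumed shape).
interface SparqlRow {
  qid: string;
  label: string;
  platformLabel?: string;
  genreLabel?: string;
  publisherLabel?: string;
}

interface AggregatedGame {
  sourceId: string;
  name: string;
  platforms?: string[];
  genres?: string[];
  publishers?: string[];
}

interface Accumulator {
  name: string;
  platforms: Set<string>;
  genres: Set<string>;
  publishers: Set<string>;
}

// Groups rows by QID and collects each multi-valued field into a Set.
function aggregateRows(rows: SparqlRow[]): AggregatedGame[] {
  const byQid = new Map<string, Accumulator>();
  for (const row of rows) {
    let acc = byQid.get(row.qid);
    if (!acc) {
      acc = { name: row.label, platforms: new Set(), genres: new Set(), publishers: new Set() };
      byQid.set(row.qid, acc);
    }
    if (row.platformLabel) acc.platforms.add(row.platformLabel);
    if (row.genreLabel) acc.genres.add(row.genreLabel);
    if (row.publisherLabel) acc.publishers.add(row.publisherLabel);
  }

  return Array.from(byQid.entries()).map(([qid, acc]) => {
    const game: AggregatedGame = { sourceId: qid, name: acc.name };
    // Optional fields stay absent when no row supplied a value.
    if (acc.platforms.size > 0) game.platforms = Array.from(acc.platforms);
    if (acc.genres.size > 0) game.genres = Array.from(acc.genres);
    if (acc.publishers.size > 0) game.publishers = Array.from(acc.publishers);
    return game;
  });
}
```
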
… unit tests

- Introduced integration tests for the WikidataProvider to validate end-to-end functionality with real API calls.
- Updated unit tests to reflect changes in data structure, including the aggregation of results and the use of new field names (e.g., platformLabel, genreLabel).
- Enhanced test cases to ensure proper handling of optional fields and improved assertions for aggregated results.
- Adjusted SPARQL query builder tests to account for changes in query limits and filtering mechanisms.
… web projects

- Added explicit names for the node-tests and web-tests in the Vitest configuration for better clarity and organization.
- Added a blank line at the end of the integration test file for consistency.
@luandev luandev merged commit 6775d40 into main Jan 9, 2026
9 checks passed
Development

Successfully merging this pull request may close these issues.

Wikidata (SPARQL) metadata provider

3 participants