46 changes: 46 additions & 0 deletions apps/server/src/db.ts
@@ -1,7 +1,7 @@
import Database from "better-sqlite3";
import path from "path";
import { promises as fs } from "fs";
import { DEFAULT_SETTINGS } from "@crocdesk/shared";

Check warning on line 4 in apps/server/src/db.ts (GitHub Actions / lint): 'DEFAULT_SETTINGS' is defined but never used. Allowed unused vars must match /^_/u
import type {
JobRecord,
JobStatus,
@@ -54,6 +54,18 @@
updated_at INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS wikidata_cache_search (
  query_hash TEXT PRIMARY KEY,
  response_json TEXT NOT NULL,
  updated_at INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS wikidata_cache_game (
  qid TEXT PRIMARY KEY,
  response_json TEXT NOT NULL,
  updated_at INTEGER NOT NULL
);

CREATE TABLE IF NOT EXISTS library_items (
id INTEGER PRIMARY KEY AUTOINCREMENT,
path TEXT NOT NULL UNIQUE,
@@ -310,3 +322,37 @@
)
.run(slug, json, Date.now());
}

// Wikidata cache functions

export function getCachedWikidataSearch(queryHash: string): { json: string; updatedAt: number } | null {
  const row = getDb()
    .prepare("SELECT response_json as json, updated_at as updatedAt FROM wikidata_cache_search WHERE query_hash = ?")
    .get(queryHash) as { json: string; updatedAt: number } | undefined;
  return row ?? null;
}

export function setCachedWikidataSearch(queryHash: string, json: string): void {
  getDb()
    .prepare(
      "INSERT INTO wikidata_cache_search (query_hash, response_json, updated_at) VALUES (?, ?, ?) " +
        "ON CONFLICT(query_hash) DO UPDATE SET response_json = excluded.response_json, updated_at = excluded.updated_at"
    )
    .run(queryHash, json, Date.now());
}

export function getCachedWikidataGame(qid: string): { json: string; updatedAt: number } | null {
  const row = getDb()
    .prepare("SELECT response_json as json, updated_at as updatedAt FROM wikidata_cache_game WHERE qid = ?")
    .get(qid) as { json: string; updatedAt: number } | undefined;
  return row ?? null;
}

export function setCachedWikidataGame(qid: string, json: string): void {
  getDb()
    .prepare(
      "INSERT INTO wikidata_cache_game (qid, response_json, updated_at) VALUES (?, ?, ?) " +
        "ON CONFLICT(qid) DO UPDATE SET response_json = excluded.response_json, updated_at = excluded.updated_at"
    )
    .run(qid, json, Date.now());
}
224 changes: 224 additions & 0 deletions apps/server/src/providers/wikidata/README.md
@@ -0,0 +1,224 @@
# Wikidata Metadata Provider

A free, open metadata provider for video games using Wikidata's SPARQL endpoint.

## Features

- ✅ **No API Key Required** - Free and open access to Wikidata
- ✅ **Rich Metadata** - Title, platforms, genres, publishers, series, release date
- ✅ **Intelligent Matching** - Name normalization and ranking with platform boost
- ✅ **Aggressive Caching** - 21-day TTL for search and game results
- ✅ **Rate Limiting** - Configurable rate limit (default: 1 req/sec)
- ✅ **Offline Support** - Works from cache when network unavailable

## Architecture

```
WikidataProvider
├── client.ts - HTTP client with rate limiting
├── queryBuilder.ts - SPARQL query templates
├── mapper.ts - SPARQL → GameMetadata transformation
├── normalizer.ts - Name normalization and result ranking
└── provider.ts - MetadataProvider implementation
```

## Usage

### Basic Search

```typescript
import { WikidataProvider } from './providers/wikidata';

const provider = new WikidataProvider();

// Search for games
const games = await provider.searchGames("super mario");

console.log(games[0]);
// {
// source: "wikidata",
// sourceId: "Q12345",
// name: "Super Mario Bros.",
// releaseDate: "1985-09-13",
// platforms: ["Nintendo Entertainment System"],
// genres: ["platform game"],
// publishers: ["Nintendo"]
// }
```

### Search with Platform Filter

```typescript
// Boost results matching the specified platform
const games = await provider.searchGames("mario", {
  platform: "nes",
  limit: 10
});
```

### Get Game by QID

```typescript
const game = await provider.getGameById("Q12345");

if (game) {
  console.log(game.name); // "Super Mario Bros."
}
```

### Health Check

```typescript
const health = await provider.healthCheck();

if (health.healthy) {
  console.log(`Wikidata is available (${health.responseTime}ms)`);
}
```

### Custom Client Options

```typescript
import { WikidataProvider, WikidataClient } from './providers/wikidata';

const client = new WikidataClient({
  rateLimitMs: 500, // 500 ms between requests; faster than the 1 req/sec default
  userAgent: "MyApp/1.0"
});

const provider = new WikidataProvider(client);
```

## Caching

The provider uses two cache tables in SQLite:

- `wikidata_cache_search` - Caches search results by normalized query hash
- `wikidata_cache_game` - Caches individual game metadata by QID

Cache entries expire after 21 days by default. The provider checks the cache before making any HTTP request.
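
The cache-first flow described above can be sketched as a small read-through helper. This is an illustration, not the provider's actual code: `isFresh`, `readThrough`, and the `CacheDeps` shape are hypothetical names, and serving a stale row when the network call fails is an assumption based on the "Offline Support" feature. The `get`/`set` callbacks stand in for the `getCachedWikidata*` / `setCachedWikidata*` functions in `db.ts`.

```typescript
// Row shape returned by the cache getters in db.ts.
type CachedRow = { json: string; updatedAt: number };

// 21 days in milliseconds, matching the documented default TTL.
const CACHE_TTL_MS = 21 * 24 * 60 * 60 * 1000;

// Hypothetical helper: a row is fresh if it was written less than one TTL ago.
export function isFresh(row: CachedRow | null, now: number): boolean {
  return row !== null && now - row.updatedAt < CACHE_TTL_MS;
}

// Hypothetical dependency bundle so the sketch does not need a real database.
type CacheDeps = {
  get: (key: string) => CachedRow | null;
  set: (key: string, json: string) => void;
  fetchJson: (key: string) => Promise<string>;
  now?: () => number;
};

// Return cached JSON when fresh; otherwise fetch, store, and return the new JSON.
export async function readThrough(key: string, deps: CacheDeps): Promise<string | null> {
  const now = (deps.now ?? Date.now)();
  const cached = deps.get(key);
  if (cached !== null && isFresh(cached, now)) {
    return cached.json;
  }
  try {
    const json = await deps.fetchJson(key);
    deps.set(key, json);
    return json;
  } catch {
    // Assumption: when the network is unavailable, a stale row beats no data.
    return cached !== null ? cached.json : null;
  }
}
```

Passing the clock in through `deps.now` keeps the TTL check testable without waiting three weeks.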

## Name Normalization

The normalizer strips common ROM naming conventions for better matching:

- Region tags: `(USA)`, `[Europe]`, `(Japan)`
- Revision tags: `(Rev 1)`, `[Rev A]`
- Disc numbers: `(Disc 1)`, `(Disc 2)`
- Extra whitespace and punctuation
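
A minimal sketch of the stripping rules listed above. The function name `normalizeName`, the exact regular expressions, and the lowercasing step are assumptions for illustration; the real `normalizer.ts` may recognize more tags or treat punctuation differently.

```typescript
// Tags to strip, in either (round) or [square] brackets.
const TAG_PATTERNS: RegExp[] = [
  /[(\[]\s*(?:USA|Europe|Japan)\s*[)\]]/gi, // region tags
  /[(\[]\s*Rev\s+[0-9A-Z]+\s*[)\]]/gi,      // revision tags
  /[(\[]\s*Disc\s+\d+\s*[)\]]/gi,           // disc numbers
];

// Hypothetical normalizer: strip ROM tags, lowercase, drop ASCII punctuation,
// and collapse runs of whitespace.
export function normalizeName(raw: string): string {
  let name = raw;
  for (const pattern of TAG_PATTERNS) {
    name = name.replace(pattern, " ");
  }
  return name
    .toLowerCase()                       // assumption: matching is case-insensitive
    .replace(/[!-\/:-@\[-`{-~]/g, " ")   // ASCII punctuation becomes a space
    .replace(/\s+/g, " ")
    .trim();
}
```

For example, under these rules `normalizeName("Super Mario Bros. (USA) (Rev 1)")` yields `"super mario bros"`. Non-ASCII letters are left untouched so that titles with accented characters survive.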

## Result Ranking

Results are ranked by:

1. **Match Quality** - EXACT > PREFIX > CONTAINS > NO_MATCH
2. **Platform Boost** - Results whose platforms match the platform filter receive a 0.5 rank bonus
3. **Alphabetical** - Tiebreaker for equal ranks
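
The three ranking rules can be sketched as a single comparator. The numeric base scores (3, 2, 1, 0), the `Candidate` type, and the case-insensitive exact platform comparison are assumptions; only the ordering of match qualities and the 0.5 boost come from the list above.

```typescript
type MatchQuality = "EXACT" | "PREFIX" | "CONTAINS" | "NO_MATCH";

// Assumed numeric scale; only the ordering is documented.
const BASE_SCORE: Record<MatchQuality, number> = {
  EXACT: 3,
  PREFIX: 2,
  CONTAINS: 1,
  NO_MATCH: 0,
};
const PLATFORM_BOOST = 0.5;

type Candidate = { name: string; platforms?: string[] };

function matchQuality(query: string, name: string): MatchQuality {
  const q = query.toLowerCase();
  const n = name.toLowerCase();
  if (n === q) return "EXACT";
  if (n.startsWith(q)) return "PREFIX";
  if (n.includes(q)) return "CONTAINS";
  return "NO_MATCH";
}

function score(query: string, candidate: Candidate, platform?: string): number {
  const base = BASE_SCORE[matchQuality(query, candidate.name)];
  const wanted = platform?.toLowerCase();
  const boosted =
    wanted !== undefined &&
    (candidate.platforms ?? []).some((p) => p.toLowerCase() === wanted);
  return base + (boosted ? PLATFORM_BOOST : 0);
}

// Sort by score (descending), then alphabetically as the tiebreaker.
export function rankResults<T extends Candidate>(query: string, results: T[], platform?: string): T[] {
  return [...results].sort(
    (a, b) =>
      score(query, b, platform) - score(query, a, platform) ||
      a.name.localeCompare(b.name)
  );
}
```

Because the boost is smaller than the gap between match qualities on this assumed scale, a platform match can break ties within a quality tier but never lifts a `CONTAINS` result above a `PREFIX` one.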

## Rate Limiting

The client enforces rate limiting to be respectful of Wikidata's resources:

- Default: 1 request per second
- Configurable via `WikidataClient` options
- Requests that arrive too early are queued until the next slot opens
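
One way to get this behavior is a slot-reservation limiter: each request reserves the next free time slot and sleeps until it arrives. The `RateLimiter` class below is a hypothetical sketch, not the implementation in `client.ts`; the injectable clock exists only to make the logic easy to test.

```typescript
export class RateLimiter {
  private nextSlot = 0;
  private readonly intervalMs: number;
  private readonly now: () => number;

  // intervalMs defaults to 1000 ms, i.e. the documented 1 request per second.
  constructor(intervalMs: number = 1000, now: () => number = Date.now) {
    this.intervalMs = intervalMs;
    this.now = now;
  }

  // Reserve the next free slot and return how many ms the caller must wait.
  reserve(): number {
    const current = this.now();
    const start = Math.max(current, this.nextSlot);
    this.nextSlot = start + this.intervalMs;
    return start - current;
  }

  // Run a task once its reserved slot arrives.
  async schedule<T>(task: () => Promise<T>): Promise<T> {
    const delay = this.reserve();
    if (delay > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
    return task();
  }
}
```

Reserving slots synchronously means concurrent callers are spaced out in call order without needing an explicit queue data structure.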

## SPARQL Queries

### Search Query

Searches for video games matching the normalized query string:

```sparql
SELECT DISTINCT ?game ?gameLabel ?releaseDate
  (GROUP_CONCAT(DISTINCT ?platformLabel; separator="|") AS ?platforms)
  (GROUP_CONCAT(DISTINCT ?genreLabel; separator="|") AS ?genres)
  (GROUP_CONCAT(DISTINCT ?publisherLabel; separator="|") AS ?publishers)
  ?seriesLabel
WHERE {
  ?game wdt:P31/wdt:P279* wd:Q7889 . # instance of video game
  FILTER(CONTAINS(LCASE(?gameLabel), "query"))
  # ... optional metadata fields
}
GROUP BY ?game ?gameLabel ?releaseDate ?seriesLabel
LIMIT 25
```
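
Because the query string is interpolated into a SPARQL string literal, it needs escaping before it reaches the `FILTER` clause. The helpers below are a hypothetical sketch of that step (the names `escapeSparqlString` and `buildSearchFilter` are not taken from `queryBuilder.ts`); they cover the escape sequences that matter inside a double-quoted literal.

```typescript
// Escape a value for use inside a double-quoted SPARQL string literal.
// Backslashes are handled first so later escapes are not double-escaped.
export function escapeSparqlString(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
}

// Build the FILTER line from the search template for a normalized query.
export function buildSearchFilter(normalizedQuery: string): string {
  const literal = escapeSparqlString(normalizedQuery.toLowerCase());
  return `FILTER(CONTAINS(LCASE(?gameLabel), "${literal}"))`;
}
```

Lowercasing the query mirrors the `LCASE(?gameLabel)` call in the template, so both sides of `CONTAINS` are compared in the same case.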

### Get by QID Query

Fetches full metadata for a specific game by Wikidata QID:

```sparql
SELECT DISTINCT ?game ?gameLabel ?releaseDate ...
WHERE {
  BIND(wd:Q12345 AS ?game)
  ?game wdt:P31/wdt:P279* wd:Q7889 . # validate it's a video game
  # ... optional metadata fields
}
```
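
Since the QID is spliced directly into the `BIND` clause, it is worth validating its shape first. A minimal sketch, assuming hypothetical helpers `isValidQid` and `buildQidBinding` (the actual query builder may validate differently):

```typescript
// Wikidata item IDs are the letter Q followed by a positive integer.
const QID_PATTERN = /^Q[1-9]\d*$/;

export function isValidQid(value: string): boolean {
  return QID_PATTERN.test(value);
}

// Build the BIND line from the template, rejecting anything that is not a QID.
export function buildQidBinding(qid: string): string {
  if (!isValidQid(qid)) {
    throw new Error(`Invalid Wikidata QID: ${qid}`);
  }
  return `BIND(wd:${qid} AS ?game)`;
}
```

Rejecting malformed input up front avoids sending a broken or injected query to the endpoint.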

## Testing

The provider ships with 66 unit tests across five files:

- **queryBuilder.test.ts** - 12 tests for SPARQL query generation
- **mapper.test.ts** - 14 tests for SPARQL → GameMetadata mapping
- **normalizer.test.ts** - 22 tests for normalization and ranking
- **client.test.ts** - 9 tests for HTTP client and rate limiting
- **provider.test.ts** - 9 tests for provider integration

Run tests:

```bash
npm run test:unit -- wikidata
```

## Data Model

### GameMetadata

```typescript
type GameMetadata = {
  source: "wikidata";
  sourceId: string; // QID (e.g., "Q12345")
  name: string; // Game title
  releaseDate?: string; // ISO 8601 date
  platforms?: string[]; // Platform names
  genres?: string[]; // Genre names
  publishers?: string[]; // Publisher names
  series?: string; // Game series name
  raw?: unknown; // Original WikidataGameResult
};
```

### WikidataGameResult

```typescript
type WikidataGameResult = {
  qid: string;
  label: string;
  releaseDate?: string;
  platforms?: string[];
  genres?: string[];
  publishers?: string[];
  series?: string;
};
```
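
The relationship between the two types can be sketched as a small mapping layer. The helper names (`extractQid`, `splitConcat`, `toGameMetadata`) are hypothetical and `mapper.ts` may be organized differently. The sketch relies on two facts: the SPARQL endpoint returns entities as full IRIs such as `http://www.wikidata.org/entity/Q12345`, and the search query joins multi-valued fields with `|`.

```typescript
type WikidataGameResult = {
  qid: string;
  label: string;
  releaseDate?: string;
  platforms?: string[];
  genres?: string[];
  publishers?: string[];
  series?: string;
};

type GameMetadata = {
  source: "wikidata";
  sourceId: string;
  name: string;
  releaseDate?: string;
  platforms?: string[];
  genres?: string[];
  publishers?: string[];
  series?: string;
  raw?: unknown;
};

// Pull the QID off the end of an entity IRI; null if the IRI has no QID.
export function extractQid(entityUri: string): string | null {
  const match = /\/(Q\d+)$/.exec(entityUri);
  return match?.[1] ?? null;
}

// Split a GROUP_CONCAT value on "|", dropping empty entries.
export function splitConcat(value: string | undefined): string[] | undefined {
  if (!value) return undefined;
  const parts = value
    .split("|")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return parts.length > 0 ? parts : undefined;
}

// Map the intermediate result onto the provider-agnostic metadata shape.
export function toGameMetadata(result: WikidataGameResult): GameMetadata {
  return {
    source: "wikidata",
    sourceId: result.qid,
    name: result.label,
    releaseDate: result.releaseDate,
    platforms: result.platforms,
    genres: result.genres,
    publishers: result.publishers,
    series: result.series,
    raw: result, // keep the original for debugging
  };
}
```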

## Known Limitations

- **Wikidata Coverage** - Not all games are in Wikidata
- **English Only** - Only English labels are fetched
- **Platform Names** - May differ from ROM naming conventions
- **Rate Limits** - Respect Wikidata's rate limits

## Future Enhancements

- [ ] Multi-language support
- [ ] Platform name mapping/aliases
- [ ] Additional metadata fields (developers, modes, ratings)
- [ ] Fallback to other providers when not found

## License

This provider is part of Jacare and follows the same license.