OpenScan Metadata Integration Plan — Kleros Mainnet v1
Goal
Integrate Kleros dataset (kleros-mainnet-v1.zip) into explorer-metadata in a safe, auditable, and repeatable way.
Source Snapshot
- File: kleros-mainnet-v1.zip
- Network: Ethereum Mainnet (chainId: 1)
- Folders:
  - Mainnet/batched_tags/ (bulk address tags)
  - Mainnet/Single_tags/ (consolidated address tags)
  - Mainnet/Tokens/ (token metadata)
Target Mapping (explorer-metadata)
1) Address tags
From: batched_tags + Single_tags
To: data/addresses/evm/1/{address}.json
Field mapping (draft):
- Address → address
- Chain ID → chainId
- Nametag → label
- Website → candidate for links or auxiliary metadata
- Public Note → note (primary target), with description as fallback (after sanitization)
2) Token metadata
From: Tokens
To: data/tokens/evm/1/{address}.json
Field mapping (draft):
- contract address → address
- token name → name
- symbol → symbol
- project name → project.name (if available)
- social/web fields (github, discord, x (twitter), etc.) → links[] (normalized)
Data Quality Rules
Normalization
- Canonicalize all addresses (checksum format for output, lowercase for internal comparisons).
- Trim whitespace and remove empty strings.
- Enforce chainId = 1 for all imported records.
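A minimal sketch of these normalization rules. EIP-55 checksumming requires keccak-256, so in the real importer that step would come from a library (e.g. ethers' getAddress); here we only validate the hex format and lowercase for internal comparisons. All names are illustrative.

```typescript
// Validate and lowercase an address for internal comparison.
// Checksummed output for files would be produced by a keccak-based
// helper (not shown here).
function normalizeAddress(raw: string): string | null {
  const addr = raw.trim();
  if (!/^0x[0-9a-fA-F]{40}$/.test(addr)) return null; // reject malformed input
  return addr.toLowerCase();
}

// Trim whitespace, drop empty strings, enforce chainId = 1.
function normalizeRecord(rec: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(rec)) {
    const trimmed = v.trim();
    if (trimmed !== "") out[k] = trimmed; // remove empty strings
  }
  out.chainId = "1"; // every imported record is mainnet
  return out;
}
```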
Deduplication
- Address records: key = (chainId, address)
- Token records: key = (chainId, contract address)
- If duplicates exist in source, choose the "best" row by completeness score.
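One way to sketch the "best row by completeness" rule: among duplicate rows sharing a (chainId, address) key, keep the one with the most non-empty fields, with ties resolved by first occurrence so output stays deterministic. Field names are illustrative.

```typescript
type Row = Record<string, string>;

// Completeness score: count of non-empty fields.
function completeness(row: Row): number {
  return Object.values(row).filter((v) => v.trim() !== "").length;
}

// Keep the most complete row per (chainId, address); first seen wins ties.
function dedupe(rows: Row[]): Row[] {
  const best = new Map<string, Row>();
  for (const row of rows) {
    const key = `${row.chainId}:${row.address?.toLowerCase()}`;
    const prev = best.get(key);
    if (!prev || completeness(row) > completeness(prev)) best.set(key, row);
  }
  return [...best.values()];
}
```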
Sanitization
- Strip unsafe HTML from Public Note.
- Preserve readable plaintext in note (do not drop content unless empty/invalid).
- Validate URLs before adding to links.
- Ignore malformed social handles/URLs.
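The URL-validation rule can be sketched with the standard URL constructor: accept only well-formed http(s) URLs and silently drop anything malformed rather than failing the import.

```typescript
// Return a normalized URL string, or null for anything that should
// be ignored (malformed input, non-http(s) schemes).
function validLink(raw: string): string | null {
  try {
    const url = new URL(raw.trim());
    if (url.protocol !== "https:" && url.protocol !== "http:") return null;
    return url.toString();
  } catch {
    return null; // malformed URL: ignore, do not abort the import
  }
}
```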
Merge Policy (Important)
When target record already exists:
- Never overwrite high-confidence existing metadata blindly.
- Prefer existing curated values unless Kleros provides strictly better/non-empty data.
- Keep deterministic merge rules to avoid noisy PR diffs.
Proposed precedence:
- Existing curated repo value > New Kleros value
- Exception: fill missing/null target fields from Kleros
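One deterministic way to encode this precedence, as a sketch: existing curated values always win, and Kleros data only fills fields that are missing, null, or empty. Iterating over sorted keys keeps diffs stable. Field names are placeholders, not the repo's schema.

```typescript
type Meta = Record<string, unknown>;

// Existing curated value > new Kleros value; Kleros only fills gaps.
function mergeRecord(existing: Meta, incoming: Meta): Meta {
  const merged: Meta = { ...existing };
  for (const key of Object.keys(incoming).sort()) { // sorted for stable diffs
    const cur = merged[key];
    if (cur === undefined || cur === null || cur === "") {
      merged[key] = incoming[key]; // exception: fill missing/null fields
    }
  }
  return merged;
}
```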
Provenance / Auditability
Add source trace for imported fields/records (exact format to align with repo conventions), e.g.:
- provider: kleros
- snapshot date: 2026-02-23
- import batch id/hash
This enables rollback and future diffs.
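As a sketch only, a per-record provenance trace might look like the following; whether this lives inline or in an external manifest is an open question below, and the exact format must follow repo conventions.

```typescript
// Hypothetical provenance shape; align with repo conventions before use.
interface Provenance {
  provider: "kleros";
  snapshotDate: string; // ISO date of the source snapshot
  batchId: string;      // import batch id/hash, enabling rollback and diffs
}

function makeProvenance(batchId: string): Provenance {
  return { provider: "kleros", snapshotDate: "2026-02-23", batchId };
}
```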
Implementation Workplan
Phase 0 — Discovery
- Clone/open the local explorer-metadata repo
- Inspect schemas (schemas/) and examples for addresses/tokens
- Confirm accepted optional fields and the provenance convention
Phase 1 — Import script (dry-run first)
- Create scripts/import-kleros-mainnet.(ts|js)
- Parse ZIP/CSV inputs
- Normalize + deduplicate
- Build output objects matching schema
- Produce dry-run report:
- new files
- updated files
- skipped/invalid
- conflicts requiring manual review
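The dry-run report above could be accumulated as a simple structure, sketched here with illustrative names: the importer classifies every source row into one of the four buckets and prints counts instead of writing files.

```typescript
interface DryRunReport {
  newFiles: string[];
  updatedFiles: string[];
  skipped: string[];   // invalid or filtered-out rows
  conflicts: string[]; // require manual review before apply mode
}

function summarize(report: DryRunReport): string {
  return [
    `new: ${report.newFiles.length}`,
    `updated: ${report.updatedFiles.length}`,
    `skipped: ${report.skipped.length}`,
    `conflicts: ${report.conflicts.length}`,
  ].join(", ");
}
```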
Phase 2 — Apply + validate
- Run importer in apply mode
- Run repo validation (npm run validate)
- Run lint/format if required
- Spot-check random samples (tags + tokens)
Phase 3 — PR strategy
- PR #1: address tags only (lower risk)
- PR #2: tokens/social links enrichment
- Include import report and merge policy in PR description
Acceptance Criteria
- All generated files pass schema validation.
- No duplicate addresses/tokens in target paths.
- Deterministic output (same input → same diff).
- Provenance included for imported content.
- Dry-run and apply modes both available.
Risks / Open Questions
- How should Public Note be represented in the official schema (if at all)?
- Should Website and social fields be normalized into links[], or omitted if low confidence?
- Do we store provenance inline per record or in an external import manifest?
- Should Single_tags have higher priority than batched_tags when conflicts occur?
Immediate Next Step
Define exact schema-compliant JSON shape for:
- data/addresses/evm/1/{address}.json
- data/tokens/evm/1/{address}.json
Then lock merge rules and implement dry-run importer.