Plan: Integrate Kleros Mainnet v1 dataset into explorer-metadata #10

@josealoha666

OpenScan Metadata Integration Plan — Kleros Mainnet v1

Goal

Integrate Kleros dataset (kleros-mainnet-v1.zip) into explorer-metadata in a safe, auditable, and repeatable way.


Source Snapshot

  • File: kleros-mainnet-v1.zip
  • Network: Ethereum Mainnet (chainId: 1)
  • Folders:
    • Mainnet/batched_tags/ (bulk address tags)
    • Mainnet/Single_tags/ (consolidated address tags)
    • Mainnet/Tokens/ (token metadata)

Target Mapping (explorer-metadata)

1) Address tags

From: batched_tags + Single_tags

To: data/addresses/evm/1/{address}.json

Field mapping (draft):

  • Address → address
  • Chain ID → chainId
  • Nametag → label
  • Website → candidate for links or auxiliary metadata
  • Public Note → note (primary target) / description fallback (after sanitization)
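
A minimal sketch of the draft address mapping, assuming source rows parse into objects keyed by the column names listed above. The type names KlerosTagRow and AddressRecord, the links array, and the choice of note as the target field are illustrative assumptions; the real shape has to follow the schemas in the repo.

```typescript
// Hypothetical source row; column names follow the draft mapping above.
interface KlerosTagRow {
  "Address": string;
  "Chain ID": string;
  "Nametag": string;
  "Website"?: string;
  "Public Note"?: string;
}

// Hypothetical target shape; must be checked against the repo schema.
interface AddressRecord {
  address: string;
  chainId: number;
  label: string;
  note?: string;
  links?: string[];
}

function mapTagRow(row: KlerosTagRow): AddressRecord {
  const record: AddressRecord = {
    address: row["Address"].trim(),
    chainId: Number(row["Chain ID"]),
    label: row["Nametag"].trim(),
  };
  // Optional fields are only set when they carry content.
  const note = row["Public Note"]?.trim();
  if (note) record.note = note;
  const website = row["Website"]?.trim();
  if (website) record.links = [website];
  return record;
}

const example = mapTagRow({
  "Address": " 0x0000000000000000000000000000000000000001 ",
  "Chain ID": "1",
  "Nametag": "Example Label",
  "Public Note": "",
});
console.log(JSON.stringify(example));
```

Empty optional fields are omitted rather than written as empty strings, which keeps the generated files small and the diffs quiet.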

2) Token metadata

From: Tokens

To: data/tokens/evm/1/{address}.json

Field mapping (draft):

  • contract address → address
  • token name → name
  • symbol → symbol
  • project name → project.name (if available)
  • social/web fields (github, discord, x(twitter), etc.) → links[] (normalized)
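
The token mapping could be sketched along the same lines. The row property names (contractAddress, tokenName, social) and the { type, url } link shape are assumptions made for illustration, since the actual column names in Tokens/ and the links[] schema are not confirmed here.

```typescript
// Hypothetical source row; actual column names in Tokens/ may differ.
interface KlerosTokenRow {
  contractAddress: string;
  tokenName: string;
  symbol: string;
  projectName?: string;
  social?: Record<string, string | undefined>; // e.g. { github: "...", discord: "..." }
}

// Hypothetical link shape; align with the repo schema before use.
interface TokenLink {
  type: string;
  url: string;
}

interface TokenRecord {
  address: string;
  name: string;
  symbol: string;
  project?: { name: string };
  links: TokenLink[];
}

function mapTokenRow(row: KlerosTokenRow): TokenRecord {
  const links: TokenLink[] = [];
  for (const [type, value] of Object.entries(row.social ?? {})) {
    const url = value?.trim();
    if (url) links.push({ type: type.toLowerCase(), url });
  }
  // Sort by type so the same input always produces the same order.
  links.sort((a, b) => (a.type < b.type ? -1 : a.type > b.type ? 1 : 0));

  const record: TokenRecord = {
    address: row.contractAddress.trim(),
    name: row.tokenName.trim(),
    symbol: row.symbol.trim(),
    links,
  };
  const projectName = row.projectName?.trim();
  if (projectName) record.project = { name: projectName };
  return record;
}
```

URL validation is deliberately left out of this mapping step; it belongs with the sanitization rules.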

Data Quality Rules

Normalization

  • Canonicalize all addresses (checksum format for output, lowercase for internal comparisons).
  • Trim whitespace and remove empty strings.
  • Enforce chainId = 1 for all imported records.
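
A minimal sketch of these normalization rules follows; the function names are hypothetical. It covers only the lowercase comparison form: the checksummed output form (EIP-55) needs a Keccak-256 hash, which would typically come from an Ethereum library and is left out here.

```typescript
const ADDRESS_RE = /^0x[0-9a-fA-F]{40}$/;

// Lowercase form for internal comparisons; returns null for malformed input.
function toComparisonAddress(raw: string): string | null {
  const trimmed = raw.trim();
  return ADDRESS_RE.test(trimmed) ? trimmed.toLowerCase() : null;
}

// Trim every string field and drop fields that end up empty, null, or undefined.
function cleanStrings(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    if (typeof value === "string") {
      const trimmed = value.trim();
      if (trimmed !== "") out[key] = trimmed;
    } else if (value !== null && value !== undefined) {
      out[key] = value;
    }
  }
  return out;
}

// Every imported record must be on Ethereum Mainnet (chainId 1).
function assertMainnet(chainId: unknown): 1 {
  if (Number(chainId) !== 1) {
    throw new Error(`unexpected chainId: ${String(chainId)}`);
  }
  return 1;
}
```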

Deduplication

  • Address records: key = (chainId, address)
  • Token records: key = (chainId, contract address)
  • If duplicates exist in source, choose the “best” row by completeness score.
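
One possible reading of the completeness score is a count of non-empty fields. The sketch below uses that definition as an assumption (the plan does not define the score), and keeps the first-seen row on ties so the result stays deterministic.

```typescript
type Row = { chainId: number; address: string; [field: string]: unknown };

// Assumed scoring: one point per non-empty field.
function completeness(row: Row): number {
  return Object.values(row).filter(
    (v) => v !== null && v !== undefined && String(v).trim() !== ""
  ).length;
}

// Keep one row per (chainId, address); addresses compare case-insensitively.
function dedupe(rows: Row[]): Row[] {
  const best = new Map<string, Row>();
  for (const row of rows) {
    const key = `${row.chainId}:${row.address.toLowerCase()}`;
    const current = best.get(key);
    // Strict ">" keeps the first-seen row when scores are equal.
    if (!current || completeness(row) > completeness(current)) {
      best.set(key, row);
    }
  }
  return Array.from(best.values());
}
```

The same function serves token records if the contract address is passed in the address field.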

Sanitization

  • Strip unsafe HTML from Public Note.
  • Preserve readable plaintext in note (do not drop content unless empty/invalid).
  • Validate URLs before adding to links.
  • Ignore malformed social handles/URLs.
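
As a rough illustration of these rules, the sketch below strips tags with regular expressions and validates URLs with the standard URL constructor. The regex approach is a deliberately crude stand-in; a vetted HTML sanitizer library would be the safer choice for production use. Restricting links to http and https is also an assumption.

```typescript
// Crude tag stripper: removes script/style blocks, then any remaining tags.
// Not a substitute for a vetted HTML sanitizer.
function stripHtml(input: string): string {
  return input
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    .replace(/<[^>]*>/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the normalized URL, or null if it is malformed or not http(s).
function validUrl(raw: string): string | null {
  try {
    const url = new URL(raw.trim());
    const allowed = url.protocol === "https:" || url.protocol === "http:";
    return allowed ? url.toString() : null;
  } catch {
    return null;
  }
}

console.log(stripHtml("<b>Hello</b> <script>alert(1)</script>world"));
console.log(validUrl("javascript:alert(1)"));
```

Plain text inside ordinary tags is preserved, which matches the rule about not dropping readable content from the note.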

Merge Policy (Important)

When target record already exists:

  1. Never overwrite high-confidence existing metadata blindly.
  2. Prefer existing curated values unless Kleros provides strictly better/non-empty data.
  3. Keep deterministic merge rules to avoid noisy PR diffs.

Proposed precedence:

  • Existing curated repo value > New Kleros value
  • Exception: fill missing/null target fields from Kleros
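
The proposed precedence could be sketched as a fill-missing-only merge. The names mergeRecord and isEmpty are hypothetical, and treating empty strings and empty arrays as "missing" is an assumption that should be confirmed against repo conventions.

```typescript
type Metadata = Record<string, unknown>;

function isEmpty(value: unknown): boolean {
  return (
    value === null ||
    value === undefined ||
    (typeof value === "string" && value.trim() === "") ||
    (Array.isArray(value) && value.length === 0)
  );
}

// Existing curated values always win; Kleros values only fill gaps.
// Fields where both sides hold different non-empty values are reported
// as conflicts for manual review and left unchanged.
function mergeRecord(
  existing: Metadata,
  incoming: Metadata
): { merged: Metadata; conflicts: string[] } {
  const merged: Metadata = { ...existing };
  const conflicts: string[] = [];
  // Sorted iteration keeps the conflict list deterministic.
  for (const key of Object.keys(incoming).sort()) {
    const next = incoming[key];
    if (isEmpty(next)) continue;
    if (isEmpty(existing[key])) {
      merged[key] = next;
    } else if (JSON.stringify(existing[key]) !== JSON.stringify(next)) {
      conflicts.push(key);
    }
  }
  return { merged, conflicts };
}
```

Returning the conflict list separately lets the dry-run report surface them without changing any curated value.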

Provenance / Auditability

Add source trace for imported fields/records (exact format to align with repo conventions), e.g.:

  • provider: kleros
  • snapshot date: 2026-02-23
  • import batch id/hash

This enables rollback and future diffs.
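
One way to derive a stable batch id is to hash the snapshot bytes. The sketch below assumes a Node.js runtime and uses SHA-256 from the built-in crypto module; the Provenance field names are placeholders until the repo convention is confirmed.

```typescript
import { createHash } from "node:crypto";

// Hypothetical provenance shape; field names are not confirmed by the repo.
interface Provenance {
  provider: string;
  snapshotDate: string; // date of the source snapshot, e.g. "2026-02-23"
  batchId: string;      // here: SHA-256 hex digest of the snapshot bytes
}

function buildProvenance(snapshot: Uint8Array, snapshotDate: string): Provenance {
  const batchId = createHash("sha256").update(snapshot).digest("hex");
  return { provider: "kleros", snapshotDate, batchId };
}

// In practice the bytes would come from reading kleros-mainnet-v1.zip.
const sample = buildProvenance(new TextEncoder().encode("example bytes"), "2026-02-23");
console.log(sample.batchId.length);
```

Because the id depends only on the file contents, re-importing the same ZIP yields the same id, which supports the rollback and diffing goal.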


Implementation Workplan

Phase 0 — Discovery

  • Clone/open local explorer-metadata repo
  • Inspect schemas (schemas/) and examples for addresses/tokens
  • Confirm accepted optional fields and provenance convention

Phase 1 — Import script (dry-run first)

  • Create scripts/import-kleros-mainnet.(ts|js)
  • Parse ZIP/CSV inputs
  • Normalize + deduplicate
  • Build output objects matching schema
  • Produce dry-run report:
    • new files
    • updated files
    • skipped/invalid
    • conflicts requiring manual review
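
The dry-run report might take a shape like the following. All type and field names here are illustrative assumptions; the four buckets mirror the list above.

```typescript
type Action = "new" | "updated" | "skipped" | "conflict";

interface PlannedChange {
  path: string;
  action: Action;
  reason?: string;
}

interface DryRunReport {
  newFiles: string[];
  updatedFiles: string[];
  skipped: { path: string; reason: string }[];
  conflicts: { path: string; reason: string }[];
}

// Groups planned changes into report buckets without touching the filesystem.
function buildReport(changes: PlannedChange[]): DryRunReport {
  const report: DryRunReport = { newFiles: [], updatedFiles: [], skipped: [], conflicts: [] };
  // Sort a copy by path so the report is stable across runs.
  const sorted = changes.slice().sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  for (const change of sorted) {
    const reason = change.reason ?? "unspecified";
    switch (change.action) {
      case "new":
        report.newFiles.push(change.path);
        break;
      case "updated":
        report.updatedFiles.push(change.path);
        break;
      case "skipped":
        report.skipped.push({ path: change.path, reason });
        break;
      case "conflict":
        report.conflicts.push({ path: change.path, reason });
        break;
    }
  }
  return report;
}
```

Apply mode could consume the same PlannedChange list, so both modes share one planning step and differ only in whether files are written.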

Phase 2 — Apply + validate

  • Run importer in apply mode
  • Run repo validation (npm run validate)
  • Run lint/format if required
  • Spot-check random samples (tags + tokens)

Phase 3 — PR strategy


Acceptance Criteria

  • All generated files pass schema validation.
  • No duplicate addresses/tokens in target paths.
  • Deterministic output (same input → same diff).
  • Provenance included for imported content.
  • Dry-run and apply modes both available.
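
For the determinism criterion, one common approach is to serialize with sorted keys and fixed indentation. This is a sketch under the assumption that the repo has no existing formatter that already enforces key order; the two-space indent and trailing newline are also assumptions.

```typescript
// Recursively sort object keys so output does not depend on insertion order.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      out[key] = sortKeys(source[key]);
    }
    return out;
  }
  return value;
}

// Fixed indentation plus a trailing newline keeps diffs minimal.
function stableStringify(value: unknown): string {
  return JSON.stringify(sortKeys(value), null, 2) + "\n";
}

const a = stableStringify({ b: 1, a: { d: 2, c: 3 } });
const b = stableStringify({ a: { c: 3, d: 2 }, b: 1 });
console.log(a === b);
```

Array order is preserved as-is, so any list such as links[] still needs its own sort rule before serialization.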

Risks / Open Questions

  • How should Public Note be represented in official schema (if at all)?
  • Should Website and social fields be normalized into links[] or omitted if low confidence?
  • Do we store provenance inline per record or in external import manifest?
  • Should Single_tags have higher priority than batched_tags when conflicts occur?

Immediate Next Step

Define exact schema-compliant JSON shape for:

  1. data/addresses/evm/1/{address}.json
  2. data/tokens/evm/1/{address}.json

Then lock merge rules and implement dry-run importer.
