
Use per-country ADM level names for disambiguation labels #102

@benoit74


With #101, ADM_LEVEL_NAMES in processor.py maps GeoNames feature codes to generic English names (ADM1 → region, ADM2 → department, etc.). These are used when two places share the same name and are in an ancestor/descendant relationship, producing labels like "Rumilly (district)" vs "Rumilly (city)".

The problem is that ADM levels are country-specific: ADM1 is a "state" in the US, a "region" in France, a "Land" in Germany, a "province" in Canada, etc. The generic fallback names are misleading for most countries.

GeoNames' original definition is probably intentionally much vaguer: https://www.geonames.org/export/codes.html

Proposed solution

Replace the single flat dict with a bundled JSON asset (src/maps2zim/assets/adm_level_names.json) structured as:

{
  "_default": {"ADM1": "region", "ADM2": "department", "ADM3": "district", "ADM4": "city"},
  "US": {"ADM1": "state", "ADM2": "county", "ADM3": "city"},
  "DE": {"ADM1": "state", "ADM2": "district", "ADM3": "municipality"},
  ...
}

_compute_discriminating_labels looks up the place's country_code first, falling back to _default.
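
To make the lookup concrete, here is a minimal sketch in Python. It is not the actual code of _compute_discriminating_labels: the helper names adm_level_name and discriminating_label are hypothetical, and the JSON is inlined rather than loaded from the proposed asset file. It also assumes that a country entry which lacks a given level (for example ADM4 for "US") falls back to _default for that level. The proposal does not say whether the fallback applies per level or only when the whole country is missing.

```python
import json

# Inline copy of the sample data from this issue. In the proposal this would
# be loaded from src/maps2zim/assets/adm_level_names.json instead.
ADM_LEVEL_NAMES = json.loads("""
{
  "_default": {"ADM1": "region", "ADM2": "department", "ADM3": "district", "ADM4": "city"},
  "US": {"ADM1": "state", "ADM2": "county", "ADM3": "city"},
  "DE": {"ADM1": "state", "ADM2": "district", "ADM3": "municipality"}
}
""")


def adm_level_name(country_code, feature_code, names=ADM_LEVEL_NAMES):
    """Return the level name for a feature code, preferring the country entry.

    Hypothetical helper. Falls back to "_default" per level (an assumption),
    and returns None when the feature code is unknown everywhere.
    """
    default = names.get("_default", {})
    per_country = names.get(country_code or "", {})
    return per_country.get(feature_code, default.get(feature_code))


def discriminating_label(place_name, country_code, feature_code):
    """Build a label such as "Rumilly (district)"; hypothetical helper."""
    level = adm_level_name(country_code, feature_code)
    # Leave the name untouched when no level name is known.
    return f"{place_name} ({level})" if level else place_name


print(discriminating_label("Springfield", "US", "ADM2"))
print(discriminating_label("Rumilly", "FR", "ADM3"))
```

With the sample data above, a US ADM2 place is labelled as a county, while a French place (no "FR" entry yet) uses the _default names. If the per-level fallback turns out to be misleading for countries with fewer levels, the alternative is to return no label when the country entry exists but omits the level.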

Data sourcing

No off-the-shelf file is known to exist. The initial dataset should cover the most frequent countries in the GeoNames data and can be extended over time. Wikidata SPARQL or the OpenStreetMap wiki (which documents per-country admin levels) are the best reference sources for curating the initial content.
