Index / add Swedish (swe) to Elasticsearch records mapping #9355

Open
juanluisrp wants to merge 1 commit into geonetwork:main from GeoCat:swe-elastic-lang

Conversation

@juanluisrp

Contributor

What this does

Adds Swedish (swe) to the Elasticsearch index mapping template records.json.

Swedish is registered as a supported UI language in CatController.js, but it was the only such language with no explicit field definitions in records.json. When the UI was switched to Swedish, facet aggregations ran against langswe fields that Elasticsearch had created through dynamic mapping as text fields. Text fields can't be aggregated, so the request failed with HTTP 400 and the search page displayed an error:

illegal_argument_exception: Text fields are not optimised for operations
that require per-document field data like aggregations and sorting.
Please use a keyword field instead. Alternatively, set fielddata=true
on [tag.langswe].

langswe is now added to every relevant section, mirroring the existing languages:

  • keyword for the aggregatable fields and their dynamic templates (organisationName, tag / th_*, any)
  • text with the built-in swedish analyzer for the full-text fields (*Object template and any)

This matches how german, danish, italian, spanish, romanian and portuguese are handled (built-in language analyzer; only english/french use custom _rebuilt analyzers).
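
To make the keyword/text split concrete, here is a minimal Python sketch of what the two kinds of `langswe` definitions could look like. It is not a copy of `records.json`: the template name `objectLangswe`, the `*Object.langswe` path pattern and both helper functions are assumptions made for illustration, and the real file layout may differ.

```python
# Illustrative only: field names follow the PR description, but the template
# name and path pattern are assumptions, not content taken from records.json.

def langswe_mapping_sketch():
    """Return a small mapping fragment with one keyword and one text definition."""
    return {
        "dynamic_templates": [
            {
                "objectLangswe": {  # hypothetical template name
                    "path_match": "*Object.langswe",  # assumed path pattern
                    # text + built-in "swedish" analyzer for full-text search
                    "mapping": {"type": "text", "analyzer": "swedish"},
                }
            }
        ],
        "properties": {
            # keyword: exact values, usable in terms aggregations (facets)
            "tag": {"properties": {"langswe": {"type": "keyword"}}},
        },
    }


def is_aggregatable(field_mapping):
    """Rough check covering only the two types discussed here.

    keyword fields aggregate; text fields only do so when fielddata is enabled,
    which is the alternative named in the error message.
    """
    field_type = field_mapping.get("type")
    if field_type == "keyword":
        return True
    return field_type == "text" and field_mapping.get("fielddata") is True


if __name__ == "__main__":
    mapping = langswe_mapping_sketch()
    tag_field = mapping["properties"]["tag"]["properties"]["langswe"]
    object_field = mapping["dynamic_templates"][0]["objectLangswe"]["mapping"]
    print("tag.langswe aggregatable:", is_aggregatable(tag_field))
    print("*Object.langswe aggregatable:", is_aggregatable(object_field))
```

The helper deliberately ignores other aggregatable types such as numbers and dates, since only keyword and text are involved in this change.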

Verification

Tested against Elasticsearch 8.19:

  • Confirmed the built-in swedish analyzer exists and applies stemming and stop-word removal as expected.
  • A keyword-mapped tag.langswe field aggregates correctly.
  • A dynamically-mapped (text) tag.langswe reproduces the reported 400.
  • A text field with analyzer: swedish is accepted.
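
The checks above could be reproduced with request bodies along these lines. This is a sketch, not the exact requests used for the PR: the function names, the aggregation name `tags` and the sample sentence are assumptions, and the bodies still have to be sent to a running cluster (`POST /_analyze` and `POST /<index>/_search`) with a client of your choice.

```python
import json


def analyze_request(text):
    """Body for POST /_analyze using the built-in swedish analyzer."""
    return {"analyzer": "swedish", "text": text}


def tag_facet_request(field="tag.langswe", size=10):
    """Body for POST /<index>/_search that runs only a terms aggregation.

    "size": 0 suppresses hits so the response contains just the buckets.
    Against a text-mapped field this is the kind of request that returns 400.
    """
    return {
        "size": 0,
        "aggs": {
            "tags": {  # hypothetical aggregation name
                "terms": {"field": field, "size": size}
            }
        },
    }


if __name__ == "__main__":
    # Sample sentence is arbitrary; any Swedish text will do.
    print(json.dumps(analyze_request("Kartor och geografiska data"), indent=2))
    print(json.dumps(tag_facet_request(), indent=2))
```

Running the facet request before and after rebuilding the index is one way to confirm that the keyword mapping has replaced the dynamically created text field.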

Existing indexes need to be rebuilt for the new mapping to take effect.

Closes #9243

Swedish was registered as a supported UI language in CatController.js
but had no field definitions in the index mapping template
records.json, unlike all other supported languages. Switching the UI
to Swedish triggered facet aggregations on langswe fields, which
Elasticsearch had created via dynamic mapping as text fields. Text
fields are not optimised for aggregations, so the request failed with
HTTP 400 and the search page showed an error.

Add langswe to every relevant section of records.json, mirroring the
existing languages: keyword for the aggregatable fields (tag, any,
organisationName and their dynamic templates) and text with the
built-in swedish analyzer for the full-text fields.

Closes geonetwork#9243
Labels

enhancement
index structure change (Indicate that this work introduces an index change.)

Development

Successfully merging this pull request may close these issues.

Swedish language (swe) missing from Elasticsearch index mapping template records.json
