Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[i18n] Add all hierarchy names #492

Closed
wants to merge 1 commit into from

Conversation

Joxit
Copy link
Member

@Joxit Joxit commented Mar 25, 2020

Background

With pelias/api#1300 I added some cool feature for i18n autocomplete. But there are some limitations, such as the search of Parijs, Frankrijk, due to hierarchy index limitation.
Since we have #446 and the support for multi-lang names property in OSM, we can search for both OSM venues/addresses and WOF by their non-default/English name.

This means that if we do an autocomplete of Parijs (with lang=nl and pelias/api#1300) we will get Paris 👍 but if you do an autocomplete of Parijs, Frankrijk we will get no results 👎.

Why ?

Because the API will search a document with a name property Parijs AND in its hierarchy (country, locality...) Frankrijk. But in our ES index, we only have the default name in document hierarchy. So Frankrijk is not in the Parijs hierarchy... Only France is present.

How to get rid of this ?

We do not have many solutions... I found only 2.

  1. We add alternative names in our hierarchy to takes advantage of ES fuzzy search when we autocomplete things.
  2. We do 2 requests, my example is for the search of Pont Neuf, Parijs, Frankrijk:
    1. The first one is searching ES admin information, something like source=wof AND (name.lang='Parijs' or name.lang='Frankrijk') and take their ids.
    2. The second one is searching in ES Pont Neuf which is in the hierarchy found lately, something like name.lang='Pont Neuf' AND (country.id IN ('whosonfirst:country:85633147', 'whosonfirst:locality:101751119') OR locality IN ('whosonfirst:country:85633147', 'whosonfirst:locality:101751119'))

In this PR I tried the first solution and here are my results for France:

All names in parent.placetype

First try with all names in parent.placetype without duplicates.

master PR
Index Size 185M 1.1G
Index Time 86s 228s

This first test is not the right one in my opinion. I will try some alternatives 😄

related: pelias/api#1296
related: pelias/api#1300

@Joxit Joxit force-pushed the joxit/feat/multi-lang-hierarchy branch from 45f3d63 to 37e3698 Compare January 20, 2021 15:34
@Joxit
Copy link
Member Author

Joxit commented Jan 21, 2021

Ok, I did some test yesterday with a full WOF only planet import, this is not convincing

master PR
Size 3,2G 48G
Time 41m6,438s 336m20,755s

I did another test today, save multi-lang names in a field ${placetype}_names but this causes an error

[400] type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded

Defining too many fields in an index can lead to a mapping explosion, which can cause out of memory errors and difficult situations to recover from.

Consider a situation where every new document inserted introduces new fields, such as with dynamic mapping. Each new field is added to the index mapping, which can become a problem as the mapping grows.

From elastic documentation

Basically, this is not a good idea, so I will close this PR and find another solution

@Joxit Joxit closed this Jan 21, 2021
@Joxit Joxit deleted the joxit/feat/multi-lang-hierarchy branch January 21, 2021 11:15
@orangejulius
Copy link
Member

Ah yeah, adding this many new names probably won't work. There are also issues with field length that start happening with aliases (pelias/pelias#862).

HOWEVER! It might be interesting for you to give it a shot on Elasticsearch 7.10: it has both faster indexing and more space efficient indices. It probably won't make much of a difference but I'm sure happy to be surprised. :)

@Joxit
Copy link
Member Author

Joxit commented Jan 21, 2021

Yes you're right 😞

I did my tests with Elasticsearch 7.10 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants