Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FuzzyCompletions support to limit number of results #212

Open
subramanyam86 opened this issue Feb 15, 2017 · 3 comments
Open

FuzzyCompletions support to limit number of results #212

subramanyam86 opened this issue Feb 15, 2017 · 3 comments

Comments

@subramanyam86
Copy link

It would be nice for the FuzzyCompletions api to support an additional parameter to limit number of results. This could speed up fuzzy completions on shorter queries.

@hendrikmuhs
Copy link
Contributor

hendrikmuhs commented Feb 15, 2017

FuzzyCompletions is doing a depth-first search approach, not necessarily finding the best completions first like it is the case for prefix completion. Thats also why there is not API for that on the CPP implementation like for prefix completion. So its not as easy as it seems.

Having that said there are a couple of things possible to speed it up:

But I think the most important thing to look at would be an API that calculates scores based on the combination of edit distance and dictionary weight. This function has to be given from the outside. Given such a function it may be possible to implement your original suggestion if this API can also return upper bounds for edit distances.

Last bot not least, referring to 'fuzzy completions on shorter queries': Another good strategy is to have a lookup table on the caller side limiting the max-edit distance parameter based on input length, e.g. for lengh < 5: max-editdistance = 1, length < 8: max-editdistance = 2, length > 8, max-editdistance = 3 etc (this is more or less the reverse of the API outlined above).

The current fuzzy matching implementation is just at the very beginning of what is possible.

@subu-cliqz
Copy link
Contributor

@hendrikmuhs Thanks for the detailed explanation Hendrik! I am implementing the client side logic as you explained already. The issue #56 sounds interesting, will speak with Narek about implementing this. Thanks again!

@hendrikmuhs
Copy link
Contributor

fixed: it is depth-first, not breadth first. Sorry, just realized, was a bit sleepy writing this.

The main reason for depth-first is the very small memorization needed when traversing the structure. Breadth-first would require lots of memory and therefore would be problematic in terms of scale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants