FuzzyCompletions support to limit number of results #212

subramanyam86 · 2017-02-15T12:50:17Z

It would be nice for the FuzzyCompletions API to support an additional parameter that limits the number of results. This could speed up fuzzy completions on shorter queries.

hendrikmuhs · 2017-02-15T15:12:25Z

FuzzyCompletions uses a depth-first search, so unlike prefix completion it does not necessarily find the best completions first. That's also why the C++ implementation has no such API for fuzzy completion, unlike for prefix completion. So it's not as easy as it seems.
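To make the depth-first point concrete, here is a minimal Python sketch of fuzzy prefix completion over a toy trie. The `TrieNode`, `insert` and `fuzzy_completions` names are hypothetical, and the "a prefix of the key is within `max_dist` edits of the query" semantics is an assumption; keyvi itself walks a weighted FSA in C++. The walk keeps one Levenshtein row per level of the current path and emits matches in traversal order, so cutting the output off after k results returns the first k found, not the k heaviest.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class TrieNode:
    """Toy trie node; stands in for a dictionary state (assumption)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.weight: Optional[int] = None  # set when a key ends here


def insert(root: TrieNode, key: str, weight: int) -> None:
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.weight = weight


def fuzzy_completions(root: TrieNode, query: str,
                      max_dist: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (key, weight, distance) for keys with a prefix within max_dist edits of query."""
    first_row = list(range(len(query) + 1))
    yield from _walk(root, "", first_row, first_row[-1], query, max_dist)


def _walk(node: TrieNode, path: str, row: List[int], best: int,
          query: str, max_dist: int) -> Iterator[Tuple[str, int, int]]:
    # row[i] = edit distance between query[:i] and the current path;
    # best = smallest distance between query and any prefix of the path.
    if node.weight is not None and best <= max_dist:
        yield path, node.weight, best
    for ch, child in node.children.items():
        new_row = [row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            new_row.append(min(new_row[i - 1] + 1, row[i] + 1, row[i - 1] + cost))
        new_best = min(best, new_row[-1])
        # Descend while a matching prefix is still reachable or already found.
        if min(new_row) <= max_dist or new_best <= max_dist:
            yield from _walk(child, path + ch, new_row, new_best, query, max_dist)
```

With the keys `apple` (10), `apply` (5), `ample` (50) and `banana` (7) and the query `appl` at distance 1, `ample` has by far the highest weight but is emitted last, after `apple` and `apply`.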

That said, there are a couple of ways to speed it up:

Issue #56, "Rewrite BoundedWeightedStateTraverser (used by Completions) to the traversal templates", would give a massive speed improvement. Guesstimate, based on the impact on dumping: at least double the speed.
Use an exact-first traversal approach. This is done, e.g., in GetNear for geospatial lookups: it prefers the exact match over any fuzzy matches, trying to avoid edit-distance penalties.
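The exact-first idea can be sketched on a toy trie in Python. This is an illustrative assumption about the idea, not how GetNear is implemented: at each depth the child matching the next query character is visited before its siblings, so exact completions surface first and a plain cutoff (here `itertools.islice`) favors zero-edit matches. `TrieNode`, `insert` and `exact_first_completions` are hypothetical names.

```python
from itertools import islice
from typing import Dict, Iterator, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.weight: Optional[int] = None  # set when a key ends here


def insert(root: TrieNode, key: str, weight: int) -> None:
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.weight = weight


def exact_first_completions(root: TrieNode, query: str,
                            max_dist: int) -> Iterator[Tuple[str, int, int]]:
    """Depth-first fuzzy prefix completion that tries the exact branch first."""

    def walk(node: TrieNode, path: str, row: List[int],
             best: int) -> Iterator[Tuple[str, int, int]]:
        if node.weight is not None and best <= max_dist:
            yield path, node.weight, best
        # Exact-first: the child equal to the next query character sorts first;
        # the stable sort keeps all other children in their original order.
        nxt = query[len(path)] if len(path) < len(query) else None
        for ch, child in sorted(node.children.items(), key=lambda item: item[0] != nxt):
            new_row = [row[0] + 1]
            for i in range(1, len(query) + 1):
                cost = 0 if query[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1, row[i] + 1, row[i - 1] + cost))
            new_best = min(best, new_row[-1])
            if min(new_row) <= max_dist or new_best <= max_dist:
                yield from walk(child, path + ch, new_row, new_best)

    first_row = list(range(len(query) + 1))
    yield from walk(root, "", first_row, first_row[-1])
```

Even when `ample` is inserted before `apple` and `apply`, taking the first two results for `appl` yields the two exact completions, because the `p` branch is explored before the `m` branch.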

But I think the most important thing to look at would be an API that calculates scores from a combination of edit distance and dictionary weight. This function would have to be supplied by the caller. Given such a function, it may be possible to implement your original suggestion, provided the API can also return upper bounds for edit distances.
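A Python sketch of how a caller-supplied score plus an edit-distance upper bound could enable a result limit. Everything here is hypothetical: `penalized_score` (halving the weight per edit) is an arbitrary example, and `max_weight` assumes the traversal knows an upper bound on dictionary weights. Once k candidates are held, the current k-th best score bounds how many edits a new candidate may have and still be competitive; a real traversal could use that bound to prune whole branches, whereas this toy version only filters a flat candidate list.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# Hypothetical caller-supplied scoring: (dictionary weight, edit distance) -> score.
ScoreFn = Callable[[int, int], float]


def penalized_score(weight: int, distance: int) -> float:
    """Example score: every edit halves the weight (an arbitrary choice)."""
    return weight * 0.5 ** distance


def distance_upper_bound(score_fn: ScoreFn, max_weight: int,
                         threshold: float, max_dist: int) -> int:
    """Largest distance <= max_dist at which a key of max_weight still beats threshold, or -1."""
    bound = -1
    for d in range(max_dist + 1):
        if score_fn(max_weight, d) > threshold:
            bound = d
    return bound


def top_k(candidates: Iterable[Tuple[str, int, int]], k: int, score_fn: ScoreFn,
          max_weight: int, max_dist: int) -> List[str]:
    """Keep the k best (key, weight, distance) candidates under score_fn."""
    if k <= 0:
        return []
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top k
    for key, weight, distance in candidates:
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # In a real traversal this bound would shrink the allowed edit distance
        # and prune whole subtrees; here it only skips single candidates.
        if distance > distance_upper_bound(score_fn, max_weight, threshold, max_dist):
            continue
        score = score_fn(weight, distance)
        if len(heap) < k:
            heapq.heappush(heap, (score, key))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, key))
    return [key for _, key in sorted(heap, reverse=True)]
```

For example, with a weight bound of 50 and a current k-th best score of 30, `distance_upper_bound` returns 0: no candidate with even one edit can still make it into the top k, so the search could fall back to exact matches only.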

Last but not least, regarding "fuzzy completions on shorter queries": another good strategy is a lookup table on the caller side that limits the max edit distance parameter based on input length, e.g. length < 5: max edit distance = 1; length < 8: max edit distance = 2; length > 8: max edit distance = 3, etc. (this is more or less the reverse of the API outlined above).
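The caller-side lookup table could look like the following Python sketch. The tier values are taken from the comment; the comment leaves a length of exactly 8 unspecified, and this sketch assumes it falls into the distance-3 tier. `max_edit_distance` and the tier constants are hypothetical names; the returned value would be passed as the max edit distance argument of the fuzzy completion call.

```python
from typing import List, Tuple

# (exclusive upper bound on query length, max edit distance), checked in order.
EDIT_DISTANCE_TIERS: List[Tuple[int, int]] = [(5, 1), (8, 2)]
# Used for every length not covered by a tier (assumed to include length 8).
DEFAULT_MAX_EDIT_DISTANCE = 3


def max_edit_distance(query: str) -> int:
    """Pick the max edit distance for a fuzzy completion request by query length."""
    for upper_bound, distance in EDIT_DISTANCE_TIERS:
        if len(query) < upper_bound:
            return distance
    return DEFAULT_MAX_EDIT_DISTANCE
```

Keeping the tiers in a list rather than hard-coded branches makes it easy to add further rows (the "etc.") or tune them per dictionary.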

The current fuzzy matching implementation is just at the very beginning of what is possible.

subu-cliqz · 2017-02-15T15:37:52Z

@hendrikmuhs Thanks for the detailed explanation, Hendrik! I am already implementing the client-side logic you described. Issue #56 sounds interesting; I will speak with Narek about implementing it. Thanks again!

hendrikmuhs · 2017-02-15T19:31:30Z

Fixed: it is depth-first, not breadth-first. Sorry, I just realized; I was a bit sleepy when writing this.

The main reason for depth-first is the very small amount of state that has to be kept while traversing the structure. Breadth-first would require a lot of memory and would therefore be problematic at scale.
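A rough way to see the memory argument: in a fuzzy walk every pending state carries its own Levenshtein row, so what matters is how many states are alive at once. The toy Python simulation below counts them on a complete tree with a fixed branching factor, an idealized assumption since real dictionaries are far from complete trees. Breadth-first ends up holding an entire level, depth-first only the current path. Both function names are hypothetical.

```python
from collections import deque


def bfs_peak_states(branching: int, depth: int) -> int:
    """Peak number of pending states in a breadth-first walk of a complete tree."""
    queue = deque([0])  # each entry is the depth of a pending node
    peak = 1
    while queue:
        level = queue.popleft()
        if level < depth:
            queue.extend([level + 1] * branching)
            peak = max(peak, len(queue))
    return peak


def dfs_peak_states(branching: int, depth: int) -> int:
    """Peak number of states on the recursion stack in a depth-first walk."""
    peak = 0

    def walk(level: int, on_stack: int) -> None:
        nonlocal peak
        peak = max(peak, on_stack)
        if level < depth:
            for _ in range(branching):
                walk(level + 1, on_stack + 1)

    walk(0, 1)
    return peak
```

On such a tree breadth-first peaks at `branching ** depth` pending states (the full last level), while depth-first never holds more than `depth + 1`.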

narekgharibyan added the enhancement label Feb 15, 2017
