Search: Add approximate string matching, a.k.a. fuzzy search #1537
Labels: idea, module:backend (MapService stuff), plugin:search (Functionality and features of the (core) Search plugin)
I'm posting this as an idea – it's not certain that it will be implemented like this, but I'd like to discuss it, as well as when would be a "good time" to add this functionality.
Overview
One common issue when searching in Hajk is that the user’s input must exactly match a part of the result string to generate a match. Despite using wildcards and implementing some sophisticated string splitting, the results remain unsatisfactory. In fact, achieving better results might not be feasible using the WFS protocol.
I've been experimenting with various fuzzy search methods, including trigram matching. Trigram matching in particular gave results that could be useful for populating the autocomplete list, helping users type their query correctly.
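To make the idea concrete, here's a small Python sketch that approximates how pg_trgm compares two strings: the text is lowercased and split into words of letters and digits, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings. It's an illustration only and doesn't reproduce every detail of the extension (locale-dependent character classes, multibyte handling); the street names are made-up examples.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm's trigram extraction for one string.

    Words are runs of letters/digits (other characters are ignored), the text
    is lowercased, and each word is padded with two leading spaces and one
    trailing space before its 3-character windows are collected.
    """
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


# The % operator treats two strings as similar when their score passes
# pg_trgm.similarity_threshold, which defaults to 0.3.
SIMILARITY_THRESHOLD = 0.3

if __name__ == "__main__":
    for query in ["Storgatan", "Storgtan", "Stogratan", "Kungsgatan"]:
        score = similarity(query, "Storgatan")
        print(f"{query!r}: {score:.2f} match={score >= SIMILARITY_THRESHOLD}")
```

With these definitions, the typo "Storgtan" scores 7/12 (about 0.58) against "Storgatan" and clears the default threshold, while "Kungsgatan" shares only the trigrams of the "gatan" ending (4 of 17, about 0.24) and doesn't. That tolerance for small typos, combined with rejecting merely similar-looking names, is what makes trigrams a good fit for autocomplete suggestions.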
The proposed solution uses Postgres's `pg_trgm` extension, which is included in a standard installation. The DBA only needs to run a quick `CREATE EXTENSION pg_trgm` in the given database. It is also crucial for the DBA to create GiST indexes on the columns used in the search operation. My tests show that, with proper indexing, a search on three columns within a table with approximately 86,000 rows can take as little as 12–15 ms.
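For reference, the DBA's part boils down to a handful of SQL statements. The Python sketch below only generates them so the exact syntax is visible: `CREATE EXTENSION IF NOT EXISTS pg_trgm` plus one GiST index per searched column using pg_trgm's `gist_trgm_ops` operator class. The schema, table, column and index names are hypothetical; substitute whatever the searched layer actually uses.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def trgm_setup_sql(schema: str, table: str, columns: list[str]) -> list[str]:
    """Return the statements a DBA would run to enable trigram search on `columns`."""
    statements = ["CREATE EXTENSION IF NOT EXISTS pg_trgm;"]
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    for column in columns:
        # Hypothetical naming scheme; Postgres truncates names longer than 63 bytes.
        index_name = quote_ident(f"{table}_{column}_trgm_gist")
        statements.append(
            f"CREATE INDEX IF NOT EXISTS {index_name} "
            f"ON {target} USING gist ({quote_ident(column)} gist_trgm_ops);"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical layer with three searchable columns, like the test described above.
    for statement in trgm_setup_sql("public", "addresses", ["street", "district", "postcode"]):
        print(statement)
```

Quoting the identifiers keeps mixed-case or unusual column names intact, at the cost of making them case-sensitive, so they must match the names as stored in the catalog.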
What needs to be changed
Backend
- The `pg` package is added as a dependency.
- `.env` gains some new keys that specify the Postgres connection details (the usual stuff, such as `host`, `port`, `database`, `user`, etc.).
- The `pg` client part is placed in a service of its own.
Client
- Map config is extended with a section that configures the autocomplete functionality. What we'll need to define is what the Client sends to the Backend as a JSON body. Here's a description of how I imagine this object:
- Search is rewritten to use results from the new endpoint, either exclusively or in conjunction with the current results from the WFS and `DocumentHandler` sources.

For now, I'm focusing primarily on the endpoint part, creating a flexible, easy-to-configure solution in the backend. So, given the setup described above, the API can have a new endpoint that responds with meaningful results even when the user mistypes the query:
Screen recording (attachment): Skarminspelning.2024-07-09.kl.09.13.02.mov
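A rough sketch of the SQL such an endpoint could run, written as a Python query builder. Everything here is an assumption for illustration: the function name, its parameters and the column allowlist are hypothetical and are not the configuration object described above. Matching uses the pg_trgm `%` operator (which can use the GiST indexes), ranking uses the highest `similarity()` score across the searched columns, and the `$1`/`$2` placeholders follow the positional style the `pg` package uses for parameterized queries, so the search string itself is never interpolated into the SQL.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_fuzzy_query(
    schema: str,
    table: str,
    columns: list[str],
    allowed_columns: set[str],
    search: str,
    limit: int = 10,
) -> tuple[str, list]:
    """Build a parameterized trigram search over `columns` (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    # Identifiers can't be bind parameters, so only allowlisted columns are accepted.
    unknown = [c for c in columns if c not in allowed_columns]
    if unknown:
        raise ValueError(f"columns not allowed: {unknown}")

    cols = [quote_ident(c) for c in columns]
    # Rank each row by its best-matching column.
    score = "GREATEST(" + ", ".join(f"similarity({c}, $1)" for c in cols) + ")"
    # A row matches if any column passes pg_trgm.similarity_threshold.
    where = " OR ".join(f"{c} % $1" for c in cols)
    sql = (
        f"SELECT {', '.join(cols)}, {score} AS score "
        f"FROM {quote_ident(schema)}.{quote_ident(table)} "
        f"WHERE {where} "
        "ORDER BY score DESC "
        "LIMIT $2"
    )
    return sql, [search, limit]


if __name__ == "__main__":
    sql, params = build_fuzzy_query(
        "public", "addresses", ["street", "city"], {"street", "city", "postcode"}, "Storgtan", 5
    )
    print(sql)
    print(params)
```

The allowlist would come from server-side configuration; accepting table or column names unchecked from the client's JSON body would open the door to SQL injection, since those names end up in the query text rather than in the parameter list.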