What is the difference between Index and Dictionary? #227
-
What is the difference between Index and Dictionary? It is not clear from the docs!
Replies: 1 comment 4 replies
-
It started with dictionaries:
A dictionary is a static, immutable container: it cannot be updated once it is written. You get fast access and compact size, at the price of not being updatable. To overcome this limitation, an index is nothing more than a stack of dictionaries. Think of one big dictionary plus a small dictionary that contains the updates. On lookup you check the small dictionary first, and if the answer is not there, you look in the big dictionary.
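To make the lookup order concrete, here is a minimal sketch in plain Python. It is not keyvi's API: `make_dictionary` and `lookup` are hypothetical names, and a read-only Python mapping merely stands in for an immutable dictionary file.

```python
from types import MappingProxyType


def make_dictionary(entries):
    """Return a read-only mapping, standing in for an immutable dictionary.

    Hypothetical helper for illustration; not part of keyvi.
    """
    return MappingProxyType(dict(entries))


def lookup(stack, key):
    """Check the newest dictionary first, then fall back to older ones.

    `stack` is ordered oldest first, newest last. Returns None on a miss.
    """
    for dictionary in reversed(stack):
        if key in dictionary:
            return dictionary[key]
    return None


big = make_dictionary({"apple": 1, "banana": 2})
small = make_dictionary({"banana": 20, "cherry": 3})  # holds the updates
stack = [big, small]

updated = lookup(stack, "banana")    # found in the small dictionary
untouched = lookup(stack, "apple")   # falls through to the big dictionary
missing = lookup(stack, "durian")    # in neither dictionary
```

The small dictionary shadows the big one for any key they share, which is why the newest layer has to be checked first.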
In an index you can create, update, and delete entries at any time. In the background the index creates dictionaries and maintains the stack. The index does one more important thing: it merges dictionaries in the background, which reduces both the total size and the number of inner lookups. If you stop writing to an index, you eventually end up with one dictionary per index.
If you want to read more about this, look for material that describes how Lucene works. Lucene has segments, which are the counterpart of keyvi dictionaries from a storage perspective (otherwise the comparison is not quite accurate, because segments store more than key-value pairs).
For static datasets there are still use cases for using the dictionary directly. In fact, most keyvi users still use dictionaries rather than the index. That may be for historical reasons: as said, it started with dictionaries. Another reason is features: some are available only for dictionaries and not for indexes, e.g. some approximate matching use cases. However, if you look at the open PRs, you will see that we are working on bringing this functionality to indexes. Even if indexes can one day do everything a dictionary can, dictionaries have their place and I would not want to hide them.
Last but not least, the quality of the docs: you are absolutely right, docs are a weak point. It's an open source project and time is limited.
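The background merge mentioned in this reply can be sketched the same way, again in plain Python rather than keyvi's API. The `merge` function and the `_DELETED` tombstone marker are assumptions made for illustration; the reply does not say how keyvi represents deletes internally.

```python
_DELETED = object()  # hypothetical tombstone: marks a key as deleted


def merge(stack):
    """Collapse a stack of dictionaries (oldest first) into a single one."""
    merged = {}
    for dictionary in stack:
        # Later (newer) layers overwrite values from earlier (older) ones.
        merged.update(dictionary)
    # After a full merge the tombstones are no longer needed, so drop them.
    return {key: value for key, value in merged.items() if value is not _DELETED}


stack = [
    {"apple": 1, "banana": 2, "cherry": 3},  # big, oldest dictionary
    {"banana": 20, "date": 4},               # an update and a new entry
    {"cherry": _DELETED},                    # a delete
]

merged = merge(stack)  # one dictionary replaces the whole stack
```

After the merge, a lookup touches one dictionary instead of three, and superseded or deleted entries no longer take up space.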