Document parameters - what and how - for keyvi command line and python API #213
Comments
Hey @netankit, you are right: none of the config options are documented (and over time they have become quite a few). This is how you do it on the command line:

    keyvicompiler -i float.txt -o float.kv_s -d json -V floating_point_precision=single

Since you are concerned about size, note that you can add compression as well:

    keyvicompiler -i float.txt -o float.kv_s -d json -V floating_point_precision=single -V compression=zlib

On the Python side you pass the parameters as a dictionary. See https://github.com/cliqz-oss/keyvi/blob/master/pykeyvi/tests/json/json_dictionary_test.py#L65:

    cs = pykeyvi.JsonDictionaryCompiler(50000000, {'floating_point_precision': 'single'})

The first parameter is the memory limit, which has to be given in order to pass the parameter dictionary as the second argument. As on the command line, compression can be added with e.g. 'compression': 'zlib'.
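For anyone scripting the compile step, the command lines above can be assembled from a parameter dictionary. This is only a sketch: the helper name build_keyvicompiler_args is made up for illustration, it uses nothing beyond the -i, -o, -d and -V flags quoted in this comment, and it only builds the argument list without running keyvicompiler.

```python
import shlex


def build_keyvicompiler_args(input_path, output_path, params, dictionary_type="json"):
    """Hypothetical helper: build a keyvicompiler argument list.

    Each entry of `params` becomes one `-V key=value` pair, mirroring
    the command lines shown in the comment above.
    """
    args = ["keyvicompiler", "-i", input_path, "-o", output_path, "-d", dictionary_type]
    for key, value in params.items():
        args += ["-V", f"{key}={value}"]
    return args


# Same options as in the thread: single precision plus zlib compression.
params = {"floating_point_precision": "single", "compression": "zlib"}
args = build_keyvicompiler_args("float.txt", "float.kv_s", params)

# Render as a shell-safe string for logging or copy/paste.
command_line = shlex.join(args)
print(command_line)
```

The list form can be handed to subprocess.run(args, check=True) on a machine where keyvicompiler is installed, which avoids shell quoting issues.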
Note: The parameter parsing will change for 0.2 to make it more consistent. The memory limit, which is currently a separate parameter, will move into the parameter dictionary, so that all configuration is given by a Python dictionary or a std::map<string, string> on the C++ side. Changing title and label.
@hendrikmuhs Thanks for the detailed reply. I will use this for the time being. So, from v0.2 on, are keyvicompiler and keyviinspector going to be removed completely in favor of keyvi compile/dump?
Ah, got it. It seems the keyvi cli tool does not support parameters yet. Good point, we should add it.

What I meant with 0.2 is moving memory_limit into the parameters, so the Python call would look like:

    cs = pykeyvi.JsonDictionaryCompiler({'floating_point_precision': 'single', 'memory_limit_mb': '50'})

There are no plans to remove keyvicompiler and/or keyviinspector. The keyvi cli (based on Python) is just an alternative to the native tools. Use whatever you like. The idea behind keyvi cli is faster implementation: it is much easier to implement something in Python + pykeyvi than to write it in the C++ app. That means we will probably implement new features in keyvi cli only. But we will see.
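To make the planned change concrete, here is a rough sketch of how the current two-argument form could be mapped onto the single dictionary described above. The helper to_parameter_dict is hypothetical and not part of pykeyvi. It also assumes the current memory limit is given in bytes and that 1 MB means 10^6 bytes, which matches 50000000 becoming '50' in this thread but is not confirmed here.

```python
def to_parameter_dict(memory_limit_bytes, params=None):
    """Hypothetical helper: fold the separate memory limit into the dictionary.

    Assumption: the limit is in bytes and 1 MB = 1_000_000 bytes.
    All values are converted to strings, matching the string-to-string
    map mentioned for the C++ side.
    """
    merged = {str(key): str(value) for key, value in (params or {}).items()}
    merged["memory_limit_mb"] = str(memory_limit_bytes // 1_000_000)
    return merged


# Current style: JsonDictionaryCompiler(50000000, {'floating_point_precision': 'single'})
new_style = to_parameter_dict(50000000, {"floating_point_precision": "single"})
print(new_style)
```

With the 0.2 interface as described, the resulting dictionary would be the only argument passed to the compiler.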
Opened #214.
Currently, we use the keyvi compiler option "floating_point_precision" for word embeddings in the sharding/compiling step. It would be nice to be able to pass this option through the command line / Python API of keyvi for any keyvi file whose values are vectors.
Ex: keyvi_compiler_options = {"minimization": "off", "floating_point_precision": "single"}
This would help reduce the size of massive keyvi files composed of vector values (e.g. document vectors: ~2.1B vectors, ~300 dimensions). I haven't been able to figure out how to use this feature. A standalone example with documentation would be useful.
@hendrikmuhs
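As a back-of-the-envelope illustration of why precision matters at this scale, the sketch below compares the raw float payload for the figures quoted above (~2.1B vectors, ~300 dimensions). It assumes 4 bytes per single-precision and 8 bytes per double-precision value, and it ignores keys, keyvi's value encoding overhead and any compression, so the real file sizes will differ.

```python
# Figures quoted in the issue text (approximate).
NUM_VECTORS = 2_100_000_000
DIMENSIONS = 300

# IEEE 754 storage sizes per floating-point value.
BYTES_PER_VALUE = {"single": 4, "double": 8}


def raw_payload_bytes(precision):
    """Raw size of all vector components, without any encoding overhead."""
    return NUM_VECTORS * DIMENSIONS * BYTES_PER_VALUE[precision]


for precision in ("single", "double"):
    terabytes = raw_payload_bytes(precision) / 1e12
    print(f"{precision}: {terabytes:.2f} TB")
```

Under these assumptions single precision halves the raw payload, roughly 2.52 TB instead of 5.04 TB.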