Home
This wiki is the user manual for Byblo, a software tool for producing distributional thesauri. Its goal is to give an overview of the software's structure and capabilities, and to help with correct usage.
It is outside the scope of this document to give any information on the applications of a generated thesaurus, or to teach theory such as the Distributional Hypothesis (Harris, 1954). The reader is assumed to have a grasp of these background topics, as well as a moderate grounding in Java programming, the Unix command line, and linear algebra.
The software has been designed with performance and flexibility in mind. It can produce all known types of distributional thesaurus, perhaps with some minimal pre-processing. The software is highly optimised, but not to the detriment of flexibility. Greater performance can be achieved on specific tasks by using a tailored heuristic approach, such as that employed by Bayardo et al. (2007).