
Weights

eaxelson edited this page Aug 30, 2017 · 7 revisions


From the end user's perspective, a weight tells how probable a word or its analysis is. The weight can be thought of as a penalty, i.e. words or analyses with a bigger weight are less probable. Accordingly, when there are several analyses for a word, they are printed in ascending order of weight so that the most probable ones come first. It is possible to weight lexemes or grammatical rules, which makes it easier to disambiguate among several possible analyses for a given word. Weights can also be used when generating word forms.

For weights, we use the tropical semiring. When there are several paths or transitions that differ only in their weight, the tropical semiring chooses the one with the lowest weight. All HFST functions and transducer classes support weights by default. If weights are not specified anywhere, the functions simply operate with zero weights. There are three back-end implementation formats available for almost all HFST functions: sfst, openfst-tropical and foma. Of these, openfst-tropical is the weighted one and is used by default.
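The behaviour of the tropical semiring described above can be sketched in plain Python. This is only an illustration of the arithmetic, not HFST code: the helper names (`tropical_plus`, `tropical_times`, `best_weights`) and the analysis strings are hypothetical.

```python
import math

# Tropical semiring: "plus" picks the smaller weight, "times" adds weights.
ZERO = math.inf   # identity for tropical_plus (an impossible path)
ONE = 0.0         # identity for tropical_times (a path with no penalty)


def tropical_plus(a, b):
    """Combine alternative paths: keep the lower (more probable) weight."""
    return min(a, b)


def tropical_times(a, b):
    """Extend a path: weights along a path are added."""
    return a + b


def best_weights(paths):
    """Collapse paths that differ only in weight, keeping the lowest weight.

    `paths` is an iterable of (analysis, weight) pairs. The analysis strings
    used below are hypothetical examples, not HFST output.
    """
    best = {}
    for analysis, weight in paths:
        best[analysis] = tropical_plus(best.get(analysis, ZERO), weight)
    # Ascending order: the most probable analysis comes first.
    return sorted(best.items(), key=lambda item: item[1])


paths = [("cat+N+Pl", 2.0), ("cat+V+3Sg", 3.5), ("cat+N+Pl", 0.5)]
ranked = best_weights(paths)
print(ranked)
```

The duplicate analysis is kept once with its lowest weight, and the result is sorted in ascending order, which matches the order in which analyses are printed.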

Using weights in regular expressions

Weights can be specified in regular expressions when using the functions hfst.regex and hfst.start_xfst. Weights are added with the :: operator, which can assign a weight to an individual transition or to any regular expression in brackets, i.e.

a::weight
a:b::weight
[ any regular expression ]::weight

The weights are most often from the tropical semiring. A tropical weight is represented as a float, i.e. one or more digits that may be preceded by a minus or plus sign and followed by a decimal point and at least one digit. For example, the regular expression

[ a b:c::0.5 d::0.3 ]::0.2

will produce a transducer that maps abd to acd with weight 0.5 + 0.3 + 0.2 = 1.0. In this example, we basically have a transition a:a with no weight, followed by a transition b:c with weight 0.5, followed by a transition d:d with weight 0.3, leading to a final state with weight 0.2. However, operations that are called afterwards, e.g. minimization, may change the exact positions of the weights in the transducer.
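The path just described can be written out as a small plain-Python sketch. It assumes a simplified representation (a list of transitions plus a final weight) and does not use HFST's own data structures. The `pushed` variant is a hypothetical rearrangement, included only to show that moving weights along a path need not change the total.

```python
# Each transition is (input, output, weight); the final state has its own weight.
# This mirrors the description in the text; it is not an HFST structure.
transitions = [("a", "a", 0.0), ("b", "c", 0.5), ("d", "d", 0.3)]
final_weight = 0.2


def path_weight(transitions, final_weight):
    """Tropical path weight: sum of transition weights plus the final weight."""
    return sum(weight for _, _, weight in transitions) + final_weight


total = path_weight(transitions, final_weight)

# Hypothetical rearrangement: all weight moved onto the first transition.
pushed = [("a", "a", 1.0), ("b", "c", 0.0), ("d", "d", 0.0)]
pushed_total = path_weight(pushed, 0.0)

print(round(total, 10), round(pushed_total, 10))
```

Both arrangements give the same total for the mapping from abd to acd, which is why the position of a weight inside the transducer usually does not matter to the end user.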

A more complex expression

[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7

will yield a transducer that maps foobaz to barbaz with weight -1.15 + 0.15 + 0.5 + 0.7 = 0.2.
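The arithmetic in both examples can be checked with a short plain-Python sketch that collects every `::weight` annotation from an expression and sums them. This is not how HFST parses regular expressions. The helper names are hypothetical, the pattern assumes the fractional part is optional, and plain summing is valid only for expressions like the two above, which describe a single path built by concatenation.

```python
import re

# A signed decimal number directly after "::", e.g. "::0.5", "::-1.15", "::+0.15".
# Assumption: the fractional part is treated as optional here.
WEIGHT = re.compile(r"::([+-]?\d+(?:\.\d+)?)")


def annotated_weights(expression):
    """Return all weights written with the :: operator, in order of appearance."""
    return [float(match) for match in WEIGHT.findall(expression)]


def single_path_weight(expression):
    """Sum the annotated weights; meaningful only for single-path expressions."""
    # Rounding hides floating-point noise such as 0.20000000000000007.
    return round(sum(annotated_weights(expression)), 10)


simple = "[ a b:c::0.5 d::0.3 ]::0.2"
nested = "[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7"

print(annotated_weights(nested))
print(single_path_weight(simple), single_path_weight(nested))
```

For expressions with union, repetition or other operators that create several paths, the weights would have to be combined per path, so this shortcut no longer applies.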

Note that weights can be used only with the openfst-tropical implementation (and, in principle, with openfst-log, which is not very well supported). Inserting weights with the unweighted implementations, i.e. sfst or foma, has no effect.

Using weights in other functions

Tool | Usage
--- | ---
hfst.compile_lexc | Weights can be defined for individual entries, and they can also be used in regular expressions.
hfst.compile_twolc | It may become possible to add weights to rules to determine the relative importance of a rule in a conflict situation. At this time, it is only possible to compile weighted rules with zero weights.
hfst.fst | The weight of a string can be given after the string, separated by a tab character.
hfst.read_att, hfst.AttReader, hfst.read_prolog, hfst.PrologReader | In AT&T format, weights for transitions and final states can be given at the end of the transition or final state line, separated by a tab character. In prolog format, weights can be given as the last argument of the compounds arc and final.
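As an illustration of the AT&T layout mentioned in the table, the following plain-Python sketch reads tab-separated transition and final-state lines with an optional trailing weight. It is not HFST's reader: `parse_att` is a hypothetical helper, the state numbers are made up, and a missing weight is assumed to mean zero.

```python
# Transition line: source, target, input, output[, weight]
# Final line:      state[, weight]
ATT_TEXT = (
    "0\t1\ta\ta\n"
    "1\t2\tb\tc\t0.5\n"
    "2\t3\td\td\t0.3\n"
    "3\t0.2\n"
)


def parse_att(text):
    """Parse AT&T-style lines into (transitions, finals).

    transitions: list of (source, target, input, output, weight)
    finals: dict mapping a final state to its weight
    """
    transitions, finals = [], {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if len(fields) in (4, 5):
            weight = float(fields[4]) if len(fields) == 5 else 0.0
            transitions.append(
                (int(fields[0]), int(fields[1]), fields[2], fields[3], weight)
            )
        elif len(fields) in (1, 2):
            finals[int(fields[0])] = float(fields[1]) if len(fields) == 2 else 0.0
        else:
            raise ValueError("unexpected number of fields: %r" % line)
    return transitions, finals


transitions, finals = parse_att(ATT_TEXT)
print(transitions)
print(finals)
```

The sample text encodes the same path as the regular expression [ a b:c::0.5 d::0.3 ]::0.2, with the weight 0.2 attached to the final state.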

Shortcomings and caveats

There are some issues with weights that must be considered when specifying them or applying certain operations on weighted transducers. See our kitwiki pages for more information.
