
Weights

From the end-user's perspective, a weight tells how probable a word or one of its analyses is. The weight can be thought of as a penalty: words or analyses with a bigger weight are less probable. Accordingly, when there are several analyses for a word, they are printed in ascending order of weight, so that the most probable ones come first. It is possible to weight lexemes or grammatical rules, making it easier to disambiguate among several possible analyses for a given word. Of course, weights can also be used in generating word forms.

For weights, we use the tropical semiring. When there are several paths or transitions that differ only by their weight, the tropical semiring chooses the one with the lowest weight. All HFST functions and transducer classes support weights by default; if weights are not specified anywhere, the functions simply operate with zero weights. Three back-end implementation formats are available for almost all HFST functions: sfst, openfst-tropical and foma. Of these, openfst-tropical is the weighted one and is used by default.

Using weights in regular expressions

Weights can be specified in regular expressions when using the functions hfst.regex and hfst.start_xfst. The mechanism for adding weights is the :: operator, which can be used to assign a weight to an individual transition or to any regular expression in brackets, i.e.

a::weight
a:b::weight
[ any regular expression ]::weight

The weights are most often from the tropical semiring. A tropical weight is written as a float, i.e. one or more digits that may be preceded by a minus or plus sign and followed by a decimal point and at least one digit. For example, the regular expression

[ a b:c::0.5 d::0.3 ]::0.2

will produce a transducer that maps abd to acd with weight 0.5 + 0.3 + 0.2 = 1.0. In this example, we basically have a transition a:a with no weight, followed by a transition b:c with weight 0.5, followed by a transition d:d with weight 0.3, leading to a final state with weight 0.2. However, operations that are called afterwards, e.g. minimization, may change the exact positions of the weights in the transducer.

A more complex expression

[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7

will yield a transducer that maps foobaz to barbaz with weight -1.15 + 0.15 + 0.5 + 0.7 = 0.2.
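
As a minimal sketch of how these expressions can be tried out with the hfst Python bindings (the lookup_optimize call converts the transducers to optimized lookup format so that lookup is efficient):

import hfst

# Compile the two weighted regular expressions from the examples above.
tr1 = hfst.regex('[ a b:c::0.5 d::0.3 ]::0.2')
tr2 = hfst.regex('[ [ foo:bar::-1.15 ]::+0.15 baz::0.5 ]::0.7')

# Convert to (weighted) optimized lookup format.
tr1.lookup_optimize()
tr2.lookup_optimize()

# Each lookup returns the output strings with their total path weights:
# per the text above, 'abd' should map to 'acd' with weight 1.0 and
# 'foobaz' to 'barbaz' with weight 0.2 (up to floating-point rounding).
print(tr1.lookup('abd'))
print(tr2.lookup('foobaz'))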

Note that using weights is possible only with the openfst-tropical implementation (and in principle openfst-log, which is not very well supported). Inserting weights with the unweighted implementations, i.e. sfst or foma, has no effect.
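
The following sketch illustrates this point, assuming hfst.set_default_fst_type and hfst.ImplementationType from the Python bindings and a build in which the foma back-end is available:

import hfst

# Under the default openfst-tropical implementation the weight is kept.
hfst.set_default_fst_type(hfst.ImplementationType.TROPICAL_OPENFST_TYPE)
weighted = hfst.regex('a::0.5')
print(weighted)    # the AT&T dump shows the weight 0.5

# Under an unweighted implementation the same weight is silently dropped.
hfst.set_default_fst_type(hfst.ImplementationType.FOMA_TYPE)
unweighted = hfst.regex('a::0.5')
print(unweighted)  # no weight (or a zero weight) in the AT&T dump

# Restore the default for the rest of the session.
hfst.set_default_fst_type(hfst.ImplementationType.TROPICAL_OPENFST_TYPE)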

Using weights in other functions

hfst.compile_lexc_file: Weights can be defined for individual entries, and they can also be used in regular expressions.
hfst.compile_twolc_file: It may become possible to add weights to rules, which would determine the relative importance of a rule in a conflict situation. At the moment weighted rules can only be compiled with zero weights.
hfst.fst: The weight of a string can be given after the string, separated by a tab.
read_att_input, read_att_string, read_att_transducer, class AttReader: In AT&T format, weights for transitions and final states can be given as the last tab-separated field of the transition or final-state line (see the sketch below this list).
class PrologReader: In Prolog format, weights can be given as the last argument of the compounds arc and final.
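
As a sketch of the AT&T case, assuming hfst.read_att_string as listed above (the symbols and weights are only illustrative):

import hfst

# A small AT&T description: one transition cat:dog with weight 0.5
# and a final state with final weight 0.25; fields are tab-separated.
att = (
    '0\t1\tcat\tdog\t0.5\n'
    '1\t0.25\n'
)
tr = hfst.read_att_string(att)
print(tr)   # prints the transducer back in AT&T format, weights included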

Shortcomings and caveats

There are some issues with weights that must be considered when specifying them or applying certain operations on weighted transducers.

Negation

Negation is based on subtraction, so it does not preserve weights. Usually this is not a problem, but it must be taken into consideration when using double negation. With unweighted transducers double negation produces an equivalent transducer, but with weighted ones the weights are lost. For example, the expression

~[~[a:b::2.5]]

yields a transducer equivalent to a:b::0, not a:b::2.5.
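
A quick way to see this from Python (same assumptions about the hfst bindings as above):

import hfst

# Double negation loses the weight: the result is equivalent to a:b::0.
tr = hfst.regex('~[~[a:b::2.5]]')
tr.lookup_optimize()
print(tr.lookup('a'))   # 'b' comes out with weight 0.0, not 2.5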

Rules

Weights are not yet fully defined for all rule types. See the command line tool hfst-xfst for examples of using weights in rules.

Containment

The containment operators $.[t] (contains t exactly once) and $?[t] (contains t once or not at all) support weights. The operator $[t] (contains one or more t's) ignores weights; if you want to weight each occurrence of t with a weight of w, you can use the operator $::w[t]. Note that the containment operators use shortest matching and allow overlapping when matching the expressions. E.g.

$.[a a]

will not accept string "aaa" since it contains two strings "aa" that overlap. Similarly, the weighted expression

$::1[a a]

will accept string "aaa" but with weight 2 because there are two strings "aa", each with weight 1.

The expression

$.[a::3|[a a]::1]

will not accept string "aa" because it contains two strings "a" according to shortest matching (which does not consider weights).
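
The same behaviour can be checked from Python (again a sketch under the same assumptions about the bindings):

import hfst

once = hfst.regex('$.[a a]')        # contains "aa" exactly once
weighted = hfst.regex('$::1[a a]')  # each occurrence of "aa" adds weight 1

once.lookup_optimize()
weighted.lookup_optimize()

print(once.lookup('aaa'))      # rejected: two overlapping occurrences of "aa"
print(weighted.lookup('aaa'))  # accepted with total weight 2.0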

Determinization and minimization

Not all weighted transducers are determinizable (and thus not minimizable), so weights must be encoded when performing determinization or minimization. Encoding means that weights are regarded as part of the transition symbol pairs and cannot be freely moved. In some cases this results in minimized transducers that are not fully minimal.

For example, the determinization algorithm does not halt if given input such as

0 1 a a 1
0 2 a a 2
1 1 b b 3
1 3 c c 5
2 2 b b 4
2 3 d d 6
3 5 e e 0
3 4 e e 0
4 0 
5 0

and will eventually run out of memory (example modified from Mohri, Efficient Algorithms for Testing the Twins Property, page 8). If weights are encoded, the algorithm produces the result

0 1 a a 1
0 2 a a 2
1 1 b b 3
1 3 c c 5
2 2 b b 4
2 3 d d 6
3 4 e e 0
4 0

where the weights have effectively been ignored during determinization: determinization has been applied only to the parts of the transducer that carry no weights.

You can set weight encoding on with (TODO).
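
For experimentation, the problematic machine above can be built directly from its AT&T description (a sketch, assuming hfst.read_att_string and the state/arc counting methods of HfstTransducer; as described above, determinizing it without weight encoding may not terminate):

import hfst

# The example from above, in tab-separated AT&T format.
att = '\n'.join([
    '0\t1\ta\ta\t1',
    '0\t2\ta\ta\t2',
    '1\t1\tb\tb\t3',
    '1\t3\tc\tc\t5',
    '2\t2\tb\tb\t4',
    '2\t3\td\td\t6',
    '3\t5\te\te\t0',
    '3\t4\te\te\t0',
    '4\t0',
    '5\t0',
])
tr = hfst.read_att_string(att)
print(tr.number_of_states(), tr.number_of_arcs())
# Do not call tr.determinize() on this machine without weight encoding:
# plain weighted determinization does not halt on this input.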

Projection

When taking a projection, the weights are copied as such. If an expression is split into its input and output projections and these are combined again with cross product, the weights will be duplicated. For example, the expression

[a:b::5].u .x. [a:b::5].l

will yield a transducer equivalent to [a:b::10], not [a:b::5].
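
A small sketch of the duplication effect (same assumptions about the Python bindings):

import hfst

tr = hfst.regex('[a:b::5].u .x. [a:b::5].l')
tr.lookup_optimize()
print(tr.lookup('a'))   # 'b' comes out with weight 10.0, not 5.0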

Another issue with projection is that it can produce weighted epsilon cycles. For example the expression

[[a:0::-1]*].l

will produce an epsilon cycle with a weight of minus one. When the epsilons are removed as part of determinization or minimization, the result is an infinite negative weight. (Positive weights are not a problem in the tropical semiring because they are regarded as penalties and are discarded in the first place.) See the section below for more information.

Epsilon cycles with negative weights

Consider the transducer

0 0 ε ε -1
0 1 a a  0
1 0

Performing epsilon removal or determinization on it will theoretically produce the result

0 1 a a -INF
1 0

where -INF denotes negative infinity. Similarly, minimization gives:

0 1 a a 0
1 -INF

In practice, -INF will be the smallest float value supported by the environment where HFST is running. Given the finite range of a float this is basically the correct result, but it is very slow to compute. The above transducer minimizes in a couple of seconds, but adding more states slows the algorithm down considerably. Epsilon cycles with negative weights are relatively rare, but they can occur e.g. when creating rules and taking the projection on the false side.
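
The cycle can be produced directly from the projection example above (a sketch with the same assumptions; as noted, minimization may take a couple of seconds even on this small machine):

import hfst

tr = hfst.regex('[[a:0::-1]*].l')  # epsilon cycle with weight -1
tr.minimize()                      # slow: the weight is pushed towards -INF
print(tr)                          # the result carries a very large negative weight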

N best paths

HfstTransducer.n_best(self, n) does not work if the transducer has loops with negative weights.
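
For completeness, a sketch of typical n_best usage on a transducer without negative-weight loops (same assumptions about the bindings):

import hfst

tr = hfst.regex('a:x::1 | a:y::2 | a:z::3')
tr.n_best(2)                # keep only the two lowest-weight paths, in place
print(tr.extract_paths())   # expected: a:x with weight 1 and a:y with weight 2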

Equivalence

HfstTransducer.compare(self, another) works poorly with weighted transducers due to floating-point precision issues.