HfstIterableTransducer
-
class HfstIterableTransducer
- longest_path_size (self)
- is_infinitely_ambiguous (self)
- is_lookup_infinitely_ambiguous (self, str)
- lookup (self, input, **kwargs)
- add_state (self)
- add_state (self, state)
- states (self)
- states_and_transitions (self)
- add_symbol_to_alphabet (self, symbol)
- add_symbols_to_alphabet (self, symbols)
- add_transition (self, state, transition, add_symbols_to_alphabet=True)
- add_transition (self, source, target, input, output, weight=0)
- remove_transition (self, s, transition, remove_symbols_from_alphabet=False)
- disjunct (self, stringpairpath, weight)
- get_alphabet (self)
- get_final_weight (self, state)
- get_max_state (self)
- harmonize (self, another)
- __init__ (self)
- __init__ (self, transducer)
- read_prolog(f, linecount)
- write_prolog (self, f, name, write_weights=True)
- write_xfst (self, f, write_weights=True)
- read_att(f, epsilon_symbol, linecount)
- write_att (self, f, write_weights=True)
- insert_freely (self, symbol_pair, weight)
- insert_freely (self, transducer)
- is_final_state (self, state)
- transitions (self, state)
- prune_alphabet (self)
- symbols_used (self)
- get_transition_pairs (self)
- remove_symbol_from_alphabet (self, symbol)
- remove_symbols_from_alphabet (self, symbols)
- set_final_weight (self, state, weight)
- remove_final_weight (self, state)
- sort_arcs (self)
- substitute (self, s, S=None, **kwargs)
- __enumerate__ (self)
- __str__ (self)
A simple transducer class with tropical weights.
An example of creating an HfstIterableTransducer [foo:bar baz:baz]
with weight 0.4 from scratch:
import hfst
# Create an empty transducer
# The transducer has initially one start state (number zero)
# that is not final
fsm = hfst.HfstIterableTransducer()
# Add two states to the transducer
fsm.add_state(1)
fsm.add_state(2)
# Create a transition [foo:bar] leading to state 1 with weight 0.1
tr = hfst.HfstTransition(1, 'foo', 'bar', 0.1)
# and add it to state zero
fsm.add_transition(0, tr)
# Add a transition [baz:baz] with weight 0 from state 1 to state 2
fsm.add_transition(1, hfst.HfstTransition(2, 'baz', 'baz', 0.0))
# Set state 2 as final with weight 0.3
fsm.set_final_weight(2, 0.3)
An example of iterating through the states and transitions of the above transducer when printing them in AT&T format to standard output:
# Go through all states
for state, arcs in enumerate(fsm):
    for arc in arcs:
        print('%i ' % (state), end='')
        print(arc)
    if fsm.is_final_state(state):
        print('%i %f' % (state, fsm.get_final_weight(state)))
TODO: DOCUMENT:
- static HfstIterableTransducer intersect(HfstIterableTransducer & graph1, HfstIterableTransducer & graph2);
- HfstIterableTransducer &complete();
- std::vector<std::set > topsort(SortDistance dist) const;
- std::vector path_sizes();
- bool is_lookup_infinitely_ambiguous(const HfstOneLevelPath & s);
- bool is_lookup_infinitely_ambiguous(const StringVector & s);
- void insert_transducer(HfstState state1, HfstState state2, const HfstIterableTransducer & graph);
The length of the longest path in the transducer.
The length of a path is the number of arcs on that path.
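To make the arc-counting definition concrete, here is a minimal pure-Python sketch, independent of hfst, that computes the longest path length over a small acyclic arc table. The dictionary layout and the helper name `longest_path_len` are assumptions made for illustration; this is not how `longest_path_size` is implemented.

```python
from functools import lru_cache

# Hypothetical arc table: source state -> list of target states.
# State 2 can be reached directly from 0, or through state 1.
ARCS = {0: [1, 2], 1: [2], 2: []}


@lru_cache(maxsize=None)
def longest_path_len(state):
    """Number of arcs on the longest path starting at `state` (acyclic graphs only)."""
    targets = ARCS.get(state, [])
    if not targets:
        return 0
    return 1 + max(longest_path_len(t) for t in targets)


print(longest_path_len(0))  # 2: the path 0 -> 1 -> 2 has two arcs
```

The recursion only terminates on acyclic graphs; a cyclic graph has no finite longest path.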
Whether the transducer is infinitely ambiguous.
A transducer is infinitely ambiguous if there exists an input that will yield infinitely many results, i.e. there are input epsilon loops that are traversed with that input.
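The notion of an input epsilon loop can be sketched in plain Python, without hfst. The example below looks for a cycle made only of arcs whose input symbol is epsilon. It is a simplification: it ignores whether the cycle is reachable or leads to a final state, the tuple layout is hypothetical, and the epsilon string is assumed to match `hfst.EPSILON`.

```python
EPSILON = "@_EPSILON_SYMBOL_@"  # assumed to match hfst.EPSILON


def has_input_epsilon_loop(arcs):
    """arcs: iterable of (source, target, input_symbol, output_symbol) tuples."""
    # Keep only the arcs that consume no input.
    eps = {}
    for source, target, insym, _outsym in arcs:
        if insym == EPSILON:
            eps.setdefault(source, []).append(target)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {}

    def visit(state):
        colour[state] = GREY
        for nxt in eps.get(state, []):
            seen = colour.get(nxt, WHITE)
            if seen == GREY:
                return True  # back edge: an epsilon-input cycle
            if seen == WHITE and visit(nxt):
                return True
        colour[state] = BLACK
        return False

    return any(colour.get(s, WHITE) == WHITE and visit(s) for s in list(eps))


# A loop on a real input symbol is harmless; a loop on epsilon input is not.
print(has_input_epsilon_loop([(0, 1, "a", "b"), (1, 1, "c", EPSILON)]))
print(has_input_epsilon_loop([(0, 1, "a", "b"), (1, 1, EPSILON, "x")]))
```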
Whether the transducer is infinitely ambiguous with input str.
Parameters:
- str: The input.
A transducer is infinitely ambiguous with a given input if the input yields infinitely many results, i.e. there are input epsilon loops that are traversed with the input.
Look up the tokenized input in the transducer.
Parameters:
- input: A list/tuple of strings to look up.
- kwargs: max_epsilon_loops=-1, max_weight=None, obey_flags=False
- max_epsilon_loops: How many times epsilon input loops are followed. Defaults to -1, i.e. infinitely.
- max_weight: The maximum weight allowed for a result. Defaults to None, i.e. infinity.
- obey_flags: Whether flag diacritic constraints are obeyed. Defaults to False.
Add a new state to this transducer and return its number.
Return: The next (smallest) free state number.
Add a state state to this graph.
Parameters:
- state: The number of the state to be added.
Return: The state state.
If the state already exists, it is not added again. All states with a state number smaller than state are also added to the transducer if they did not exist before.
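The fill-in behaviour for smaller state numbers can be modelled with a plain list, one entry per state. This is only an illustration of the documented behaviour, not the hfst implementation; the list-of-arc-lists layout is an assumption of the sketch.

```python
def add_state(states, state):
    """Model of add_state(state) on a list holding one arc list per state."""
    # Appending until index `state` exists also creates every smaller state.
    while len(states) <= state:
        states.append([])
    return state


states = [[]]          # an empty transducer: only state 0
add_state(states, 3)   # creates states 1, 2 and 3
add_state(states, 1)   # state 1 already exists, nothing changes
print(len(states))     # 4
```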
The states of the transducer.
Return: A tuple of state numbers.
for state in fsm.states():
    for arc in fsm.transitions(state):
        print('%i ' % (state), end='')
        print(arc)
    if fsm.is_final_state(state):
        print('%i %f' % (state, fsm.get_final_weight(state)))
The states and transitions of the transducer.
Return: A tuple of tuples of HfstTransitions.
See also: hfst.HfstIterableTransducer.__enumerate__
Explicitly add symbol to the alphabet of the graph.
Note: Usually the user does not have to take care of the alphabet of a graph. This function can be useful in some special cases.
Parameters:
- symbol: The string to be added.
Explicitly add symbols to the alphabet of the graph.
Note: Usually the user does not have to take care of the alphabet of a graph. This function can be useful in some special cases.
Parameters:
- symbols: A tuple of strings to be added.
Add a transition transition to state state. add_symbols_to_alphabet defines whether the transition symbols are added to the alphabet.
Parameters:
- state: The number of the state where the transition is added. If it does not exist, it is created.
- transition: A hfst.HfstTransition that is added to state.
- add_symbols_to_alphabet: Whether the transition symbols are added to the alphabet of the transducer. (In special cases this is not wanted.)
Note: Adding transitions during iteration (e.g. with #transitions) will invalidate the iteration. Iteration of states (e.g. with #states) is possible.
See also: #remove_transition
Add a transition from state source to state target with input symbol input, output symbol output and weight weight.
Parameters:
- source: The number of the state where the transition is added. If it does not exist, it is created.
- target: The number of the state where the transition leads. If it does not exist, it is created. (?)
- input: The input symbol of the transition.
- output: The output symbol of the transition.
- weight: The weight of the transition.
Note: Adding transitions during iteration (e.g. with #transitions) will invalidate the iteration. Iteration of states (e.g. with #states) is possible.
See also: #remove_transition
Remove all transitions equivalent to transition from state s.
Parameters:
- s: The state which transition belongs to.
- transition: A transition which is compared with all transitions of state s, ignoring the weights. If a transition is equivalent to transition, it is removed from the transducer.
- remove_symbols_from_alphabet: Whether to remove from the transducer alphabet those symbols that no longer occur in its transitions (as a result of transition removal).
Note: Removing transitions during iteration (e.g. with #transitions) will invalidate the iteration. Iteration of states (e.g. with #states) is possible.
See also: #add_transition
An example of allowing transition input and output symbols to be swapped with weight 0.5 and stay as they are with weight 0.3:
X = hfst.regex("a:A | b:B c:C")
B = hfst.HfstIterableTransducer(X)
print(B)
for state in B.states():
    arcs_to_be_removed = []
    arcs_to_be_added = []
    for arc in B.transitions(state):
        tostate = arc.get_target_state()
        insym = arc.get_input_symbol()
        outsym = arc.get_output_symbol()
        arcs_to_be_removed.append(arc)
        arcs_to_be_added.append(hfst.HfstTransition(tostate, insym, outsym, 0.3))
        arcs_to_be_added.append(hfst.HfstTransition(tostate, outsym, insym, 0.5))
    for arc in arcs_to_be_removed:
        B.remove_transition(state, arc)
    for arc in arcs_to_be_added:
        B.add_transition(state, arc)
print(B)
Result:
0 1 b B 0
0 2 a A 0
1 2 c C 0
2 0
0 1 b B 0.3
0 1 B b 0.5
0 2 a A 0.3
0 2 A a 0.5
1 2 c C 0.3
1 2 C c 0.5
2 0
Disjunct this transducer with a one-path transducer defined by the consecutive string pairs in stringpairpath that has weight weight.
Precondition: This graph must be a trie where all weights are in final states, i.e. all transitions have a zero weight.
Parameters:
- stringpairpath: The path to be added (a tuple of 2-tuples of strings).
- weight: The weight of the path to be added.
There is no way to test whether a graph is a trie, so the use of this function is probably limited to fast construction of a lexicon. Here is an example:
lexicon = hfst.HfstIterableTransducer()
tok = hfst.HfstTokenizer()
lexicon.disjunct(tok.tokenize('dog'), 0.3)
lexicon.disjunct(tok.tokenize('cat'), 0.5)
lexicon.disjunct(tok.tokenize('elephant'), 1.6)
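The trie precondition is easier to see in a small model. The following pure-Python sketch, which does not use hfst, adds string-pair paths to a trie whose transitions carry no weight and whose final states hold the path weight. The data layout is hypothetical, and keeping the smaller weight when the same path is added twice is an assumption of the sketch, not documented behaviour.

```python
def disjunct(trie, finals, path, weight):
    """Add `path` (a tuple of (input, output) pairs) to the trie.

    trie:   list of dicts, one per state, mapping a symbol pair to a target state
    finals: dict mapping a final state to its weight
    """
    state = 0
    for pair in path:
        if pair not in trie[state]:
            trie.append({})                  # new state at the end of the list
            trie[state][pair] = len(trie) - 1
        state = trie[state][pair]            # follow the (zero-weight) arc
    # Assumption: a repeated path keeps the smaller of the two weights.
    finals[state] = min(weight, finals.get(state, float("inf")))


trie, finals = [{}], {}
for word, weight in (("dog", 0.3), ("cat", 0.5), ("do", 1.0)):
    # Each character maps to itself, as an identity string pair.
    disjunct(trie, finals, tuple((c, c) for c in word), weight)

print(len(trie))  # 7 states: "do" reuses the states created for "dog"
print(finals)
```

Because shared prefixes reuse existing states, each insertion costs time proportional to the length of the path, which is why this operation suits fast lexicon construction.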
The symbols in the alphabet of the transducer.
The symbols do not necessarily occur in any transitions of the transducer. Epsilon, unknown and identity symbols are always included in the alphabet.
Return: A tuple of strings.
Get the final weight of state state in this transducer.
Parameters:
- state: The number of the state. If it does not exist, a StateIsNotFinalException is thrown.
Throws:
Get the biggest state number in use.
Return: The biggest state number in use.
Harmonize this transducer and another.
In harmonization the unknown and identity symbols in transitions of both graphs are expanded according to the symbols that are previously unknown to the graph.
For example the graphs
[a:b ?:?]
[c:d ? ?:c]
are expanded to
[ a:b [?:? | ?:c | ?:d | c:d | d:c | c:? | d:?] ]
[ c:d [? | a | b] [?:c | a:c | b:c] ]
when harmonized.
The symbol '?' means hfst.UNKNOWN in either or both sides of a transition (transitions of type [?:x], [x:?] and [?:?]). The transition [?] means hfst.IDENTITY.
Note: This function is always called for all transducer arguments of functions that take two or more graphs as their arguments, unless otherwise said.
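The expansion rule can be illustrated for a single transition label. The sketch below is plain Python and does not call hfst; the function name `expand` is hypothetical, and the two special symbol strings are assumed to match `hfst.UNKNOWN` and `hfst.IDENTITY`. It expands one label for a list of symbols that the graph has not seen before.

```python
UNKNOWN = "@_UNKNOWN_SYMBOL_@"    # assumed value of hfst.UNKNOWN
IDENTITY = "@_IDENTITY_SYMBOL_@"  # assumed value of hfst.IDENTITY


def expand(pair, new_symbols):
    """Return the set of labels that `pair` stands for after harmonization."""
    insym, outsym = pair
    result = {pair}  # the original label is always kept
    if pair == (IDENTITY, IDENTITY):
        # Identity maps each new symbol to itself.
        result |= {(x, x) for x in new_symbols}
    elif pair == (UNKNOWN, UNKNOWN):
        # Unknown-to-unknown maps new symbols to anything but themselves.
        for x in new_symbols:
            result |= {(UNKNOWN, x), (x, UNKNOWN)}
            result |= {(x, y) for y in new_symbols if x != y}
    elif insym == UNKNOWN:
        result |= {(x, outsym) for x in new_symbols}
    elif outsym == UNKNOWN:
        result |= {(insym, x) for x in new_symbols}
    return result


# [?:?] in a graph that learns the symbols c and d from the other graph
for label in sorted(expand((UNKNOWN, UNKNOWN), ["c", "d"])):
    print(label)
```

For `[?:?]` with the new symbols c and d this yields seven labels, matching the first expanded graph in the example above.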
Create a transducer with one initial state that has state number zero and is not a final state, i.e. create an empty transducer.
tr = hfst.HfstIterableTransducer()
Create a transducer equivalent to transducer.
Parameters:
- transducer: The transducer to be copied (an HfstIterableTransducer or HfstTransducer).
tr = hfst.regex('foo') # creates an HfstTransducer
TR = hfst.HfstIterableTransducer(tr)
TR2 = hfst.HfstIterableTransducer(TR)
Read a transducer from prolog file f. linecount is incremented as lines are read (is it in python?).
Parameters:
- f: A Python file where transducers are read from.
- linecount: Is incremented as lines are read from file f.
Return: A transducer constructed by reading from file f.
This function is a static one.
Write the transducer in prolog format to file f. Name the transducer name.
Parameters:
- f: A Python file where the transducer is written.
- name: The name of the transducer to be written.
- write_weights: Whether weights are written, defaults to True.
Write the transducer in xfst format to file f.
Read a transducer in AT&T format from file f. epsilon_symbol defines the symbol used for epsilon, linecount is incremented as lines are read.
Return: A transducer constructed by reading from file f.
This function is a static one.
Write this transducer in AT&T format to file f. write_weights defines whether weights are written.
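As a rough picture of what the write_weights switch changes, the sketch below formats arcs and final states as tab-separated AT&T-style lines in plain Python. It does not use hfst: the function `att_lines`, the tuple layout, the `%f` weight formatting and the placement of final-state lines after all arcs are assumptions of the sketch, and the real writer may order and format its output differently.

```python
def att_lines(arcs, finals, write_weights=True):
    """arcs: (source, target, input, output, weight); finals: state -> weight."""
    lines = []
    for source, target, insym, outsym, weight in arcs:
        fields = [str(source), str(target), insym, outsym]
        if write_weights:
            fields.append("%f" % weight)
        lines.append("\t".join(fields))
    for state, weight in sorted(finals.items()):
        fields = [str(state)]
        if write_weights:
            fields.append("%f" % weight)
        lines.append("\t".join(fields))
    return lines


# The [foo:bar baz:baz] transducer built at the top of this page
arcs = [(0, 1, "foo", "bar", 0.1), (1, 2, "baz", "baz", 0.0)]
finals = {2: 0.3}
print("\n".join(att_lines(arcs, finals)))
print("\n".join(att_lines(arcs, finals, write_weights=False)))
```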
Insert freely any number of symbol_pair in the transducer with weight weight.
Parameters:
- symbol_pair: A string pair to be inserted.
- weight: The weight of the inserted symbol pair.
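Free insertion of a symbol pair is commonly realised by giving every state a self-loop labelled with that pair, so the pair can occur any number of times at any position. The sketch below models this on a plain list of arc lists; it is an illustration under that assumption, not the hfst implementation, and the tuple layout follows the argument order of hfst.HfstTransition only by convention.

```python
def insert_freely(arcs_by_state, symbol_pair, weight):
    """Add a self-loop labelled `symbol_pair` to every state.

    arcs_by_state: list indexed by state, each entry a list of
                   (target, input, output, weight) tuples
    """
    insym, outsym = symbol_pair
    for state, arcs in enumerate(arcs_by_state):
        arcs.append((state, insym, outsym, weight))


# Two states joined by one a:a arc
graph = [[(1, "a", "a", 0.0)], []]
insert_freely(graph, ("x", "y"), 0.5)
print(graph)
```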
Insert freely any number of transducer in this transducer.
Parameters:
- transducer: An HfstIterableTransducer to be inserted.
Whether state state is final.
Parameters:
- state: The state whose finality is returned.
Get the transitions of state state in this transducer.
If the state does not exist, a hfst.exceptions.StateIndexOutOfBoundsException is thrown.
Return: A tuple of HfstTransitions.
for state in fsm.states():
    for arc in fsm.transitions(state):
        print('%i ' % (state), end='')
        print(arc)
    if fsm.is_final_state(state):
        print('%i %f' % (state, fsm.get_final_weight(state)))
Remove all symbols that do not occur in transitions of the transducer from its alphabet. Epsilon, unknown and identity symbols are always included in the alphabet.
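The effect of pruning can be written as a small set computation. The sketch below is plain Python and independent of hfst; the function name and tuple layout are hypothetical, and the three special symbol strings are assumed to match hfst.EPSILON, hfst.UNKNOWN and hfst.IDENTITY.

```python
# Assumed values of hfst.EPSILON, hfst.UNKNOWN and hfst.IDENTITY
SPECIALS = {"@_EPSILON_SYMBOL_@", "@_UNKNOWN_SYMBOL_@", "@_IDENTITY_SYMBOL_@"}


def pruned_alphabet(alphabet, arcs):
    """Keep the symbols that occur on some arc, plus the special symbols."""
    used = set()
    for _source, _target, insym, outsym in arcs:
        used.update((insym, outsym))
    return (set(alphabet) & used) | SPECIALS


alphabet = {"a", "b", "z"} | SPECIALS
arcs = [(0, 1, "a", "b")]
print(sorted(pruned_alphabet(alphabet, arcs)))  # "z" is gone, the specials stay
```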
Get a list of all symbols used in the transitions of this transducer.
Get a list of all input/output symbol pairs used in the transitions of this transducer.
Remove symbol symbol from the alphabet of the graph.
Note: Use with care, removing symbols that occur in the transitions of the graph can have unexpected results.
Parameters:
- symbol: The string to be removed.
Remove symbols symbols from the alphabet of the graph.
Note: Use with care, removing symbols that occur in the transitions of the graph can have unexpected results.
Parameters:
- symbols: A tuple of strings to be removed.
Set the final weight of state state in this transducer to weight.
If the state does not exist, it is created.
Remove final weight from state state, i.e. make it a non-final state.
Sort the arcs of this transducer according to input and output symbols.
Return: This transducer.
Substitute symbols or transitions in the transducer.
Parameters:
- s: The symbol or transition to be substituted. Can also be a dictionary of substitutions, if S == None.
- S: The symbol, transition, tuple of transitions or transducer (hfst.HfstIterableTransducer) that substitutes s.
- kwargs: Arguments recognized are 'input' and 'output'; their values can be False or True, True being the default. These arguments are valid only if s and S are strings, else they are ignored.
- input: Whether substitution is performed on input side, defaults to True. Valid only if s and S are strings.
- output: Whether substitution is performed on output side, defaults to True. Valid only if s and S are strings.
Possible combinations of arguments and their types are:
(1) substitute(str, str, input=bool, output=bool): substitute symbol with symbol on input, output or both sides of each transition in the transducer.
(2) substitute(strpair, strpair): substitute transition with transition
(3) substitute(strpair, strpairtuple): substitute transition with several transitions
(4) substitute(strpair, transducer): substitute transition with a transducer
(5) substitute(dict): perform several symbol-to-symbol substitutions
(6) substitute(dict): perform several transition-to-transition substitutions
Examples:
(1) tr.substitute('a', 'A', input=True, output=False): substitute lowercase a's with uppercase ones on the input side
(2) tr.substitute(('a','b'),('A','B')): substitute transitions that map lowercase a into lowercase b with transitions that map uppercase a into uppercase b
(3) tr.substitute(('a','b'), (('A','B'),('a','B'),('A','b'))): change either or both sides of a transition [a:b] to uppercase
(4) tr.substitute(('a','b'), hfst.regex('[a:b]+')): change [a:b] transition into one or more consecutive [a:b] transitions
(5) tr.substitute({'a':'A', 'b':'B', 'c':'C'}): change lowercase a, b and c into their uppercase variants
(6) tr.substitute( {('a','a'):('A','A'), ('b','b'):('B','B'), ('c','c'):('C','C')} ): change lowercase a, b and c into their uppercase variants
In case (4), epsilon transitions are used to attach copies of transducer S between the SOURCE and TARGET state of each transition that is substituted. The transition itself is deleted, but its weight is copied to the epsilon transition leading from SOURCE to the initial state of S. Each final state of S is made non-final and an epsilon transition leading to TARGET is attached to it. The final weight is copied to the epsilon transition.
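The case (4) construction can be traced on a small model. The sketch below is plain Python and does not call hfst: the function name, the tuple layout and the state renumbering scheme are assumptions chosen for illustration, and the epsilon string is assumed to match hfst.EPSILON. It replaces every arc labelled with the given pair by a renumbered copy of S, glued in with epsilon arcs as described.

```python
EPS = "@_EPSILON_SYMBOL_@"  # assumed to match hfst.EPSILON


def substitute_with_graph(arcs, n_states, pair, sub_arcs, sub_finals, sub_n_states):
    """Arcs are (source, target, input, output, weight); state 0 of S is initial.

    Returns the new arc list and the new number of states.
    """
    result = []
    for source, target, insym, outsym, weight in arcs:
        if (insym, outsym) != pair:
            result.append((source, target, insym, outsym, weight))
            continue
        offset = n_states            # fresh state numbers for this copy of S
        n_states += sub_n_states
        # SOURCE -> initial state of the copy, carrying the deleted arc's weight
        result.append((source, offset, EPS, EPS, weight))
        # The copy's own arcs, renumbered
        for s, t, i, o, w in sub_arcs:
            result.append((s + offset, t + offset, i, o, w))
        # Each final state of the copy -> TARGET, carrying the final weight
        for final, final_weight in sub_finals.items():
            result.append((final + offset, target, EPS, EPS, final_weight))
    return result, n_states


# Outer graph: 0 -a:b/0.5-> 1 and 0 -c:c-> 1.  S is [a:b]+ with final state 1.
outer = [(0, 1, "a", "b", 0.5), (0, 1, "c", "c", 0.0)]
s_arcs = [(0, 1, "a", "b", 0.0), (1, 1, "a", "b", 0.0)]
new_arcs, new_n = substitute_with_graph(outer, 2, ("a", "b"), s_arcs, {1: 0.0}, 2)
for arc in new_arcs:
    print(arc)
```

The copied final states are not recorded as final in the outer graph, which corresponds to making them non-final.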
Return an enumeration of the states and transitions of the transducer.
for state, arcs in enumerate(fsm):
    for arc in arcs:
        print('%i ' % (state), end='')
        print(arc)
    if fsm.is_final_state(state):
        print('%i %f' % (state, fsm.get_final_weight(state)))
Return a string representation of the transducer.
print(fsm)