hfst-ol.jar produces spurious tags ; disagrees with hfst-lookup #3

Open
hfst-importer opened this issue Aug 22, 2013 · 2 comments

Problem: hfst-ol.jar produces spurious [POSS=3] and [CLIT=HAN] tags that hfst-lookup does not produce for the same input and the same transducer. The hfst-lookup output is correct; the hfst-ol.jar output is not.

Program version:
hfst-ol.jar -> latest available for download
hfst-lookup -> hfst-lookup 0.6 (hfst 3.4.5)

Transducer:

Transducer file from Flammie, transformed from .hfst to .hfstol using hfst-fst2fst -w. I can attach it if needed.

Example:

mäntyä gets an extra [POSS=3] and koiransa gets an extra [CLIT=HAN].

```
$ java -jar hfst-ol.jar ../MODEL/morphology.omor.hfstol 
Reading header...
Reading alphabet...
Reading transition and index tables...
Ready for input.
mäntyä
mäntyä  [WORD_ID=mänty][POS=NOUN][NUM=SG][CASE=PAR][POSS=3]     313.0

koiransa
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=NOM][POSS=3][CLIT=HAN]   313.0
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=GEN][POSS=3][CLIT=HAN]   313.0
koiransa        [WORD_ID=koira][POS=NOUN][NUM=PL][CASE=NOM][POSS=3][CLIT=HAN]   313.0
```

Compare to hfst-lookup output of the same:

```
$ hfst-lookup ../MODEL/morphology.omor.hfstol
mäntyä
mäntyä  [WORD_ID=mänty][POS=NOUN][NUM=SG][CASE=PAR]     313.000000

koiransa
koiransa        [WORD_ID=koira][POS=NOUN][NUM=PL][CASE=NOM][POSS=3]     313.000000
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=GEN][POSS=3]     313.000000
koiransa        [WORD_ID=koira][POS=NOUN][NUM=SG][CASE=NOM][POSS=3]     313.000000
```

Reported by: fginter
@hfst-importer

There is a bug in getAnalyses(): the integer-symbol version of the analysis is converted to a string while it still contains uninitialized data. That is, the integer terminator symbol is written only after the analysis has been converted, not before.

The attached patch fixes the problem. As an aside, it also fixes a problem where weights were in some situations calculated incorrectly, leading to huge result weights.
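
The ordering problem described above can be illustrated with a minimal sketch. This is not the actual hfst-ol code or the attached patch: the class `TerminatorSketch`, the `NO_SYMBOL` value, the symbol table, and the shared buffer layout are all hypothetical and exist only to show how converting before terminating can pick up stale symbols left by a longer path.

```java
import java.util.Arrays;

public class TerminatorSketch {
    // Hypothetical terminator value marking the end of an analysis.
    static final int NO_SYMBOL = -1;
    // Hypothetical symbol table; the indices are the integer symbols.
    static final String[] SYMBOLS = {"[CASE=PAR]", "[POSS=3]", "[CLIT=HAN]"};

    // Converts integer symbols to a string, stopping at the first terminator.
    static String render(int[] buffer) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < buffer.length && buffer[i] != NO_SYMBOL; i++) {
            sb.append(SYMBOLS[buffer[i]]);
        }
        return sb.toString();
    }

    // Emits the analysis occupying buffer[0..length).
    // terminateFirst == false reproduces the ordering described in the comment:
    // the terminator is written only after the conversion.
    static String emit(int[] buffer, int length, boolean terminateFirst) {
        if (terminateFirst) {
            buffer[length] = NO_SYMBOL;
            return render(buffer);
        }
        String result = render(buffer);
        buffer[length] = NO_SYMBOL;
        return result;
    }

    // Simulates a search that reuses one output buffer across paths.
    static String shortAnalysisAfterLongPath(boolean terminateFirst) {
        int[] buffer = new int[8];
        Arrays.fill(buffer, NO_SYMBOL);
        // A longer path writes two symbols into the buffer.
        buffer[0] = 0;
        buffer[1] = 1;
        emit(buffer, 2, terminateFirst);
        // A shorter path then overwrites only the first slot;
        // slot 1 still holds the stale [POSS=3] symbol.
        buffer[0] = 0;
        return emit(buffer, 1, terminateFirst);
    }

    public static void main(String[] args) {
        System.out.println("terminator after conversion:  "
                + shortAnalysisAfterLongPath(false));
        System.out.println("terminator before conversion: "
                + shortAnalysisAfterLongPath(true));
    }
}
```

In this sketch the late-terminator ordering yields `[CASE=PAR][POSS=3]` for the shorter path, while writing the terminator first yields `[CASE=PAR]`, which mirrors the kind of spurious trailing tag reported above.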

Original comment by: jiemakel

@hfst-importer

Thanks for your contribution! I'm applying the patch.

Original comment by: Traubert
