OOV Word Forms #90

vladob54 · 2019-01-20T09:51:11Z

I would appreciate it if UDPipe somehow indicated that a given word form was not present in the morphological lexicon, i.e., that its lemma, PoS and features had been guessed. This type of information is provided, e.g., by TreeTagger, and we make use of it while post-processing the tagger output. We also provide it to corpus users so that they can incorporate the respective attribute into their CQL queries...

Best,
Vlado B, 10:45

http://unesco.uniba.sk/guest/

arademaker · 2019-01-20T12:33:32Z

Indeed, nice suggestion.

foxik · 2019-01-20T12:42:23Z

Currently this is not straightforward to implement, because the current UDPipe does not distinguish between a "real" morphological lexicon and the guesser rules derived from the training data. (Our MorphoDiTa tool can do it; there we keep this distinction.)

BTW, if you have a morphological dictionary, you can do this manually after running UDPipe: look up each output word form in the dictionary and mark the forms that are missing from it.
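The manual post-processing step described above might look roughly like the following minimal sketch. It assumes UDPipe's output is in CoNLL-U format and that the dictionary is a plain UTF-8 file with one word form per line, matched case-insensitively. The helper names `load_lexicon` and `mark_oov` and the MISC attribute `OOV=Yes` are hypothetical choices for illustration, not part of UDPipe.

```python
import io
import sys
from typing import Set, TextIO


def load_lexicon(path: str) -> Set[str]:
    """Read a hypothetical dictionary file: one word form per line, UTF-8."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def mark_oov(conllu: TextIO, lexicon: Set[str], out: TextIO) -> None:
    """Append the (hypothetical) attribute OOV=Yes to the MISC column of every
    token whose lowercased form is not in the lexicon."""
    for line in conllu:
        line = line.rstrip("\n")
        # Comment lines and blank sentence separators pass through unchanged.
        if not line or line.startswith("#"):
            out.write(line + "\n")
            continue
        cols = line.split("\t")
        # Multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped;
        # only regular syntactic words carry a lemma/UPOS/FEATS analysis.
        if "-" in cols[0] or "." in cols[0]:
            out.write(line + "\n")
            continue
        form, misc = cols[1], cols[9]
        if form.lower() not in lexicon:
            cols[9] = "OOV=Yes" if misc == "_" else misc + "|OOV=Yes"
        out.write("\t".join(cols) + "\n")


if __name__ == "__main__":
    # Tiny demo: "Ahoj" is in the hypothetical lexicon, "svetik" is not.
    sample = io.StringIO(
        "# text = Ahoj svetik\n"
        "1\tAhoj\tahoj\tINTJ\t_\t_\t0\troot\t_\t_\n"
        "2\tsvetik\tsvetik\tNOUN\t_\t_\t1\tvocative\t_\tSpaceAfter=No\n"
        "\n"
    )
    mark_oov(sample, {"ahoj"}, sys.stdout)
```

Storing the flag in the MISC column keeps the file valid CoNLL-U, so it can later be exported as a separate token attribute for CQL queries. Note that punctuation and numbers will also be flagged unless the dictionary lists them.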

Also, the future UDPipe 2.0 will allow explicitly passing a morphological dictionary (during inference, not just during training), so it will then be possible to indicate which words were processed only by a "guesser".

Leaving the issue open as a reminder.

ftyers · 2019-03-26T16:39:47Z

This is relevant to #50 too.
