OOV Word Forms #90

Open
vladob54 opened this issue Jan 20, 2019 · 3 comments

Comments

@vladob54

vladob54 commented Jan 20, 2019

I would really appreciate it if udpipe indicated somehow that a given word form was not present in the morphological lexicon, i.e., that its lemma, PoS and features have been guessed. This type of information is provided, e.g., by TreeTagger; we make use of it while post-processing the tagger output, and also provide it to corpus users so that they can incorporate the respective attribute into their CQL queries...

Best,
Vlado B, 10:45

http://unesco.uniba.sk/guest/

@arademaker

Indeed, nice suggestion.

@foxik
Member

foxik commented Jan 20, 2019

Currently this is not straightforward to implement, because the current UDPipe does not distinguish between a "real" morphological lexicon and the guesser rules derived from the training data. (Our MorphoDiTa tool can do it; there we keep this distinction.)

BTW, if you have a morphological dictionary, you can do this check manually after running UDPipe, by looking up each word form of the output in the dictionary.
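A minimal sketch of such a manual post-processing step, under several assumptions that are not from this thread: the UDPipe output is in CoNLL-U format, the dictionary is available as a plain collection of word forms, lookup is case-insensitive, and the flag is written to the MISC column as `OOV=Yes` (a hypothetical attribute name, not something UDPipe itself produces). The function name `mark_oov` is likewise made up for illustration.

```python
def mark_oov(conllu_text, lexicon_forms):
    """Add OOV=Yes to the MISC column of tokens whose form is not in lexicon_forms.

    conllu_text   -- UDPipe output as a CoNLL-U string
    lexicon_forms -- iterable of word forms from your own dictionary (assumed format)
    """
    # Assumption: case-insensitive lookup; drop .lower() if your lexicon is case-sensitive.
    known = {form.lower() for form in lexicon_forms}
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Leave comments, blank lines, and anything that is not a 10-column row untouched.
        if line.startswith("#") or len(cols) != 10:
            out.append(line)
            continue
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not cols[0].isdigit():
            out.append(line)
            continue
        if cols[1].lower() not in known:
            # MISC is the 10th column; "_" means empty, attributes are "|"-separated.
            cols[9] = "OOV=Yes" if cols[9] == "_" else cols[9] + "|OOV=Yes"
        out.append("\t".join(cols))
    return "\n".join(out)
```

Note that this only tells you whether a form is in your own dictionary, not whether UDPipe actually used its guesser for it, so the result is an approximation of what TreeTagger reports.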

Also, the future UDPipe 2.0 will allow explicitly passing a morphological dictionary (during inference, not just during training), so it will then be possible to indicate which words were processed only by the "guesser".

Leaving the issue open as a reminder.

@ftyers

ftyers commented Mar 26, 2019

This is relevant to #50 too.
