The bokmål mode uses many non-bokmal words #1

Open
brackendawson opened this issue Jan 28, 2022 · 4 comments

@brackendawson

I love the game; I too have been using it daily to improve my vocab. However, after selecting bokmål, it sometimes chooses words from dialects, such as game 4081 (oiete), which isn't even in the linked dictionary.

It's definitely useful to learn dialects, but they are above my level for now. I'm wondering whether bokmål mode is using as good a word list as it could.

@evancharlton
Owner

Hei!

Thanks for the report -- this is definitely an ongoing concern :) The word list comes from Nasjonalbibliotekets ordbank, which contains every possible word -- including ones which see little (or no) actual usage. They are, however, technically correct, hehe.

Personally, I see this as a good thing, as I discover words that are new to me -- and sometimes to my Norwegian friends, too! It makes me think about the construct of the language and how the word is valid, what its roots are, etc.

I wouldn't be opposed to adding a "bokmål (simplified)" dictionary that was limited to, say, the top 1,000 words or something. However, I haven't found such a list (note: we need a sufficiently large corpus after the 5-letter restriction is applied). If you're able to find one, please let me know!
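
To illustrate the corpus-size concern, here is a minimal Python sketch of what the filtering step could look like, assuming a frequency-ranked word list were available. The function name `simplified_word_list`, the `top_n` parameter, and the sample words are all hypothetical; none of this is taken from the app's code.

```python
from typing import Iterable, List

# Letters accepted in a bokmål word (assumption: lowercase a-z plus æ, ø, å).
NORWEGIAN_LETTERS = set("abcdefghijklmnopqrstuvwxyzæøå")


def simplified_word_list(
    ranked_words: Iterable[str], top_n: int = 1000, length: int = 5
) -> List[str]:
    """Return the `length`-letter words found among the first `top_n` entries.

    `ranked_words` is assumed to be ordered from most to least frequent.
    """
    result: List[str] = []
    seen = set()
    for rank, word in enumerate(ranked_words):
        if rank >= top_n:
            break
        word = word.strip().lower()
        if (
            len(word) == length
            and set(word) <= NORWEGIAN_LETTERS
            and word not in seen
        ):
            seen.add(word)
            result.append(word)
    return result


# Hypothetical sample, ordered by (made-up) frequency rank.
ranked = ["og", "i", "det", "ikke", "hadde", "eller", "under", "også",
          "kunne", "norge", "Kunne"]
print(simplified_word_list(ranked, top_n=10))
```

Only a fraction of any top-N list survives the length filter, so `top_n` would probably have to be much larger than the number of puzzle words actually needed.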


Regarding #4081 (OIETE): it actually is a word, and it is referenced in UiB's dictionary. Specifically, it appears to be a slang interjection of sorts. Perhaps I should switch dictionaries instead 😉

@HaraldKorneliussen

I think that the key to Wordle's success is that it has a manually curated list of possible answers. This is clearly a deliberate choice, since the list of words the interface will accept is much larger. Otherwise you get a lot of "Scrabble words", the sort that are used more in word games like this than in their original contexts. Learning new words is well and good, but if your clue is

xOUNS

you'd feel a little cheated if the answer was louns, bouns or touns rather than nouns.
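
A small Python sketch of the two-list idea, for illustration only: a large set of accepted guesses and a smaller curated set of possible answers. The names `ANSWERS`, `ALLOWED_GUESSES`, and `matching_words` are made up for this example, and the word sets are tiny stand-ins.

```python
from typing import Iterable, List


def matching_words(pattern: str, words: Iterable[str]) -> List[str]:
    """Return the words that fit `pattern`, where '?' is an unknown letter."""
    pattern = pattern.lower()
    return sorted(
        w for w in words
        if len(w) == len(pattern)
        and all(p == "?" or p == c for p, c in zip(pattern, w.lower()))
    )


# Hypothetical lists: every possible answer is also an accepted guess.
ANSWERS = {"nouns", "house", "mouse"}
ALLOWED_GUESSES = ANSWERS | {"louns", "bouns", "touns"}

print(matching_words("?ouns", ALLOWED_GUESSES))  # ['bouns', 'louns', 'nouns', 'touns']
print(matching_words("?ouns", ANSWERS))          # ['nouns']
```

With the curated answer set, the same clue leaves a single familiar candidate, while the obscure words remain valid as guesses.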

@brackendawson
Author

Wordle does have a smaller dictionary of well-known words that can be the target; it was handcrafted:

So the word list is another one of those things that I think I put a fair amount of effort into—actually my partner and I, we collaborated on it. Like I said, the first time I made the game, it just used every five-letter word. And I think it wasn’t very fun because—try and think about it—if the first time you play Wordle, the answer is a word you’d never heard of, I think you would feel cheated. And so we put a fair amount of effort into filtering. There are around, I don’t know, 13,000 five-letter words. And we put a fair amount of effort into filtering those down into a subset of around 2,500 solution words that can be the solution any day. And the way we did that actually was I built another game before this one, which took all 13,000 five-letter words and displayed a word and displayed three buttons: “I know this word,” “I don’t know this word,” and “I maybe know this word.” And my partner, she just wanted a mindless game. She was going through some tough times. She just wanted something she could sit down and mindlessly do. So she categorized all 13,000 words.

We'd need a very bored native speaker to do this, which is a problem. I was thinking in the shower 🚿 about what a good way to do this might be. We could use Google Books Ngram to score each word based on its use in literature, but they don't have a Norwegian dataset. 😕
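
If some Norwegian text corpus were available, the scoring idea could be sketched roughly like this in Python. This is only an illustration: `score_words`, the one-sentence corpus, and the threshold are assumptions, and it counts plain word occurrences rather than using any Ngram dataset.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def score_words(candidates: Iterable[str], corpus_text: str) -> Dict[str, int]:
    """Count how often each candidate occurs as a whole word in the corpus."""
    # Tokenize on runs of letters, including the Norwegian æ, ø and å.
    tokens = re.findall(r"[a-zæøå]+", corpus_text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return {word: counts[word.lower()] for word in candidates}


# Tiny stand-in corpus; a real one would need to be far larger.
corpus = ("Hun kunne ikke komme. Han kunne heller ikke komme, "
          "under noen omstendigheter.")
scores = score_words(["kunne", "komme", "under", "oiete"], corpus)

# Keep words seen at least `threshold` times (threshold is arbitrary here).
threshold = 1
common = [word for word, n in scores.items() if n >= threshold]
print(scores)
print(common)
```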

@evancharlton
Owner

Thanks for the lively debate, everyone! That's one part of the problem, @brackendawson -- we don't have a good dataset. However, neither did Wordle. It would be [relatively] straightforward for someone to round up a few native Norwegian speakers and ask them to go through the list and get it down to a set of "real" words.

If someone does such a thing, please let me know and I will happily add a "bokmål (simplified)" (e.g.) option to the language picker in the app!
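
As a rough sketch of how such a review could be turned into a word list, the Python below combines labels from several reviewers, reusing the three categories from the Wordle interview quoted earlier in the thread (know / maybe / don't know). The `curate` function, the majority rule, and the sample votes are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

KNOWN, MAYBE, UNKNOWN = "known", "maybe", "unknown"


def curate(votes: Dict[str, List[str]], min_share: float = 0.5) -> List[str]:
    """Keep words that more than `min_share` of reviewers marked as known."""
    kept = []
    for word, labels in votes.items():
        if not labels:
            continue  # nobody reviewed this word yet
        known_share = Counter(labels)[KNOWN] / len(labels)
        if known_share > min_share:
            kept.append(word)
    return sorted(kept)


# Made-up labels from three reviewers per word.
votes = {
    "kunne": [KNOWN, KNOWN, KNOWN],
    "oiete": [UNKNOWN, MAYBE, UNKNOWN],
    "norge": [KNOWN, KNOWN, MAYBE],
}
print(curate(votes))  # ['kunne', 'norge']
```

A stricter `min_share` would shrink the list toward words that everyone recognizes.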

This brings me back to my personal viewpoint: I like the uncommon/rare/obscure words. I built this game not to achieve mass-market popularity (exhibit A: I didn't buy ordle.no even though it was available when I launched this app). Instead, I built it to discover new (to me!) Norwegian words.

Every time I get presented with a word that isn't in the dictionary, I like the process of searching it up on Google to find what it could mean, where it came from, and so forth. As long as I learn something, it's a success to me 😃
