-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hfst-ospell leaks memory #27
Comments
The typos file used as input can be found here |
For some reason I can't get Valgrind to report memory leaks usefully for ospell, but those leaks are overall small in size. I think this isn't exactly a leak, but an effect of the cache slowly building up. The error model + lexicon are rich enough in this case that on average a cache entry is tens of megabytes, and as the correction encounters new first symbols, the cache gets bigger and bigger. Eg. you should see different behaviour if you first sort the typos vs. if you only have one type per first character. Perhaps the system should have some sort of cache priority order, bumping off older cache entries once the cache grows beyond some size? |
Yeah, if I grep for only words starting with On regular input, memory usage flattens out as input grows, which is also consistent with the only "leak" being the cache on first symbols. I tried e.g. clearing some random entries (whether .empty or not) on every correction, which gives slower memory growth at the cost of some speed loss (50% slower on 10k words when removing 3 random). Adding a use counter and strictly capping the amount of cached entries to 10 first symbols with plain LRU, the runtime is 118s vs 50s, with 486M max RES vs 712M max RES on a run of 10k words. With zero cache, we still use 414M on 10k words, so compared to that, upping it to 10 first symbols doesn't seem too bad. (I also tried LRU of 2 random entries, but there was basically no speed difference, probably because the list of cached entries is so short.) However, I'm pretty sure |
It seems that hfst-ospell is leaking memory. When run against a list of 12.5k typos, it starts out at about 130 Mb usage, and end up at almost 800 Mb before it is done. The memory use increases slowly throughout the processing. See attached screen shots of the memory consumption at different intervals. The zhfst file used is attached (rename *.zip to *.zhfst).
Tested against latest hfst-ospell code on macOS Sierra 10.12.1.
se.zip
The text was updated successfully, but these errors were encountered: