license of punkt in nltk_data #188
Comments
@janstrunk do you have any advice, please?
I have the same question. Some other models clearly state their licenses. Since no license is stated explicitly for punkt, I'd assume it is not open source.
@ekaf I saw that you closed #236. I would like to let you know that I also have #239. Could you please clarify what the main issues are with those two components: punkt in nltk_data?
Hi @ykirpichev, let's see how it goes, now that you have submitted your PR.
It seems very likely that commercial products relying on NLTK's Punkt tokenizer already exist. FWIW, please consider the following snippet generated by Gemini:
The mentioned security patch, "Security Bulletin: Vulnerability in Natural Language Toolkit (NLTK) (CVE-2024-39705) affects IBM watsonx Assistant for IBM Cloud Pak for Data", does not mention Punkt explicitly, though. But it shows that they were using at least one of the five NLTK data packages affected by the vulnerability, and among those Punkt seems the most likely. So, as long as the licensing terms of the Punkt data are unclear, it might be worthwhile to look at how its use is acknowledged in commercial products, which have presumably been reviewed by lawyers.
Is it possible to freely use punkt in nltk_data for commercial purposes?
What is the license of punkt in nltk_data?
Thank you.