-
-
Notifications
You must be signed in to change notification settings - Fork 613
Improve score by supporting extra_phrase for extra words in rules
#4432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Improve score by supporting extra_phrase for extra words in rules
#4432
Conversation
6b7da16 to
8fb9f47
Compare
ac99199 to
9b50bff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alok1304! Looking much better
See comments for your consideration. I've updated your PR description to mention that this is a follow up PR, since there is important context and reviews in the previous PR, we need to preserve this as required.
b9c7c16 to
a50b3db
Compare
|
I addet test for |
8a25b51 to
43c6bdb
Compare
…_log` Add test for is correct position of `extra-words` according to `extra-phrases` that is present in rules. if we find `extra-words` are in the right place then we set score to `100`. And also show in `detection_log` why we increasing the score to keep track of this. Signed-off-by: Alok Kumar <[email protected]>
Add new phrases like `extra_phrase` this is special for extra-words. This phrase is represented in the format [[n]], where n indicates the maximum number of extra-words allowed at that position in the rule. If extra-words appear at the correct position and their count does not exceed the allowed limit `n`, then the score is increased to `100`. Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
due to `extra_phrase` in rules, this shows that rules containing `extra-words` Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
add a new `extra-phrase` for a rule i.e bsd-new Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
b38f718 to
61284f4
Compare
Signed-off-by: Alok Kumar <[email protected]>
…rds` Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
|
Actually, the previous test case is failing because of because Now, all the test cases are passed :) |
Signed-off-by: Alok Kumar <[email protected]>
…tection Signed-off-by: Alok Kumar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alok1304. looking great overall, see comments for your consideration.
...dcode/data/plugin_license/extra-words/scan-extra-words-3-seq-license/with-copyrights/LICENSE
Show resolved
Hide resolved
tests/licensedcode/data/plugin_license/extra-words/scan-extra-words-3-seq-license.expected.json
Show resolved
Hide resolved
tests/licensedcode/data/plugin_license/extra-words/scan-extra-words-2-aho-license.expected.json
Show resolved
Hide resolved
Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
Signed-off-by: Alok Kumar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alok1304, looking good!
Two minor changes requested for your consideration:
- in the test and test filenames we should add the new case as a new test and let the old tests (which we have perfect scores for now) stay as-is
- In the rule detection tests we should process the text before running license detection instead of ignoring rules with extra word markers.
Asking a review from @pombredanne too as this is ready for him to take a look at.
tests/licensedcode/data/plugin_license/extra-words/scan-extra-words-3-seq-license.expected.json
Outdated
Show resolved
Hide resolved
Signed-off-by: Alok Kumar <[email protected]>
| test_dir = test_env.get_test_loc('plugin_license/extra-words/scan-extra-words-3-seq-license-with-copyright/') | ||
| result_file = test_env.get_temp_file('json') | ||
| args = [ | ||
| '--license', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also use the --copyright option here so we have copyright detection in the resource, we have to use this to resolve the issue.
Signed-off-by: Alok Kumar <[email protected]>
|
Hey @AyanSinhaMahapatra, this is ready to merge. |
| # early from the loops: trying to check containment on wildly separated matches | ||
| # does not make sense | ||
|
|
||
| def is_extra_words_position_valid(match): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the scoring should not be more subtle. In practice, we can have multiple regions allowing for extra words. And some regions may have OK extra words, and some other may not have that, so this is may not be a binary True or False, with 100 or not 100 but instead a score computation that may be more nuanced?
Follow up of:
extra_phrasefor extra words in rules #4424Add new phrases like
extra_phrasethis is special forextra-words. This phrase is represented in the format[[n]], where n indicates the maximum number ofextra-wordsallowed at that position in the rule.If
extra-wordsappear at the correct position and their count does not exceed the allowed limitn, then the score is increased to100.Reference #4420
Tasks
Run tests locally to check for errors.
Signed-off-by: Alok Kumar [email protected]