US patents before 1920 (format 1) list each patentee twice, typically once in the header and once in the body of the text. We annotated both occurrences, so we also collect both.
As a side effect, for these formats, some metrics (e.g. team size) are heavily distorted and could be misleading.
More
This can happen in 2 cases:
US format 1: ~100%
GB format 1: ~10%, because the provisional and final publications both appear in the same document
Details
Version: 1.0.0rc1
Deduplicate using the relative Levenshtein distance on the name_text (iterating over all pairs of patentees).
US format 1: 97%+ accuracy with a threshold of 0.43 (see doc)
GB format 1: nothing yet
We created a new field, is_duplicate, which is True when a duplicate is found. Note that only one of the two entries, the one carrying less information, is marked as a duplicate.
Will be available as of v1.0.0rc2.
Leaving open in case we want to do something similar for GB
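For readers who want to see the idea, here is a minimal sketch of the pairwise deduplication described above. It is not the project's actual implementation. The field names name_text and is_duplicate come from this thread. Everything else is an assumption made for illustration: the loc_text field, the helper names, lowercasing names before comparison, normalizing by the longer string's length, and treating "less information" as "fewer non-empty fields".

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def relative_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def info_count(patentee: dict) -> int:
    """Assumed proxy for 'information': number of non-empty fields."""
    return sum(1 for k, v in patentee.items() if k != "is_duplicate" and v)


def flag_duplicates(patentees: list, threshold: float = 0.43) -> list:
    """Set is_duplicate on the poorer entry of each close-enough pair."""
    for p in patentees:
        p["is_duplicate"] = False
    for left, right in combinations(patentees, 2):
        if left["is_duplicate"] or right["is_duplicate"]:
            continue  # already matched with another entry
        # Lowercasing is an assumption: headers are often in capitals.
        distance = relative_levenshtein(
            left["name_text"].lower(), right["name_text"].lower()
        )
        if distance <= threshold:
            # On a tie, the later entry is the one flagged.
            poorer = left if info_count(left) < info_count(right) else right
            poorer["is_duplicate"] = True
    return patentees


# Hypothetical patentees from one US format 1 document.
patentees = flag_duplicates([
    {"name_text": "JOHN SMITH", "loc_text": None},
    {"name_text": "John Smith", "loc_text": "Boston"},
    {"name_text": "Mary Jones", "loc_text": "Chicago"},
])
team_size = sum(not p["is_duplicate"] for p in patentees)
print(team_size)
```

With a field like this, metrics such as team size can be computed on the entries where is_duplicate is False, as in the last lines of the sketch.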