KeyBERT extractor labels all extracted keywords with the same metadata key. #145

Open
pedrocassalpacheco opened this issue Feb 19, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@pedrocassalpacheco
Collaborator

One of the interesting aspects of keyword extraction with BERT-based models is that you can extract many different keyword "types" in a single execution. The previous implementation let you pass a collection of terms to the model and receive, in return, a collection of terms for each label. The current implementation hardcodes the metadata label and passes only a single string to the model for extraction.

For example, if I create metadata for a historical document, I want to know about people, places, dates, and events, and I would prefer to receive metadata that looks like {"people": ["Alexander the Great", "Philip of Macedonia", ...], "places": ["ancient Greece", "middle east", "Macedonia", ...], ...}.
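A minimal sketch of that output shape, under stated assumptions: `extract_metadata`, `LabelExtractor`, `toy_extract`, and `TOY_LEXICON` are hypothetical names used only for illustration, not this project's API or KeyBERT's. The stand-in extractor is a plain substring lookup so the example runs without a model. A real extractor would replace it, and whether a given model can serve every label in one pass rather than one call per label is left open here.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical extractor signature: (text, label) -> keywords found for that label.
LabelExtractor = Callable[[str, str], List[str]]


def extract_metadata(
    text: str, labels: Iterable[str], extract: LabelExtractor
) -> Dict[str, List[str]]:
    """Call the extractor once per label and key the results by that label."""
    metadata: Dict[str, List[str]] = {}
    for label in labels:
        seen = set()
        keywords: List[str] = []
        for keyword in extract(text, label):
            # Drop case-insensitive duplicates while preserving order.
            if keyword.lower() not in seen:
                seen.add(keyword.lower())
                keywords.append(keyword)
        metadata[label] = keywords
    return metadata


# Toy stand-in for a model call (an assumption for illustration only):
# it reports the lexicon terms that literally appear in the text.
TOY_LEXICON = {
    "people": ["Alexander the Great", "Philip of Macedonia"],
    "places": ["ancient Greece", "Macedonia"],
}


def toy_extract(text: str, label: str) -> List[str]:
    lowered = text.lower()
    return [term for term in TOY_LEXICON.get(label, []) if term.lower() in lowered]


text = (
    "Alexander the Great, son of Philip of Macedonia, "
    "left Macedonia to campaign beyond ancient Greece."
)
metadata = extract_metadata(text, ["people", "places"], toy_extract)
print(metadata)
```

Each label becomes its own metadata key, in contrast to writing every keyword under one hardcoded key.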

I am happy to make the change if we agree that it is a good one and in line with the previous implementation.

@kerinin @erichare

pedrocassalpacheco added the enhancement (New feature or request) label Feb 19, 2025
@epinzur
Collaborator

epinzur commented Feb 20, 2025

I wasn't aware that KeyBERT can function in this way. I thought it only extracts keywords and doesn't have a way to pull a set of labels the way our spaCy and GLiNER transformers do.
