Safaa is a Python package for detecting false positives in copyright notices. It can also declutter copyright notices by removing unnecessary extra text.
- Load pre-trained models or train your own.
- Integration with scikit-learn for training and prediction.
- Integration with spaCy for named entity recognition and decluttering tasks.
- Preprocessing tools to ensure data consistency and quality.
- Support for local or default model directories.
To install Safaa, simply use pip:
pip install safaa
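To classify and declutter copyright notices:
from safaa.Safaa import SafaaAgent

agent = SafaaAgent()
data = ["Your raw data here"]
# Preprocess the raw notices
preprocessed_data = agent.preprocess_data(data)
# Run the false positive detector on the notices
predictions = agent.predict(data)
# Remove extra text from the notices, using the predictions
decluttered_data = agent.declutter(data, predictions)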
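The predict and declutter calls above can be wrapped in a small helper that keeps each notice next to its prediction and its decluttered text. This is a minimal sketch, not part of Safaa: `classify_and_declutter` and `StubAgent` are hypothetical names, and the helper assumes that both methods return one item per input notice, which is not stated above. The stub exists only so the sketch runs without the package installed; its outputs say nothing about Safaa's real labels.

```python
from typing import Any, List, Sequence, Tuple


def classify_and_declutter(agent: Any, notices: Sequence[str]) -> List[Tuple[str, Any, str]]:
    """Return (notice, prediction, decluttered) rows for any agent that
    exposes predict(data) and declutter(data, predictions)."""
    notices = list(notices)
    predictions = list(agent.predict(notices))
    decluttered = list(agent.declutter(notices, predictions))
    # Assumption: one prediction and one decluttered string per notice.
    if not (len(notices) == len(predictions) == len(decluttered)):
        raise ValueError("agent returned a different number of items than it was given")
    return list(zip(notices, predictions, decluttered))


class StubAgent:
    """Hypothetical stand-in for SafaaAgent, used only to run this sketch."""

    def predict(self, data):
        # Placeholder label; NOT the label format Safaa uses.
        return ["stub-label" for _ in data]

    def declutter(self, data, predictions):
        # Placeholder behavior: trim surrounding whitespace only.
        return [text.strip() for text in data]


rows = classify_and_declutter(StubAgent(), ["  Copyright 2020 Jane Doe  "])
for notice, prediction, cleaned in rows:
    print(repr(notice), prediction, repr(cleaned))
```

With the package installed, passing `SafaaAgent()` in place of `StubAgent()` should work as long as the length assumption holds.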
To train the false positive detector:
training_data = ["Your training data here"]
labels = ["Your labels here"]
agent.train_false_positive_detector_model(training_data, labels)
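Before calling the training method above, it can help to confirm that the texts and labels line up. The sketch below is a hypothetical pre-check, not part of Safaa: `check_training_set` is an illustrative name, and the label values in the example are placeholders, since the label format Safaa expects is not described here.

```python
from collections import Counter
from typing import Counter as CounterType, Hashable, Sequence


def check_training_set(texts: Sequence[str], labels: Sequence[Hashable]) -> CounterType:
    """Validate a training set and return how often each label occurs."""
    texts, labels = list(texts), list(labels)
    if not texts:
        raise ValueError("training set is empty")
    if len(texts) != len(labels):
        raise ValueError(f"{len(texts)} texts but {len(labels)} labels")
    # Empty or non-string entries usually indicate a data-loading problem.
    bad = [i for i, text in enumerate(texts) if not isinstance(text, str) or not text.strip()]
    if bad:
        raise ValueError(f"empty or non-string texts at positions {bad}")
    return Counter(labels)


# Placeholder data; "label_a" and "label_b" are NOT Safaa's real labels.
counts = check_training_set(
    ["first notice", "second notice", "third notice"],
    ["label_a", "label_b", "label_a"],
)
print(counts)
```

A heavily skewed label count is worth noticing before training, because a classifier trained on it may mostly predict the majority label.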
To train the named entity recognition model:
train_path = "path/to/train.spacy"
dev_path = "path/to/dev.spacy"
agent.train_ner_model(train_path, dev_path)
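The `.spacy` paths above refer to spaCy's binary training format, which is typically written with `DocBin`. The following is a rough sketch of producing such a file from (text, entity text) pairs. The helper names are hypothetical, and the entity label that Safaa's NER model expects is not given here, so `label` is left as a required argument. spaCy is imported inside the function so the offset helper can be used without it.

```python
from typing import Iterable, Optional, Tuple


def find_entity_offsets(text: str, entity_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first occurrence, or None."""
    start = text.find(entity_text)
    if start == -1:
        return None
    return start, start + len(entity_text)


def write_docbin(examples: Iterable[Tuple[str, str]], output_path: str, label: str) -> None:
    """Write (text, entity_text) pairs to a .spacy file with one entity each."""
    import spacy  # requires spaCy to be installed
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")  # tokenizer only, no trained components
    doc_bin = DocBin()
    for text, entity_text in examples:
        doc = nlp.make_doc(text)
        ents = []
        offsets = find_entity_offsets(text, entity_text)
        if offsets is not None:
            # char_span returns None if the offsets do not align with token boundaries.
            span = doc.char_span(offsets[0], offsets[1], label=label)
            if span is not None:
                ents.append(span)
        doc.ents = ents
        doc_bin.add(doc)
    doc_bin.to_disk(output_path)


# Example call (the label is a placeholder, not Safaa's real label):
# write_docbin([("Copyright (c) 2020 Jane Doe", "Jane Doe")], "path/to/train.spacy", "HYPOTHETICAL_LABEL")
print(find_entity_offsets("Copyright (c) 2020 Jane Doe", "Jane Doe"))
```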
To save the trained models:
save_path = "path/to/save"
agent.save(save_path)
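Whether `agent.save` creates a missing directory is not stated here, so one cautious option is to create the directory before saving. The sketch below uses only the standard library; `prepare_save_dir` is a hypothetical helper, not part of Safaa.

```python
from pathlib import Path


def prepare_save_dir(save_path: str) -> str:
    """Create save_path (and any missing parents) and return it as a string."""
    path = Path(save_path).expanduser()
    if path.exists() and not path.is_dir():
        raise NotADirectoryError(f"{path} exists and is not a directory")
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Usage with an existing agent (assumes save() accepts a directory path):
# agent.save(prepare_save_dir("path/to/save"))
```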
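Safaa depends on the following packages and modules:
- scikit-learn
- spaCy
- joblib
- regex
- os (Python standard library)
- shutil (Python standard library)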
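A quick way to see which of these are importable in the current environment is sketched below. It uses only the standard library. The mapping from package name to import name is written out by hand (scikit-learn is imported as `sklearn`), and `missing_dependencies` is a hypothetical helper. `os` and `shutil` ship with Python, so they are not checked.

```python
import importlib.util
from typing import Dict, List

# Distribution name -> top-level import name.
THIRD_PARTY: Dict[str, str] = {
    "scikit-learn": "sklearn",
    "spaCy": "spacy",
    "joblib": "joblib",
    "regex": "regex",
}


def missing_dependencies(import_names: Dict[str, str] = THIRD_PARTY) -> List[str]:
    """Return the distribution names whose import name cannot be found."""
    return [
        dist
        for dist, module in import_names.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_dependencies()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All third-party dependencies are importable.")
```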
This project is licensed under the GNU Lesser General Public License, Version 2.1 (February 1999).
- Name: Abdelrahman Jamal
- Email: [email protected]
- LinkedIn: linkedin.com/in/abdelrahmanjamal