[Feature]: Support pickle and copy.deepcopy for Sentences #3243

Open
dobbersc opened this issue May 20, 2023 · 0 comments · May be fixed by #3245
Labels
feature A new feature

dobbersc commented May 20, 2023

Problem statement

Serializing and deserializing sentences with pickle is common practice when using Flair in multiprocessing systems, e.g. with the Ray library. Creating deep copies of objects is a similarly natural expectation. Unfortunately, both loading a pickled sentence with pickle.loads and creating a deep copy with copy.deepcopy fail for sentences with span or relation annotations.

Example:

import copy
import pickle

from flair.data import Sentence

sentence: Sentence = Sentence("Berlin is the capital of Germany.")

# Works: pickle / copy on unannotated sentence
pickled: Sentence = pickle.loads(pickle.dumps(sentence))
copied: Sentence = copy.deepcopy(sentence)
assert pickled.to_original_text() == copied.to_original_text() == "Berlin is the capital of Germany."

# Add span annotations
sentence[:1].add_label("ner", value="LOC")  # Berlin
sentence[5:6].add_label("ner", value="LOC")  # Germany

# Does not work: pickle / copy on span-annotated sentence
# TypeError: __new__() missing 1 required positional argument: 'tokens'
pickled: Sentence = pickle.loads(pickle.dumps(sentence))
copied: Sentence = copy.deepcopy(sentence)
assert (
    {span.text for span in pickled.get_spans("ner")}
    == {span.text for span in copied.get_spans("ner")}
    == {"Berlin", "Germany"}
)

Related to #3191.

Solution

It would be great if sentences with arbitrary annotations were picklable (or at least serializable in some other way) and copyable.

Additional Context

The TypeError occurs because sentences annotated with spans or relations hold references to Span and Relation objects, and both classes define a __new__ method with required arguments:

def __new__(self, tokens: List[Token]):

def __new__(self, first: Span, second: Span):
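The failure mode can be reproduced without Flair. The following is a minimal sketch, assuming a hypothetical ToySpan class (not part of Flair) whose __new__ requires an argument, just like the signatures above. Serializing succeeds, but recreating the object fails because pickle and copy call __new__ without any arguments.

```python
import copy
import pickle
from typing import Dict, List


class ToySpan:
    """Hypothetical stand-in for a class whose __new__ requires an argument."""

    def __new__(cls, tokens: List[str]):
        return super().__new__(cls)

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens


def collect_roundtrip_errors(obj: object) -> Dict[str, str]:
    """Try a pickle round trip and a deep copy; record any TypeError message."""
    operations = {
        "pickle": lambda o: pickle.loads(pickle.dumps(o)),
        "deepcopy": copy.deepcopy,
    }
    errors: Dict[str, str] = {}
    for name, operation in operations.items():
        try:
            operation(obj)
        except TypeError as exc:
            errors[name] = str(exc)
    return errors


errors = collect_roundtrip_errors(ToySpan(["Berlin"]))
for name, message in errors.items():
    print(f"{name}: {message}")
```

Both operations raise a TypeError that names the missing 'tokens' argument; the exact wording of the message depends on the Python version.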

To fix this, Span and Relation need to implement a __getnewargs__ or __getnewargs_ex__ method; the pickle (and copy) modules call it to obtain the arguments for __new__ when recreating an object.
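A minimal sketch of the __getnewargs__ approach, using a hypothetical PicklableToySpan class rather than Flair's actual Span. It only illustrates the mechanism; a real fix in Flair would also have to account for whatever else Span.__new__ and Relation.__new__ do.

```python
import copy
import pickle
from typing import List, Tuple


class PicklableToySpan:
    """Hypothetical class: __new__ requires an argument, but pickling is supported."""

    def __new__(cls, tokens: List[str]):
        return super().__new__(cls)

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens

    def __getnewargs__(self) -> Tuple[List[str]]:
        # pickle and copy pass this tuple as positional arguments to __new__
        # when they recreate the object; __init__ is not called again, the
        # instance __dict__ is restored from the saved state instead.
        return (self.tokens,)


span = PicklableToySpan(["Berlin"])

pickled = pickle.loads(pickle.dumps(span))
copied = copy.deepcopy(span)

assert pickled.tokens == copied.tokens == ["Berlin"]
assert copied.tokens is not span.tokens  # the deep copy owns its own list
```

__getnewargs_ex__ works the same way but returns an (args, kwargs) pair, which is useful when __new__ takes keyword-only arguments.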
