Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ShareMemoryDataset does not have exists() method #3703

Open
noklam opened this issue Mar 13, 2024 · 4 comments · May be fixed by #4121
Open

ShareMemoryDataset does not have exists() method #3703

noklam opened this issue Mar 13, 2024 · 4 comments · May be fixed by #4121

Comments

@noklam
Copy link
Contributor

noklam commented Mar 13, 2024

Description

image

Context

Unclear, the SharedMemoryDataset is implemented by the core team and generating warning when run with ParallelRunner, is this false alarm or something we need to address? Cc @merelcht

Steps to Reproduce

I was testing something else and see the warning with this command:
pytest tests/framework/session/test_session_extension_hooks.py::TestNodeHooks::test_on_node_error_hook_parallel_runner

Possible Solution:

https://github.com/kedro-org/kedro/blob/0fc8089b637a0679f71e2bddc91f0676fc2914a2/kedro/io/memory_dataset.py#L74C1-L75C40
We have a simple check in MemoryDataset, we could implement simliar logic for ShareMemoryDataset.

Check whether it makes sense to have an _exists() method for SharedMemoryDataset. If not: update the logging level to be of DEBUG.

@merelcht
Copy link
Member

The exists() method is not an abstract method and so doesn't have to be implemented. The original SharedMemoryDataset was wrapped around MemoryDataset directly so I guess that's why the warning wasn't triggered before https://github.com/kedro-org/kedro/pull/3332/files#diff-cc4e916905ba553d257700ff851973772c0d77018d7673a2c4b22f309d8c3e46. IMO we can just ignore this.

@noklam
Copy link
Contributor Author

noklam commented Mar 28, 2024

@merelcht Haven't look into this in details, would the alternative be changing that WARNING message to INFO instead?

@merelcht
Copy link
Member

merelcht commented Aug 12, 2024

Reported again recently: #4060

Updated the scope to double check where exists is called and if it makes sense to implement this method on SharedMemoryDataset.

Also related: #4078

@felixscherz
Copy link

felixscherz commented Aug 24, 2024

Hi, I investigated this for a bit and I believe the issue is that SharedMemoryDataset delegates most methods to the underlying MemoryDataset via __getattr__.

def __getattr__(self, name: str) -> Any:
# This if condition prevents recursive call when deserialising
if name == "__setstate__":
raise AttributeError()
return getattr(self.shared_memory_dataset, name) # pragma: no cover

However, __getattr__ is only called for attributes that do not exist on the object and since SharedMemoryDataset inherits AbstractDataset._exists __getattr__ is never called.

We could implement _exists on SharedMemoryDataset, we would have to handle the case where self.shared_memory_dataset is None (return False in this case?):

class SharedMemoryDataset(AbstractDataset):
    ...
    def _exists(self) -> bool:
        if not self.shared_memory_dataset:
            return False
        return self.shared_memory_dataset.exists()

I'm happy to submit a PR implementing this:)

@felixscherz felixscherz linked a pull request Aug 27, 2024 that will close this issue
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

3 participants