Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use temporary working directory for output #152

Closed
wants to merge 15 commits into from

Conversation

felixhekhorn
Copy link
Contributor

@felixhekhorn felixhekhorn commented Oct 11, 2022

Closes #139

actually I feel this implementation in conceptually easier ...

we're running in a naming problem again ;-) since the new scheme will be: with eko.open("bla.tar") as eko_: both eko and EKO are taken at this point, but that is the least of our problems ...

todo:

  • fix deepcopy
  • fix tests

@felixhekhorn felixhekhorn added enhancement New feature or request output Output format and management labels Oct 11, 2022
@felixhekhorn felixhekhorn self-assigned this Oct 11, 2022
@codecov-commenter
Copy link

codecov-commenter commented Oct 23, 2022

Codecov Report

Merging #152 (7dae79a) into develop (2f2755c) will decrease coverage by 0.86%.
The diff coverage is 94.66%.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #152      +/-   ##
===========================================
- Coverage   100.00%   99.13%   -0.87%     
===========================================
  Files           97       97              
  Lines         4514     4524      +10     
===========================================
- Hits          4514     4485      -29     
- Misses           0       39      +39     
Flag Coverage Δ
isobench 54.12% <47.29%> (-0.25%) ⬇️
unittests 99.13% <94.66%> (-0.87%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/eko/output/struct.py 98.75% <94.28%> (-1.25%) ⬇️
src/eko/__init__.py 100.00% <100.00%> (ø)
src/eko/output/legacy.py 79.67% <0.00%> (-20.33%) ⬇️
src/eko/output/manipulate.py 87.95% <0.00%> (-12.05%) ⬇️

@pytest.fixture
def default_cards():
t = tc.generate(0, 1.0)
o = oc.generate([10.0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use keyword arguments for all of them, it is not intuitive what is what.

Copy link
Member

@alecandido alecandido Oct 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even more: maybe we should make them only keyword, such that you can't use this way (prepending a * as first argument in the definition)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the purpose of this PR it's fine - I just need whatever. However, tests can be improved in a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests can always be improved in a later PR, but we need better code and better tests, so we should stop postponing.
Let's not merge PR that deteriorate the quality of our tests, but, if possible, improve it

@alecandido
Copy link
Member

At the moment I don't have time to go through this, but I hope to have as soon as I will have finished my current task.

However, since everything else is moving on, this PR is becoming more an obstacle than a first step, so I would also reconsider dismissing completely and restart from the present status. If I'll implement myself, I believe I'll do it.

@felixhekhorn felixhekhorn linked an issue Nov 21, 2022 that may be closed by this pull request
5 tasks
@felixhekhorn
Copy link
Contributor Author

At the moment I don't have time to go through this, but I hope to have as soon as I will have finished my current task.

However, since everything else is moving on, this PR is becoming more an obstacle than a first step, so I would also reconsider dismissing completely and restart from the present status. If I'll implement myself, I believe I'll do it.

No. There is nothing wrong with the code in this PR - it is exactly the agreed strategy. It would be stupid to trash my time, effort and energy (we are already short of man power).

the remaining points from the issue #139

encode q2 by bytes in filenames
add CLI for conversion (almost a wrapper of output.legacy.load_tar()) in ekobox

can be addressed in a separate PR and none of them a crucial.

@felixhekhorn felixhekhorn marked this pull request as ready for review November 21, 2022 16:19
@alecandido
Copy link
Member

alecandido commented Nov 21, 2022

No. There is nothing wrong with the code in this PR - it is exactly the agreed strategy. It would be stupid to trash my time, effort and energy (we are already short of man power).

Never said there is something wrong: I was arguing that there is not much, and it didn't take much time in the first place.
Your time, effort, and energy are relevant as everyone's else is.

At the moment, we need this, but we need #162 more (that's why someone else is already in charge of it, instead of completing this first).

If the time to retain this and reconcile with #162 is estimated to be more than writing from scratch, then it is smart (not stupid) to apply the more efficient path.
We are not perfect, and we have limited resources. So we are not even perfect in planning, and assuming we will never revert anything not to waste "efforts" is dangerous for the project itself.

P.S.: this PR in the first place is reverting something I implemented myself, if you ask @andreab1997 you will find out it is not that trivial to work with tar files at every step; but it was not a good idea, and I acknowledge it, so reverting is good, and I'm not complaining for wasted effort

@alecandido
Copy link
Member

the remaining points from the issue #139

encode q2 by bytes in filenames
add CLI for conversion (almost a wrapper of output.legacy.load_tar()) in ekobox

can be addressed in a separate PR and none of them a crucial.

If you are rushing to merge before #162 because of possible conflicts, take into account that you might waste someone else's time.
Better to complete the content #139, at least the core part. I agree the CLI not to be crucial, we can postpone, but the q2 encoding was a relevant part, and quite a cheap one.

@andreab1997
Copy link
Contributor

No. There is nothing wrong with the code in this PR - it is exactly the agreed strategy. It would be stupid to trash my time, effort and energy (we are already short of man power).

Never said there is something wrong: I was arguing that there is not much, and it didn't take much time in the first place. Your time, effort, and energy are relevant as everyone's else is.

At the moment, we need this, but we need #162 more (that's why someone else is already in charge of it, instead of completing this first).

If the time to retain this and reconcile with #162 is estimated to be more than writing from scratch, then it is smart (not stupid) to apply the more efficient path. We are not perfect, and we have limited resources. So we are not even perfect in planning, and assuming we will never revert anything not to waste "efforts" is dangerous for the project itself.

P.S.: this PR in the first place is reverting something I implemented myself, if you ask @andreab1997 you will find out it is not that trivial to work with tar files at every step; but it was not a good idea, and I acknowledge it, so reverting is good, and I'm not complaining for wasted effort

Sorry to step in. Just to say that this PR is probably needed to address the first two points of the list I posted in #162. The other two points I can do but the last days I was working on MHOU and I did not have time to dedicate (sorry for this). So it is maybe true that we need #162 more that this PR but we need this PR in order to complete #162 (the point is that without this it is basically impossible to update the metadata file without creating another tar file each time).

Copy link
Member

@alecandido alecandido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First batch of comments


Please refer to our documentation for a full overview of the possibilities.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we discussed it: also here a new line would be required by the reference

__version__ = version.__version__

# export public methods
open = output.struct.EKO.open # pylint: disable=redefined-builtin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not redefine built-in, much better to export EKO top-level, and use:

import EKO from eko

with EKO.open() as e:
    ...

that is not much different from

import eko

with eko.open() as e:
    ...

And no one will be tempted to import open from eko, overwriting a built-in function.
(in general, better not to silence Pylint's warnings as much as possible)

__version__ = version.__version__

# export public methods
open = output.struct.EKO.open # pylint: disable=redefined-builtin
create = output.struct.EKO.create
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove also this, EKO.create is perfectly fine, short enough, and explicit, that always help the code.

output : dict
output dictionary - see :doc:`/code/IO`
output.EKO
output object - see :doc:`/code/IO`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output object is too generic, you can also leave it empty.

Please replace with something like "computed operator" - same length, but more explicit. If possible expand a bit.

output : dict
output dictionary - see :doc:`/code/IO`
output.EKO
output object - see :doc:`/code/IO`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a blank line.

@@ -329,6 +335,8 @@ class EKO:
"""Path on disk, to which this object is linked (and for which it is
essentially an interface).
"""
working_dir: pathlib.Path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use path: there are not two distinct objects

  • if the EKO is opened, then path /wdir is the working dir, and not the tar, since it is not associated to it (it might be loaded from a tar, as a NumPy array might be loaded from an .npy file, but after loading any association is lost)
  • if the EKO is closed, it should not be used (on close the object should be destroyed), so again, you can dump it on a tar

So, I would decouple the two operations:

  • close() will just clean up the folder
  • dump(path) will save another tar (or write_tar(path), here I'm not complaining about the name)

If you open with the context manager, then is the context manager that is in charge of using the same path, and it is retained in the function scope, so you don't need a class attribute for it.

yield obj
finally:
obj.write_tar()
shutil.rmtree(obj.working_dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to a separate close() function, such that is available also when the context manager is not used.

At the moment, it only support text files (since it is returning the
content as a string)
@classmethod
def open_tar(cls, path: os.PathLike):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since tar is the only file format supported, I propose renaming the function:

Suggested change
def open_tar(cls, path: os.PathLike):
def load(cls, path: os.PathLike):

logger.info(f"Operator loaded from path '{path}' into '{target}'")
return eko

def write_tar(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, _tar is redundant:

Suggested change
def write_tar(self):
def dump(self):

@@ -360,8 +368,8 @@ def xgrid(self, value: interpolation.XGrid):

def __post_init__(self):
"""Validate class members."""
if self.path.suffix != ".tar":
raise ValueError("Not a valid path for an EKO")
if self.path is not None and self.path.suffix != ".tar":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this check to the open_tar()/load() function.
This is not related any longer to the object itself, and a dataclass without a __post_init__ is closer to the idea of a struct (even though was just a check, but without is even better)

@alecandido
Copy link
Member

alecandido commented Nov 21, 2022

(the point is that without this it is basically impossible to update the metadata file without creating another tar file each time).

@andreab1997 it is the exact same of the current strategy to set an operator. If you need to create another tar file do it, as agreed (in any case, metadata is seldom updated, and of course it will improve with this, eventually).

Entangling PRs is not a good idea: if we really have strong requirements, we will wait. But if they are not strong, let's not do it.
This PR has been here since one month and half, and no one was working on it for 29 days. If it could have been done in half a day, it should have happened before.

@andreab1997 andreab1997 mentioned this pull request Nov 29, 2022
@alecandido alecandido mentioned this pull request Dec 5, 2022
20 tasks
@felixhekhorn
Copy link
Contributor Author

Close in favour of #172

@felixhekhorn felixhekhorn deleted the feature/output-temp-dir branch January 5, 2023 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request output Output format and management
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Work in a Temporary Folder
4 participants