Use temporary working directory for output #152

felixhekhorn · 2022-10-11T16:31:59Z

Closes #139

Actually, I feel this implementation is conceptually easier ...

We're running into a naming problem again ;-) Since the new scheme will be `with eko.open("bla.tar") as eko_:`, both eko and EKO are already taken at this point, but that is the least of our problems ...

todo:

fix deepcopy
fix tests

codecov-commenter · 2022-10-23T13:13:25Z

Codecov Report

Merging #152 (7dae79a) into develop (2f2755c) will decrease coverage by 0.86%.
The diff coverage is 94.66%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #152      +/-   ##
===========================================
- Coverage   100.00%   99.13%   -0.87%     
===========================================
  Files           97       97              
  Lines         4514     4524      +10     
===========================================
- Hits          4514     4485      -29     
- Misses           0       39      +39

Flag	Coverage Δ
isobench	`54.12% <47.29%> (-0.25%)`	⬇️
unittests	`99.13% <94.66%> (-0.87%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
src/eko/output/struct.py	`98.75% <94.28%> (-1.25%)`	⬇️
src/eko/__init__.py	`100.00% <100.00%> (ø)`
src/eko/output/legacy.py	`79.67% <0.00%> (-20.33%)`	⬇️
src/eko/output/manipulate.py	`87.95% <0.00%> (-12.05%)`	⬇️

src/eko/output/struct.py

alecandido · 2022-10-23T15:43:32Z

tests/conftest.py

+@pytest.fixture
+def default_cards():
+    t = tc.generate(0, 1.0)
+    o = oc.generate([10.0])


Use keyword arguments for all of them: it is not intuitive which is which.

Even better: maybe we should make them keyword-only (by putting a * as the first parameter in the definition), so that they can't be called this way.
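A minimal sketch of the keyword-only suggestion follows. The function `generate_theory` and its parameter names `order` and `initial_scale` are hypothetical and chosen only for illustration; they are not the actual signature of `tc.generate`.

```python
def generate_theory(*, order: int, initial_scale: float) -> dict:
    """Hypothetical card generator: the bare ``*`` makes every parameter keyword-only."""
    return {"order": order, "initial_scale": initial_scale}


# Explicit at the call site: no guessing which number is which.
card = generate_theory(order=0, initial_scale=1.0)

# Positional use is rejected by Python itself with a TypeError.
try:
    generate_theory(0, 1.0)
except TypeError as err:
    positional_error = str(err)
```

With this definition, a call such as `generate(0, 1.0)` in a fixture can no longer be written, so the meaning of each value is always visible where it is used.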

For the purpose of this PR it's fine, I just need whatever works. However, tests can be improved in a later PR.

Tests can always be improved in a later PR, but we need better code and better tests, so we should stop postponing.
Let's not merge PRs that deteriorate the quality of our tests; if possible, they should improve it.

alecandido · 2022-11-21T15:05:59Z

At the moment I don't have time to go through this, but I hope to have some as soon as I have finished my current task.

However, since everything else is moving on, this PR is becoming more of an obstacle than a first step, so I would also consider dismissing it completely and restarting from the present status. If I end up implementing it myself, I believe that is what I'll do.

felixhekhorn · 2022-11-21T16:19:37Z

> At the moment I don't have time to go through this, but I hope to have some as soon as I have finished my current task.
>
> However, since everything else is moving on, this PR is becoming more of an obstacle than a first step, so I would also consider dismissing it completely and restarting from the present status. If I end up implementing it myself, I believe that is what I'll do.

No. There is nothing wrong with the code in this PR: it is exactly the agreed strategy. It would be stupid to trash my time, effort and energy (we are already short of manpower).

The remaining points from issue #139

encode q2 by bytes in filenames
add CLI for conversion (almost a wrapper of output.legacy.load_tar()) in ekobox

can be addressed in a separate PR, and none of them is crucial.
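One possible reading of "encode q2 by bytes in filenames" is sketched below. It assumes that q2 is a Python float and that the hex form of its 8-byte IEEE 754 representation is an acceptable file stem. The helper names and the `.npy` suffix are hypothetical; the thread does not specify the actual scheme.

```python
import pathlib
import struct


def q2_to_stem(q2: float) -> str:
    """Encode a float as the hex of its little-endian IEEE 754 double bytes."""
    return struct.pack("<d", q2).hex()


def stem_to_q2(stem: str) -> float:
    """Invert q2_to_stem, recovering the float bit for bit."""
    return struct.unpack("<d", bytes.fromhex(stem))[0]


q2 = 0.1 + 0.2  # a value with a long decimal expansion
filename = f"{q2_to_stem(q2)}.npy"  # hypothetical suffix
recovered = stem_to_q2(pathlib.PurePath(filename).stem)
assert recovered == q2
```

The appeal of a byte-level name is that the round trip does not depend on how the float is formatted as decimal text.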

alecandido · 2022-11-21T16:25:44Z

> No. There is nothing wrong with the code in this PR: it is exactly the agreed strategy. It would be stupid to trash my time, effort and energy (we are already short of manpower).

I never said there is something wrong: I was arguing that there is not much, and it didn't take much time in the first place.
Your time, effort, and energy are as relevant as everyone else's.

At the moment, we need this, but we need #162 more (that's why someone else is already in charge of it, instead of completing this first).

If the time to retain this and reconcile it with #162 is estimated to be more than writing from scratch, then it is smart (not stupid) to take the more efficient path.
We are not perfect, and we have limited resources. So we are not even perfect in planning, and assuming we will never revert anything, so as not to waste "effort", is dangerous for the project itself.

P.S.: this PR is itself reverting something I implemented myself. If you ask @andreab1997, you will find out it is not that trivial to work with tar files at every step. It was not a good idea, and I acknowledge it, so reverting is good, and I'm not complaining about wasted effort.

alecandido · 2022-11-21T16:30:23Z

> The remaining points from issue #139
>
> encode q2 by bytes in filenames
> add CLI for conversion (almost a wrapper of output.legacy.load_tar()) in ekobox
>
> can be addressed in a separate PR, and none of them is crucial.

If you are rushing to merge before #162 because of possible conflicts, take into account that you might waste someone else's time.
Better to complete the content of #139, at least the core part. I agree that the CLI is not crucial and can be postponed, but the q2 encoding was a relevant part, and quite a cheap one.

andreab1997 · 2022-11-21T16:34:55Z

> > No. There is nothing wrong with the code in this PR: it is exactly the agreed strategy. It would be stupid to trash my time, effort and energy (we are already short of manpower).
>
> I never said there is something wrong: I was arguing that there is not much, and it didn't take much time in the first place. Your time, effort, and energy are as relevant as everyone else's.
>
> At the moment, we need this, but we need #162 more (that's why someone else is already in charge of it, instead of completing this first).
>
> If the time to retain this and reconcile it with #162 is estimated to be more than writing from scratch, then it is smart (not stupid) to take the more efficient path. We are not perfect, and we have limited resources. So we are not even perfect in planning, and assuming we will never revert anything, so as not to waste "effort", is dangerous for the project itself.
>
> P.S.: this PR is itself reverting something I implemented myself. If you ask @andreab1997, you will find out it is not that trivial to work with tar files at every step. It was not a good idea, and I acknowledge it, so reverting is good, and I'm not complaining about wasted effort.

Sorry to step in. Just to say that this PR is probably needed to address the first two points of the list I posted in #162. The other two points I can do, but in the last few days I was working on MHOU and did not have time to dedicate to them (sorry for this). So it may be true that we need #162 more than this PR, but we need this PR in order to complete #162 (the point is that without this it is basically impossible to update the metadata file without creating another tar file each time).

alecandido

First batch of comments

alecandido · 2022-11-21T16:31:13Z

src/eko/__init__.py


+Please refer to our documentation for a full overview of the possibilities.
+"""


Since we discussed it: a new line would also be required here by the reference.

alecandido · 2022-11-21T16:33:51Z

src/eko/__init__.py

 __version__ = version.__version__

+# export public methods
+open = output.struct.EKO.open  # pylint: disable=redefined-builtin


Do not redefine a built-in; it is much better to export EKO at the top level and use:

    from eko import EKO
    with EKO.open() as e: ...

which is not much different from

    import eko
    with eko.open() as e: ...

And no one will be tempted to import open from eko, overwriting a built-in function.
(In general, it is better to avoid silencing Pylint's warnings as much as possible.)
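The suggestion can be sketched as follows. The `EKO` class here is a hypothetical stand-in with only the entry point relevant to the comment; it is not the class from `src/eko/output/struct.py`.

```python
import builtins
import contextlib
import pathlib


class EKO:
    """Hypothetical stand-in: only the ``open`` entry point is modelled."""

    def __init__(self, path: pathlib.Path):
        self.path = path

    @classmethod
    @contextlib.contextmanager
    def open(cls, path):
        """Yield an instance; real cleanup would go in the ``finally`` branch."""
        obj = cls(pathlib.Path(path))
        try:
            yield obj
        finally:
            pass  # placeholder for cleanup


with EKO.open("bla.tar") as e:
    opened_path = e.path

# The method lives in the class namespace, so the module-level name
# ``open`` still refers to the built-in function.
builtin_untouched = open is builtins.open
```

Because `open` is reached only through the class, a star import or `from eko import open` cannot shadow the built-in, and no Pylint warning needs to be disabled.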

alecandido · 2022-11-21T16:34:47Z

src/eko/__init__.py

 __version__ = version.__version__

+# export public methods
+open = output.struct.EKO.open  # pylint: disable=redefined-builtin
+create = output.struct.EKO.create


Remove this as well: EKO.create is perfectly fine, short enough, and explicit, which always helps the code.

alecandido · 2022-11-21T16:36:32Z

src/eko/__init__.py

-        output : dict
-            output dictionary - see :doc:`/code/IO`
+    output.EKO
+        output object - see :doc:`/code/IO`


"output object" is too generic; you might as well leave it empty.

Please replace with something like "computed operator" - same length, but more explicit. If possible expand a bit.

alecandido · 2022-11-21T16:36:42Z

src/eko/__init__.py

-        output : dict
-            output dictionary - see :doc:`/code/IO`
+    output.EKO
+        output object - see :doc:`/code/IO`


Please add a blank line.

alecandido · 2022-11-21T16:51:23Z

src/eko/output/struct.py

@@ -329,6 +335,8 @@ class EKO:
    """Path on disk, to which this object is linked (and for which it is
    essentially an interface).
    """
+    working_dir: pathlib.Path


You can use path: there are not two distinct objects.

If the EKO is opened, then path (or wdir) is the working dir, and not the tar, since the object is no longer associated with it (it might be loaded from a tar, as a NumPy array might be loaded from an .npy file, but after loading any association is lost).

If the EKO is closed, it should not be used (on close the object should be destroyed), so again, you can dump it to a tar.

So, I would decouple the two operations:

close() will just clean up the folder

dump(path) will save another tar (or write_tar(path), here I'm not complaining about the name)

If you open with the context manager, then it is the context manager that is in charge of using the same path, and it is retained in the function scope, so you don't need a class attribute for it.
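A minimal sketch of this decoupling is given below, under stated assumptions: `Operator` is a hypothetical stand-in for `EKO`, the free function `opened` stands in for the context manager, and `metadata.txt` is an invented file name. Only standard-library calls are used.

```python
import contextlib
import pathlib
import shutil
import tarfile
import tempfile


class Operator:
    """Hypothetical stand-in for EKO: ``path`` is always the working directory."""

    def __init__(self, path: pathlib.Path):
        self.path = path

    @classmethod
    def load(cls, tarpath):
        """Extract the tar into a fresh temporary directory, then forget the tar."""
        workdir = pathlib.Path(tempfile.mkdtemp())
        with tarfile.open(tarpath) as tar:
            tar.extractall(workdir)
        return cls(workdir)

    def dump(self, tarpath):
        """Save the working directory content into a (possibly different) tar."""
        with tarfile.open(tarpath, "w") as tar:
            for item in sorted(self.path.iterdir()):
                tar.add(item, arcname=item.name)

    def close(self):
        """Only clean up the working directory."""
        shutil.rmtree(self.path)


@contextlib.contextmanager
def opened(tarpath):
    """The tar path lives in this function's scope, not on the object."""
    obj = Operator.load(tarpath)
    try:
        yield obj
    finally:
        obj.dump(tarpath)
        obj.close()


with tempfile.TemporaryDirectory() as tmp:
    tarpath = pathlib.Path(tmp) / "bla.tar"
    # Build an initial archive from a scratch working directory.
    seed = Operator(pathlib.Path(tempfile.mkdtemp()))
    (seed.path / "metadata.txt").write_text("v1")
    seed.dump(tarpath)
    seed.close()

    with opened(tarpath) as op:
        workdir = op.path
        (op.path / "metadata.txt").write_text("v2")

    workdir_removed = not workdir.exists()
    with opened(tarpath) as op:
        final_text = (op.path / "metadata.txt").read_text()
```

Without the context manager, the same object can still be used by calling `load()`, `dump(path)` and `close()` explicitly.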

alecandido · 2022-11-21T16:53:06Z

src/eko/output/struct.py

+            yield obj
+        finally:
+            obj.write_tar()
+            shutil.rmtree(obj.working_dir)


Move this to a separate close() function, so that it is also available when the context manager is not used.

alecandido · 2022-11-21T16:54:01Z

src/eko/output/struct.py

-        At the moment, it only support text files (since it is returning the
-        content as a string)
+    @classmethod
+    def open_tar(cls, path: os.PathLike):


Since tar is the only file format supported, I propose renaming the function:

Suggested change

-    def open_tar(cls, path: os.PathLike):
+    def load(cls, path: os.PathLike):

alecandido · 2022-11-21T16:54:31Z

src/eko/output/struct.py

+        logger.info(f"Operator loaded from path '{path}' into '{target}'")
+        return eko
+
+    def write_tar(self):


As above, _tar is redundant:

Suggested change

-    def write_tar(self):
+    def dump(self):

alecandido · 2022-11-21T16:55:23Z

src/eko/output/struct.py

@@ -360,8 +368,8 @@ def xgrid(self, value: interpolation.XGrid):

    def __post_init__(self):
        """Validate class members."""
-        if self.path.suffix != ".tar":
-            raise ValueError("Not a valid path for an EKO")
+        if self.path is not None and self.path.suffix != ".tar":


Move this check to the open_tar()/load() function.
It is no longer related to the object itself, and a dataclass without a __post_init__ is closer to the idea of a struct (even though it was just a check, doing without it is even better).
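As a rough illustration of moving the check, the sketch below keeps the dataclass free of `__post_init__` and validates the suffix where the tar is actually read. `Operator` and the `workdir` parameter are hypothetical, and the extraction step is omitted.

```python
import dataclasses
import os
import pathlib


@dataclasses.dataclass
class Operator:
    """Hypothetical struct-like stand-in for EKO: fields only, no __post_init__."""

    path: pathlib.Path

    @classmethod
    def load(cls, tarpath: os.PathLike, workdir: os.PathLike):
        """Validate the archive name here, since only loading deals with tars."""
        tarpath = pathlib.Path(tarpath)
        if tarpath.suffix != ".tar":
            raise ValueError("Not a valid path for an EKO")
        # Extraction of the archive into ``workdir`` is omitted in this sketch.
        return cls(path=pathlib.Path(workdir))


# Plain construction performs no validation at all.
plain = Operator(path=pathlib.Path("some/working/dir"))

# Loading from a file with the wrong suffix is rejected.
try:
    Operator.load("bla.txt", "some/working/dir")
except ValueError as err:
    load_error = str(err)
```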

alecandido · 2022-11-21T16:58:23Z

> (the point is that without this it is basically impossible to update the metadata file without creating another tar file each time).

@andreab1997 it is exactly the same as the current strategy for setting an operator. If you need to create another tar file, do it, as agreed (in any case, metadata is seldom updated, and of course it will improve with this, eventually).

Entangling PRs is not a good idea: if we really have strong requirements, we will wait. But if they are not strong, let's not do it.
This PR has been open for a month and a half, and no one worked on it for 29 days. If it could have been done in half a day, it should have happened earlier.

felixhekhorn · 2022-12-22T16:39:04Z

Closing in favour of #172

Use temp dir for output

b3def70

felixhekhorn added the enhancement (New feature or request) and output (Output format and management) labels Oct 11, 2022

felixhekhorn self-assigned this Oct 11, 2022

Allow temp. create and start fixing tests

cc1947d

Add deepcopy, fix more tests

5ee7ea5

alecandido requested changes Oct 23, 2022

felixhekhorn added 6 commits October 23, 2022 16:15

Recover test_xgrid_reshape

f200451

Recover test_flavor_reshape

4bc4f1c

Recover test_to_evol

73348c3

Stop checking abuse

7dae79a

Remove useless except

4bbc460

Split manipulate tests

718998a

alecandido requested changes Oct 23, 2022

felixhekhorn added 6 commits November 21, 2022 16:12

Add whitespace

9d9b3be

Add more whitespace

cb7811a

Merge branch 'develop' into feature/output-temp-dir

56c3c5e

Add more tests

5b13dfc

Reactivate legacy tests

ddc2355

Complete tests

2077372

felixhekhorn linked an issue Nov 21, 2022 that may be closed by this pull request

Work in a Temporary Folder #139

Closed

5 tasks

felixhekhorn marked this pull request as ready for review November 21, 2022 16:19

felixhekhorn requested review from alecandido and giacomomagni November 21, 2022 16:19

alecandido requested changes Nov 21, 2022

andreab1997 mentioned this pull request Nov 29, 2022

Output metadata #162

Closed

alecandido mentioned this pull request Dec 5, 2022

Use tempdir during execution #172

Merged

20 tasks

felixhekhorn closed this Dec 22, 2022

felixhekhorn deleted the feature/output-temp-dir branch January 5, 2023 11:18
