-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add option to require explicit fetch/checkout #10451
base: main
Are you sure you want to change the base?
Conversation
289b8b1
to
6ae7846
Compare
dvc/repo/fetch.py
Outdated
@@ -55,6 +55,8 @@ def stage_filter(stage: "Stage") -> bool: | |||
def outs_filter(out: "Output") -> bool: | |||
if push and not out.can_push: | |||
return False | |||
if out.explicit and not any(targets): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should out
be compared to each target
to see if it matches any of the targets? It looks like this would fetch out
anytime any target
is specified, even if the target
does not match the explicit out
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally went down that path, then it occurred to me that the usual target matching logic would further filter the list that this function returns, so there should be no need to reimplement that logic. I updated the unit test to demonstrate and build confidence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! @skshetry Could you take a look please?
one ask (if we decide to go with this) - can we find a better name please? if this options is about
if it affects both push and pull (does it?):
etc. Let's brainstorm here a bit please. thanks @pbailey-hf ! |
It only affects
|
Agreed, but I would suggest slightly different naming to better match the existing
Edit: note that this also implies that we add |
That makes sense to me. I will implement it shortly. |
dvc/output.py
Outdated
@@ -1501,6 +1507,7 @@ def _merge_dir_version_meta(self, other: "Output"): | |||
**ANNOTATION_SCHEMA, | |||
Output.PARAM_CACHE: bool, | |||
Output.PARAM_REMOTE: str, | |||
Output.PARAM_PULL: vol.All(vol.Lower, Choices("always", "never", "explicit")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted for always
and never
instead of true
and false
to avoid confusion with YAML types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to use YAML types and true / false. Also, I don't think we need the third option. I would make false
behavior that same as explicit to keep things simpler.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10451 +/- ##
==========================================
+ Coverage 90.34% 90.41% +0.07%
==========================================
Files 481 481
Lines 35981 36036 +55
Branches 5038 5053 +15
==========================================
+ Hits 32506 32583 +77
+ Misses 2877 2873 -4
+ Partials 598 580 -18 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need a dvc.org PR / update. Thanks @pbailey-hf !
Done! iterative/dvc.org#5263 |
@skshetry if everything looks good, can we merge this? can you do some final checks please? |
tests/func/test_checkout.py
Outdated
assert not (tmp_dir / "explicit_2").exists() | ||
assert (tmp_dir / "always").read_text() == "z" | ||
|
||
dvc.checkout(targets="explicit_1.dvc", force=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dberenbaum @shcheklein, this looks odd to me, that a pull
option is affecting checkout
.
I don't think we should support this at all.
If we don't support it, checkout
will fail if we haven't fetched it already as before with the following message:
dvc.exceptions.CheckoutError: Checkout failed for following targets:
explicit_1
explicit_2
Is your cache up to date?
<https://error.dvc.org/missing-files>
pull()
should work as expected though, and not fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry I understand your concern, what is the practical thing here for users though? any idea? If you have a few targets that have pull: false
you will keep hitting this failure? do we need an explicit option for checkout then - something like "--allow-missing" in dvc pull
?
what is the expected behavior for dvc pull
itself considering that it is dvc fetch
+ dvc checkout
?
probably they should be aligned ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my intuition as well, that pull
= fetch
+ checkout
, so the option should apply to both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it is a combination of fetch
and checkout
, it does not necessarily mean pull
options should apply directly to fetch
and checkout
. pull
is a higher-level command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry do, what is your practical suggestion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an alternative, re the name:
sync: push | pull | both | all | checkout
- it can be a single option that pretty much encompasses everything
I think this is reasonable and aligns with what I understood explicit
to be trying to do - skip syncing those outputs unless you explicitly pass them as targets (so dvc push/pull/checkout
skips it but dvc push/pull/checkout skipped_target
doesn't).
I'm not sure we need all of push | pull | both | all | checkout
now because we already have push: false
and can't deprecate it right away. We could try a minimal version of sync: true | false
to start and see with more time which of push | pull | both | checkout
are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could try a minimal version of sync: true | false to start
And we would avoid all the operation on it? push, pull, checkout? I think that's fine. I think we can keep granular push: false
and even probably fetch: false
(see below). People will be asking about this. sync
is too broad and won't be convenient in a lot of cases. But I'm fine to start there. @skshetry what is your take?
Would you be open to implicit "allow missing" behavior for pull:false outputs?
It should be fetch: false
(not pull:false
to emphasize that if affects only fetch. And checkout --allow-missing
behaves the same way as it does now (it should not be trying to analyze lower level fetch flags, etc). This way pull
and checkout
are consistent in their behavior - that is important here.
I think we can add a config option for checkout to allow missing by default.
would that combination work for you?
fetch: false
- config option to imply
allow-missing
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pbailey-hf What do you think about sync: false
? If you are okay starting there, please go ahead with it, and if you want, you could also add fetch: false
or other granular options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dberenbaum I'd prefer to control pull
/push
independently, which makes me lean away from the sync: false
option. If I'm picking from all of the options presented so far, I'd prefer fetch:false
combined with a configured allow-missing
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for the feedback and your patience @pbailey-hf! I worry about the complexity of so many flags to configure, but let's go with your preference for now to move forward. Please ping when you would like another review.
This PR adds the
explicit
field to outs in .dvc files. outs marked as explicit must be explicitly mentioned on the command line to be affected byfetch
,checkout
, orpull
.❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏