BUG: problems new files: include/exclude functionality #5455
Comments
I guess there are two different use-cases that are at odds with each other. In the case of the bootstrap compilers, we actually do want everything from [...]. But since we now have a distinction between

files:
  - abc

and

files:
  include:
    - abc
  exclude:
    - def

I guess we could keep the former as "take everything" and the latter as "take into account snapshotting"?
The implicit filtering/ignoring of files added to the prefix from other packages only applies to files installed from packages which were created in separate recipes. It doesn't work when the host dependencies come from another output of the same recipe. You will need to craft your include/exclude globs so they do not intersect. If you want an "everything not already in other outputs" output, then you would do something like this:

- name: A
  files:
    include:
      - A
- name: B
  files:
    include:
      - B
- name: C
  files:
    include:
      - "*"
    exclude:
      - A
      - B

The reason for this is mostly implementation-motivated, to minimize changes to the existing code. But also, if we were to track which artifacts had already been added to other outputs, then suddenly the order in which the outputs are packaged matters! Currently, I think the order in which the outputs are packaged is somewhat inconsistent/arbitrary.
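To make the "non-intersecting globs" pattern above concrete, here is a minimal Python sketch. It is not conda-build's implementation: `select_files` is a hypothetical helper, the file names are made up, and `fnmatchcase` is only a stand-in for whatever glob semantics conda-build actually applies.

```python
from fnmatch import fnmatchcase


def select_files(files, include, exclude=()):
    """Keep paths that match any include glob and no exclude glob."""
    return sorted(
        path
        for path in files
        if any(fnmatchcase(path, pat) for pat in include)
        and not any(fnmatchcase(path, pat) for pat in exclude)
    )


# Hypothetical prefix contents after the top-level build script ran.
built = ["A", "B", "C", "D"]

outputs = {
    "A": select_files(built, include=["A"]),
    "B": select_files(built, include=["B"]),
    # "Everything else": the exclusions for A and B are repeated by hand.
    "C": select_files(built, include=["*"], exclude=["A", "B"]),
}
print(outputs)
```

Because every output is computed from the same file list and its own globs only, the result does not depend on the order in which the outputs are processed, which is the property argued for above.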
I have noticed this message too. It comes from a conda-implemented glob function and not something that I added when implementing include/exclude. I don't think it actually causes the build to fail though? If I recall, it's just a print statement which contains the word "Error". I agree this should be fixed though.
I'm not saying it's wrong per se, but I do think it's strange for a recipe to repackage artifacts from its host dependencies. Why repackage instead of depending? Maybe there's something to be fixed with the interaction between [...]
- name: llvm-tools-{{ major_ver }}
  files:
    include:
      - bin/*-{{ major_ver }}  # [unix]
    exclude:
      # belongs into llvmdev
      - bin/llvm-config*  # [unix]
  [...]
- name: llvm-tools
  files:
    include:
      # opt-viewer tool is in share
      - bin/*  # [unix]
      - share/*  # [unix]
      - Library/bin/*.exe  # [win]
      - Library/share/*  # [win]
    exclude:
      # belongs to llvm-tools-major-ver
      - bin/*-{{ major_ver }}  # [unix]
      # belongs into llvmdev
      - bin/llvm-config*  # [unix]
      - Library/bin/llvm-config*  # [win]
  requirements:
    host:
      - {{ pin_subpackage("llvm-tools-" ~ major_ver, exact=True) }}  # [not win]
Yes, trivially I can duplicate the exclusion rules in various places. But that's unnecessarily cumbersome and brittle. In all other scenarios the content is correctly determined as the diff to the host environment before installation. If I really want to overwrite something, it should be opt-in with [...]
But conda already necessarily provides the right build order if a subpackage is a host dependency of another one? That logic must already exist in some form, because that's how it works when we manually layer the way these outputs get filled (examples are llvmdev, clangdev, arrow-cpp, google-cloud-cpp, and many more).

PS. Sorry, was on a phone and went through my notifications in reverse chronological order (my mistake).
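As an illustration of the point about build order, a dependency-derived order can be sketched with the standard library's `graphlib` (Python 3.9+). This says nothing about how conda-build orders outputs internally; the `host_deps` mapping is a hypothetical stand-in for the `pin_subpackage` relationship in the llvm-tools example.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: output name -> outputs it lists under host.
host_deps = {
    "llvm-tools-18": set(),
    "llvm-tools": {"llvm-tools-18"},
}

# static_order() yields every output after all of its host dependencies.
order = list(TopologicalSorter(host_deps).static_order())
print(order)
```

An output that is a host dependency of another always comes first in such an order; outputs with no relationship to each other are not constrained relative to one another.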
I am thinking of a case where subpackages do not have a dependency relationship but have overlapping globs. Then it is ambiguous. This could happen when subpackages are optional, like static library packages or something with optional modules?

Anyways, my argument is that I think it's valuable to have the user explicitly define non-overlapping globs for the artifacts that they have created in their package and not rely on overlapping globs in which artifacts are separated implicitly based on dependency relationships.
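The ambiguous case can be shown with a small sketch that reports which pairs of outputs claim the same file. The output names, globs, and file list are hypothetical, and `fnmatchcase` is an assumed approximation of the real glob matching.

```python
from fnmatch import fnmatchcase
from itertools import combinations


def matched(files, globs):
    """Set of files matched by at least one glob."""
    return {f for f in files if any(fnmatchcase(f, g) for g in globs)}


# Hypothetical build result and per-output include globs.
files = ["lib/libfoo.so", "lib/libfoo.a", "lib/libbar.so"]
outputs = {
    "libfoo": ["lib/libfoo*"],
    "libfoo-static": ["lib/*.a"],
    "libbar": ["lib/libbar*"],
}

overlaps = {}
for first, second in combinations(outputs, 2):
    shared = matched(files, outputs[first]) & matched(files, outputs[second])
    if shared:
        overlaps[(first, second)] = sorted(shared)
print(overlaps)
```

Here `libfoo` and `libfoo-static` both claim `lib/libfoo.a`, and with no dependency between them nothing decides which output should own the file.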
I don't see that it's ambiguous at all. That's exactly where adding A to the host-deps of B specifies both the build order and which files of A won't be duplicated in B, despite the overlapping glob. This is the core snapshotting mechanism by which conda-build determines the before/after state of an environment, where the delta is what got "installed" and becomes the content of the output (by definition). Consequently, if we add something to the "before" state, it won't end up in the delta!
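The before/after model argued for here can be sketched as a set difference. This illustrates the proposed behaviour only, not conda-build's code and not the 24.7 behaviour of include/exclude; `output_contents` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def output_contents(before, after, globs):
    """Files that are new relative to `before` and match any glob."""
    delta = set(after) - set(before)
    return sorted(f for f in delta if any(fnmatchcase(f, g) for g in globs))


# Hypothetical prefix after the build: a versioned binary plus its symlink.
prefix = {"bin/opt-18", "bin/opt"}

tools_18 = output_contents(before=set(), after=prefix, globs=["bin/*-18"])
# llvm-tools-18 is a host dependency, so its files are in the "before" snapshot.
tools = output_contents(before=set(tools_18), after=prefix, globs=["bin/*"])
# Without the host dependency, the overlapping glob picks up both files.
tools_no_dep = output_contents(before=set(), after=prefix, globs=["bin/*"])

print(tools_18, tools, tools_no_dep)
```

Under this model the overlapping `bin/*` glob is harmless once the versioned output is part of the baseline, which is the behaviour being requested in this thread.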
I disagree wholeheartedly. We should not force users to juggle multiple globs and their exclusion in various places redundantly. The default behaviour should match what's happening elsewhere by default as well, which is "delta between before & after". And for cases where that needs to be overridden, there's more existing functionality ([...]).

To be clear, I'm very grateful for the functionality you pioneered in #5216, it'll be a great addition to the toolkit. However, the behaviour as of 24.7 is inconsistent with other conda-build behaviour (I'd go as far as saying: conceptually broken), and while I'm not a maintainer here, I think we do have a small-but-closing window to fix this brand-new functionality before people start relying on this behaviour.
A summary of the discussion so far

We agree that printing an error about exclusion globs not matching anything is extraneous and should be fixed.

We agree that [...]

Continuing discussion of which files should be implicitly excluded
Can you provide an example of this "delta between before & after" paradigm elsewhere in the conda-build process? With the previous behavior (using [...]

package:
  name: subpackage-test-split
  version: 0.0.0

build:
  number: 0
  script:
    - touch ${PREFIX}/a-file

outputs:
  - name: subpackage-name-first
    files:
      - a-file
    test:
      commands:
        - test -f ${PREFIX}/a-file
  - name: subpackage-name-second
    files:
      - a-file
    test:
      commands:
        - test -f ${PREFIX}/a-file

about:
  summary: A fake package to test conda-build behavior

and they would both contain [...] The other previous behavior using [...]
The expand_globs function from conda_build.utils logs an ERROR when a glob expression returns no matches. This is overly alarming: the user may now use negative (exclude) glob expressions and not care whether they match anything, or may want to use the same set of glob expressions on multiple platforms, where some of them match nothing on some platforms. conda#5216 conda#5455
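A rough sketch of the quieter behaviour described in this commit message follows. It is not the actual `expand_globs` from `conda_build.utils`; `expand` is a hypothetical helper, and logging at DEBUG level is just one assumed way of toning the message down.

```python
import glob
import logging
import os
import tempfile

log = logging.getLogger("glob_sketch")


def expand(patterns, root):
    """Expand globs relative to root; an unmatched glob is only logged at DEBUG."""
    found = []
    for pattern in patterns:
        hits = glob.glob(os.path.join(root, pattern))
        if not hits:
            # An empty match is expected for exclusions and cross-platform lists.
            log.debug("glob %r did not match anything in %s", pattern, root)
        found.extend(os.path.relpath(hit, root) for hit in hits)
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "bin"))
    open(os.path.join(root, "bin", "opt"), "w").close()
    # The Windows-only pattern matches nothing here and must not fail the call.
    result = expand(["bin/*", "Library/bin/*.exe"], root)
print(result)
```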
See the docs for the conda-build process:
That's what I meant when I said it's the key mechanism, but I'll freely admit I don't know the codebase here, and it would surely be beneficial to have some maintainers here chime in.
My understanding of the history is about as limited as my familiarity with the code-base, but AFAIU, the [...]

So we have some ambiguity due to the way things grew historically. If you use outputs only to slice the result of a global build step, I can see why things were done the way they are. My argument is that the consistent thing to do would be to let outputs be first-class citizens that enjoy all the features that regular builds get as well.

Your example is missing a key point that makes my proposal (and the situation, IMO) unambiguous:

outputs:
  - name: subpackage-name-first
    files:
      - a-file
    test:
      commands:
        - test -f ${PREFIX}/a-file
  - name: subpackage-name-second
    files:
      - a-file
    requirements:
      host:
        - {{ pin_subpackage("subpackage-name-first", exact=True) }}  # <- !!!
      run:
        # whether we add it under run: as well determines if the test below passes/fails
        - {{ pin_subpackage("subpackage-name-first", exact=True) }}
    test:
      commands:
        - test -f ${PREFIX}/a-file

By specifying that [...]
@kenodegard @travishathaway @jezdez @beeankha This problem gets much worse if several outputs depend on each other and would lead to duplicated [...]
The functionality introduced in #5216 by @carterbox promises to be very useful on several feedstocks that need to "slice and dice" the result of a build into different outputs, so I started testing this for llvmdev in conda-forge/llvmdev-feedstock#283. This runs into a couple of different problems though.
files: does not take into account files already existing in host:

In llvmdev, we want to version the binaries (and some libraries), and create symlinks for the unversioned ones (that point to the versioned ones). These should go into different outputs. As such, llvm-tools-18 should contain the versioned binaries (except llvm-config), and llvm-tools should contain the symlinks. The following configuration expresses that intent.

However, what ends up happening is that llvm-tools repackages the output of llvm-tools-18, despite llvm-tools-18 being a host dependency of llvm-tools (which should therefore be part of the baseline snapshot before installation that's the basis for determining what got installed on top).

This is a common setup, i.e. it's often desirable to slice off lib/libfoo.so and lib/libbar.so into separate outputs, and everything else in lib should go into yet another output. Similarly here, for the llvmdev output, we want to pick up everything that hasn't already been installed into other outputs. That too doesn't seem to work (i.e. bin/llvm-config ends up missing), using the seemingly obvious [...]

Exclusion rule causes glob error
One output has:
on osx-arm, the exclusion causes a glob error:
It's worth noting that this file doesn't currently get installed, but if it ever does in the future, it should be excluded (hence the rule). Excluding files that aren't present needs to not cause errors.
Also, somehow the osx-arm case (cross-compiled from x64) manages to trigger something for the wrong architecture:
That might be a corner case though, as we're packaging the binary that perhaps gets called by conda-build itself (if a symlink otool -> llvm-otool exists). However, this aspect didn't fail before switching to the files:-based approach.