ERRO[0025] unlinkat /var/tmp/buildah2410054376/mounts3022885724/bind626918239: device or resource busy #5988

Open
cevich opened this issue Feb 12, 2025 · 8 comments

Comments

@cevich
Member

cevich commented Feb 12, 2025

When building inside a rootless container using buildah's vfs storage driver and chroot isolation (as is very often done to build images in CI environments), specifying read/write bind volumes from other stages results in an error. This behavior does not reproduce with buildah 1.37 or earlier. I also verified the same behavior using a vanilla registry.fedoraproject.org/fedora-minimal image + dnf5 install buildah. That is to say, I think it's a buildah problem, not a buildah image problem.

Reproduction (host) environment:

  • Fedora 40
  • podman 5.3.1
  • Running as a regular user w/ default podman settings
  • The quay.io/buildah/upstream:latest container image (buildah version 1.40.0-dev (image-spec 1.1.0, runtime-spec 1.2.0))
  • The quay.io/buildah/stable:v1.38 container image
  • The quay.io/buildah/stable:v1.37 container image

Steps to reproduce:

  1. Create the following Containerfile somewhere in the user's home directory:
    FROM registry.fedoraproject.org/fedora-minimal:latest as test
    RUN mkdir -p /var/tmp/test
    ADD ./Containerfile /var/tmp/test/
    
    FROM test as final
    RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw \
        set -x && \
        date > /var/tmp/test/Containerfile && \
        cat /var/tmp/test/Containerfile
    
  2. Run podman run -it --rm -v ./Containerfile:/root/Containerfile:ro,Z quay.io/buildah/stable:v1.38 buildah --storage-driver=vfs build --isolation=chroot /root
  3. Run the exact same command, but with quay.io/buildah/stable:v1.37 (or any other earlier version)

Unexpected results:

[1/2] STEP 1/3: FROM registry.fedoraproject.org/fedora-minimal:latest AS test
Trying to pull registry.fedoraproject.org/fedora-minimal:latest...
Getting image source signatures
Copying blob 169491f3e4f7 done   |
Copying config e6917e6306 done   |
Writing manifest to image destination
[1/2] STEP 2/3: RUN mkdir -p /var/tmp/test
[1/2] STEP 3/3: ADD ./Containerfile /var/tmp/test/
Getting image source signatures
Copying blob cde90dcf8c1f skipped: already exists
Copying blob cec21250b843 done   |
Copying config 9f9e432f21 done   |
Writing manifest to image destination
--> 9f9e432f21cb
[2/2] STEP 1/2: FROM 9f9e432f21cbb67c928b93d87af3878f3b903cbc2030cc12594f9368829ccc8c AS final
[2/2] STEP 2/2: RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw     set -x &&     date > /var/tmp/test/Containerfile &&     cat /var/tmp/test/Containerfile
ERRO[0025] unlinkat /var/tmp/buildah1274147250/mounts4133407440/bind3931917386: device or resource busy
Error: building at STEP "RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw set -x &&     date > /var/tmp/test/Containerfile &&     cat /var/tmp/test/Containerfile": resolving mountpoints for container "bb08d8062b4c17b75108492838e53d3236abce647447c8f5bec72cebfcb8ca1b": setting up overlay of "/var/tmp/buildah1274147250/mounts4133407440/bind3931917386": mount overlay:/var/tmp/buildah1274147250/mounts4133407440/overlay/981784139/merge, data: lowerdir=/var/tmp/buildah1274147250/mounts4133407440/bind3931917386,upperdir=/var/tmp/buildah1274147250/mounts4133407440/overlay/981784139/upper,workdir=/var/tmp/buildah1274147250/mounts4133407440/overlay/981784139/work,userxattr: invalid argument

Expected results (from v1.37):

[1/2] STEP 1/3: FROM registry.fedoraproject.org/fedora-minimal:latest AS test
Trying to pull registry.fedoraproject.org/fedora-minimal:latest...
Getting image source signatures
Copying blob 169491f3e4f7 done   |
Copying config e6917e6306 done   |
Writing manifest to image destination
[1/2] STEP 2/3: RUN mkdir -p /var/tmp/test
[1/2] STEP 3/3: ADD ./Containerfile /var/tmp/test/
Getting image source signatures
Copying blob cde90dcf8c1f skipped: already exists
Copying blob b50f8aabd929 done   |
Copying config 71ea00d65f done   |
Writing manifest to image destination
--> 71ea00d65f89
[2/2] STEP 1/2: FROM 71ea00d65f8949486c4441a13b231fd4992b2be2c4170e97a0b9baae11244f71 AS final
[2/2] STEP 2/2: RUN --mount=type=bind,from=test,src=/var/tmp/test,dst=/var/tmp/test,rw     set -x &&     date > /var/tmp/test/Containerfile &&     cat /var/tmp/test/Containerfile
WARN[0000] couldn't find "/var/lib/containers/storage/vfs/dir/7d684fe50918fe44941621b1721c8ee345f7884e2887f8cae36608bacb38e0e8/tmp/test" on host to bind mount into container
+ date
+ cat /var/tmp/test/Containerfile
Wed Feb 12 18:17:34 UTC 2025
[2/2] COMMIT
Getting image source signatures
Copying blob cde90dcf8c1f skipped: already exists
Copying blob b50f8aabd929 skipped: already exists
Copying blob 11db3e39f474 done   |
Copying config 83de1e9298 done   |
Writing manifest to image destination
--> 83de1e9298fe
83de1e9298feac0ce7e01e89b840e42ecd3901a4a67d1b998b3bdbe176fd3a69

Debug output from v1.38 is below (v1.40.0-dev output is substantially similar):

buildah_v1.38_debug.log.txt

Note: Also attempted with the following Containerfile with similar results:

FROM registry.fedoraproject.org/fedora-minimal:latest as test

ADD ./Containerfile /test/
RUN chmod 777 /test/Containerfile

#####

FROM test as final

RUN --mount=type=bind,from=test,src=/test,dst=/test,rw \
    set -x && \
    date > /test/Containerfile && \
    cat /test/Containerfile
@cevich
Member Author

cevich commented Feb 14, 2025

Poking through the debuglog and the code, I'm thinking perhaps this problem is stemming from within containers/storage based on convertToOverlay() getting an error back from overlay.MountWithOptions(). I didn't dig too deep into the storage code, but the ,userxattr suffix on the end of the debug messages made my ears stand up: "Why would that be present or even relevant for a VFS "bind" mount?"

time="2025-02-12T18:19:46Z" level=debug msg="Error building at step
{Env:[container=oci ...cut...: resolving mountpoints for container
...cut...: setting up overlay of \"/var/tmp/buildah3627628243/mounts2014160263/bind3820943893\": 
mount overlay:
...cut...,
workdir=/var/tmp/buildah3627628243/mounts2014160263/overlay/1907194961/work,userxattr: invalid argument"

@ssams

ssams commented Feb 25, 2025

Stumbled across what appears to be the same issue in a build (also using the VFS storage driver). To me it seems the problem starts to appear with buildah version 1.37.6:

time="2025-02-21T09:00:59Z" level=error msg="unlinkat /var/tmp/buildah1222469549/mounts3222934611/bind1342232015: device or resource busy"
Error: building at STEP "RUN --mount=type=bind,source=requirements.txt,target=/tmp/pip-tmp/requirements.txt [...]": resolving mountpoints for container "8a8dd1c7104a71218d2e85f1b657facd2a45051f9c0ccf56a267ed85046d6d06": setting up overlay of "/var/tmp/buildah1222469549/mounts3222934611/bind1342232015": mount overlay:/var/tmp/buildah1222469549/mounts3222934611/overlay/1549299006/merge, data: lowerdir=/var/tmp/buildah1222469549/mounts3222934611/bind1342232015,upperdir=/var/tmp/buildah1222469549/mounts3222934611/overlay/1549299006/upper,workdir=/var/tmp/buildah1222469549/mounts3222934611/overlay/1549299006/work,userxattr: invalid argument

This is with buildah version 1.37.6 (image-spec 1.1.0, runtime-spec 1.2.0) running via container registry.redhat.io/rhel9/buildah:9.5-1738643435.

Everything works as expected with buildah version 1.37.5 (image-spec 1.1.0, runtime-spec 1.2.0) via registry.redhat.io/rhel9/buildah:9.5-1737479141

@cevich
Member Author

cevich commented Feb 25, 2025

Interesting, and thanks for providing details. Knowing this behavior crept in via a patch release is actually really helpful. I just checked, and it was 1.37.5 that fixed the issue for me, which makes sense based on your experience.

Checking the git history, there are only 17 commits between 1.37.5 and 1.37.6. Of these, almost half are merge or changelog update commits. So that narrows things down quite a bit!

@cevich
Member Author

cevich commented Feb 25, 2025

Based on the string "setting up overlay of" in the message, I believe the problem is somewhere in/around convertToOverlay(), which first appeared in 2c70035 (between .5 and .6). Curiously, as near as I can tell, the containers/storage module was last updated in 1.37.5, so that's probably not the root cause.

There are several conditionals that would all emit a similar message, but I think this is coming from the 4th one, dealing with a failure from overlay.MountWithOptions(). However, it's also possible this error is a red herring, and the problem is really coming from GetBindMount(), where convertToOverlay() shouldn't even be used for a VFS mount (clearly we're not reproducing with a mountedImage):

func GetBindMount(...cut...
        ...cut...

        overlayDir := ""
        if mountedImage != "" || mountIsReadWrite(newMount) {
                if newMount, overlayDir, err = convertToOverlay(newMount, store, mountLabel, tmpDir, 0, 0); err != nil {
                        return newMount, "", "", "", err
                }
        }

        succeeded = true
        return newMount, mountedImage, intermediateMount, overlayDir, nil
}
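
To make the quoted conditional concrete, here is a minimal, self-contained sketch of how it evaluates for read-only and read/write bind mounts. The mount type, isReadWrite(), and wantsOverlay() are hypothetical stand-ins written for illustration; in particular, isReadWrite() only assumes that mountIsReadWrite() keys off an "rw" option, which may not match the real helper.

```go
package main

import "fmt"

// mount is a hypothetical, simplified stand-in for the mount spec that
// GetBindMount() builds; only the options matter for this sketch.
type mount struct {
	Options []string
}

// isReadWrite is an assumed approximation of mountIsReadWrite(): it reports
// whether the "rw" option is present. The real helper may behave differently.
func isReadWrite(m mount) bool {
	for _, opt := range m.Options {
		if opt == "rw" {
			return true
		}
	}
	return false
}

// wantsOverlay mirrors the conditional quoted above: an overlay is requested
// when the source is a mounted image or when the mount is read/write.
func wantsOverlay(mountedImage string, m mount) bool {
	return mountedImage != "" || isReadWrite(m)
}

func main() {
	ro := mount{Options: []string{"bind"}}
	rw := mount{Options: []string{"bind", "rw"}}

	fmt.Println(wantsOverlay("", ro)) // false
	fmt.Println(wantsOverlay("", rw)) // true
}
```

Under these assumptions the storage driver never enters the decision: any read/write bind mount takes the convertToOverlay() path, whether or not the store is using vfs.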

@ssams

ssams commented Feb 26, 2025

Didn't get to look at the details of the commit, but it sounds very plausible to me. At least I can confirm that removing the rw option makes the mount itself succeed in my case; with the default read-only bind mount it works. That further hints at these changes around read-write mounts.

@ssams

ssams commented Feb 26, 2025

I also noticed that I may have shortened the output in my earlier comment a bit too much, so in case it's helpful, the apparently problematic line in my build is: --mount=type=bind,source=third_party/,target=/tmp/pip-tmp/third_party/,rw (so in my case it's mounted from the host, not from an earlier build stage). As indicated in my last comment, removing the rw makes the mount work, so --mount=type=bind,source=third_party/,target=/tmp/pip-tmp/third_party/ works.

@cevich
Member Author

cevich commented Feb 26, 2025

All good data points, thanks again for sharing. For VFS I don't think it matters if the source is another stage or w/in the context dir, both should just resolve to directories on the "host" side. SELinux could be to blame, however the way I was reproducing it, nested w/in quay.io/buildah/stable, rules that out.

@cevich
Member Author

cevich commented Feb 28, 2025

Something interesting one of my colleagues noticed:

If you do a sudo dmesg -HW on the host and then run the reproducer, there's an overlay error from the kernel at the exact same time as buildah tries the volume mount during the build. This is significant because with --storage-driver=vfs, the expectation is that overlay shouldn't be involved at all.

By my reading of internal/volumes/volumes.go to date, in the case of VFS, either GetBindMount() should never call convertToOverlay() or that function shouldn't be calling overlay.MountWithOptions() (which is overlay specific).
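
The first of those two options could look roughly like the sketch below. It only illustrates the guard suggested in this comment; it is not buildah's actual code or fix. shouldConvertToOverlay() and its driver parameter are hypothetical, and how the driver name would be obtained from the store is deliberately left out.

```go
package main

import "fmt"

// shouldConvertToOverlay is a hypothetical guard illustrating the suggestion
// above: never take the overlay path when the storage driver is vfs, and
// otherwise keep the existing "mounted image or read/write" condition.
func shouldConvertToOverlay(driver, mountedImage string, readWrite bool) bool {
	if driver == "vfs" {
		// Assumption from this thread: overlay should not be involved
		// at all when building with --storage-driver=vfs.
		return false
	}
	return mountedImage != "" || readWrite
}

func main() {
	for _, driver := range []string{"overlay", "vfs"} {
		fmt.Printf("%s rw bind mount -> overlay: %v\n",
			driver, shouldConvertToOverlay(driver, "", true))
	}
}
```

Whether skipping the overlay is acceptable for read/write mounts (for example, whether writes would then reach the source directory) is a design question this sketch does not answer.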

2 participants