
[FR]: Use bazel worker for copy_file_action #1046

@Andrius-B

Description

Hi, I prototyped an idea of using a Bazel persistent worker, and even the naive implementation copies files about 50% faster than spawning a cp process per copy. I wanted some feedback on whether this kind of feature would be acceptable before spending more time on it (it's by no means ready for review yet).
This repository already has a Go tool, copy_directory, that copies directories, so the toolchains and the release process are all in place. I would like to add a similar copy_file tool with its related toolchains; copy_file_action would then call this tool instead of cp from coreutils.

The context for this change is that copying these files takes a while when there are many source files. As a reproduction I created a small example that generates 10,000 src/*.js source files and then builds a js_binary from all of them, which in turn calls copy_file_action on each one. The build consists almost entirely of copying files, so it's not a fair evaluation of how this feature would affect real builds, but it gives some insight into the overhead of spawning so many processes.

With the latest [email protected], which uses cp, the build takes around 80 seconds on my M1 MacBook:

$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 83.830s, Critical Path: 3.58s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example  0.06s user 0.06s system 0% cpu 1:25.07 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 83.391s, Critical Path: 3.64s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example  0.06s user 0.08s system 0% cpu 1:25.63 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 82.971s, Critical Path: 3.68s
INFO: 10004 processes: 3 action cache hit, 3 internal, 10001 local.
INFO: Build completed successfully, 10004 total actions
bazel build example  0.06s user 0.07s system 0% cpu 1:24.75 total

Using 4 worker processes inflates the action count 2x (each copy action gets an additional WriteFile action for its argument file), but the build runs roughly twice as fast with a singleplex proto worker written in Go. The extra actions in this output come from building the copy_file toolchain, which is cached as part of the build:

$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 40.807s, Critical Path: 4.89s
INFO: 20005 processes: 37 action cache hit, 10004 internal, 10001 worker.
INFO: Build completed successfully, 20005 total actions
bazel build example  0.05s user 0.06s system 0% cpu 43.757 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 37.610s, Critical Path: 4.96s
INFO: 20004 processes: 38 action cache hit, 10003 internal, 10001 worker.
INFO: Build completed successfully, 20004 total actions
bazel build example  0.04s user 0.05s system 0% cpu 39.256 total
$ python ./src/generate.py; time bazel build example
[..]
INFO: Elapsed time: 36.599s, Critical Path: 6.42s
INFO: 20004 processes: 4 action cache hit, 10003 internal, 10001 worker.
INFO: Build completed successfully, 20004 total actions
bazel build example  0.04s user 0.05s system 0% cpu 37.736 total

I have yet to test this on other platforms, but it looks promising. Let me know what you think.

Labels: enhancement (New feature or request); help wanted (Aspect isn't prioritizing this, but the community could); performance (Improve performance of existing features)