-
-
Notifications
You must be signed in to change notification settings - Fork 420
Dynamically map known root paths in cache files #6031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Tobias Roeser <[email protected]>
|
I'm down to a single failing test case for which I don't have an explanation or fix for: > mill integration.feature[output-directory].packaged.daemon.testForked
...
[7778-0] + mill.integration.OutputDirectoryTests.Output directory elsewhere in workspace 24540ms
[7778-0] Copying integration test sources from /home/lefou/work/opensource/mill/integration/feature/output-directory/resources to /home/lefou/work/opensource/mill/out/integration/feature/output-directory/packaged/daemon/testForked.dest/worker-0/sandbox/run-3
[7778-0] X mill.integration.OutputDirectoryTests.Output directory outside workspace 6137ms
[7778-0] java.lang.AssertionError: assertion failed: ==> assertion failed: false != true
[7778-0] scala.runtime.Scala3RunTime$.assertFailed(Scala3RunTime.scala:8)
[7778-0] utest.asserts.Asserts$ArrowAssert.$eq$eq$greater(Asserts.scala:90)
[7778-0] mill.integration.OutputDirectoryTests$.tests$$anonfun$1$$anonfun$3$$anonfun$1(OutputDirectoryTests.scala:35)
[7778-0] scala.runtime.function.JProcedure1.apply(JProcedure1.java:15)
[7778-0] scala.runtime.function.JProcedure1.apply(JProcedure1.java:10)
[7778-0] mill.testkit.IntegrationTestSuite.integrationTest$$anonfun$1(IntegrationTestSuite.scala:60)
[7778-0] mill.util.Retry.apply$$anonfun$1(Retry.scala:38)
[7778-0] mill.util.Retry.apply$$anonfun$adapted$1(Retry.scala:38)
[7778-0] mill.util.Retry.$anonfun$1$$anonfun$1(Retry.scala:49)
[7778-0] scala.util.Try$.apply(Try.scala:217)
[7778-0] mill.util.Retry.$anonfun$1(Retry.scala:49)
[7778-0] java.lang.Thread.run(Thread.java:1583)
[7778-0] Tests: 3, Passed: 2, Failed: 1
...
1 tasks failed
[7778] integration.feature[output-directory].packaged.daemon.testForked 1 tests failed:
mill.integration.OutputDirectoryTests mill.integration.OutputDirectoryTests.Output directory outside workspace |
|
Fixed last test, which was caused by an accidental code paste |
|
This is good to merge. The next step is probably to add more external cache dirs like coursier cache. Usage of |
|
The approach taken by #4642 is to instrument |
|
This PR is explicitly just about paths where path is an explicit type. All other cases are paths in strings, which could be solved by some optimistically find-and-replace mechanism, which I personally only see as a last resort approach. Another way could be to make config task use more structured formats than just I must admit, I overlooked PR #4642 when I started this one. (Is it even linked?) But I don't like the sym-link approach. Also, one takeaway of this PR`s predecessor (#6021) is, that for communication with other processes and when launching tools we always want to use absolute paths (as we do currently). The whole caching abstraction of Mill happens in Mill, so the conversion between Mill-mapped-paths and absolute paths need to happen in the tasks (and workers). And in Mill, we have the power and the knowledge for translating between both systems. |
|
The symlink approach is what is used by Bazel for the same purpose. It's not simple by any means, but the question is if there's anything we can do that's better. An optimistic find-and-replace definitely doesn't seem sufficient, e.g. you cannot find-and-replace paths inside the binary I think the litmus test for this is really what was stated in the original ticket: Can we compile an example project in two separate folders, diff the |
Another attempt to make Mill cache storage relocate-able.
The idea is to replace absolute paths with paths relative to pre-defined known root paths.
Fix parts of #3660
API
Path mappings are managed via the new
MappedRootsobject. Mappings use a thread localDynamicVariableand various convenience methods to easily adapt the mapping.An explicit API to change the mapping dynamically is necessary, as the underlying JSON-serialization mechanism can be used for other purposes than Mill caching, like RPC protocols, where a path mapping is not desirable.
Default Mapping
These are the default location which are mapped by Mill:
$WORKSPACE- the project directory (where the top-levelbuild.millcan be found. (BuildCtx.workspaceRoot)$MILL_OUT- the Mill output directory used for caches and build-results$HOME- the user home directory (os.home)Mappings are automatically used in the
upickle-based JSON serializers formill.api.PathRefandos.Path.More paths come to mind:
$CACHE_COURSIER- the coursier cache directory$CACHE_IVY- the ivy cache directory$CACHE_MAVEN- the Maven cache directoryThose are currently not directly accessible in the code base for all the places where we need to be able to read and write JSON cache files. Also, their typical storage locations are already relative to
$HOME(e.g.$CACHE_COURSIERvs$HOME/.cache/coursier), hence I left them out for this PR.We can later add them by adapting
Mapping.withMillDefaults.Design Choices
One important design choice I made, is to go with a dynamic set of known roots. This means, without any registered mapping,
PathRef/os.PathJSON-serialization will behave exactly as before. If there are registered path root mappings at serialization time, these mappings will be used. While de-serialization of a mapped path, a mapping for the used key is required, otherwise the de-serialization will fail with an runtime exception.While it is not ideal to have potential runtime failures, an alternative design with a static mapping, which should behave more deterministically, hasn't worked out well. This is mostly due to the fact that JSON serialization is a universal concept which is, even in Mill, used for various use cases, and not uniquely for Mill cache persistence. While in theory, it should be controllable via the
given/implicitscopes, practically we need to define all virtual paths mapping anywhere in the code base, which isn't only impractical, but also sometimes impossible. Effectively it would require us to use dedicated data types for when we need path mappings (e.g. in Mill caches) and when not (e.g. when defining RPC protocols). See PR #6021 where I tried to have a fixed mapping, but had a lot of issues.PathRef
hashCodechangesDue to the way the
PathRef.hashCodeis used in Mill, esp. in theEvaluatorwhen deciding whether a task needs re-computation, it was important to keep the Java hash code stable and consistent.PathRef.hashCodenow uses a mapped path (as configured at construction time) instead of the unmapped, absolute path. Otherwise, it causes constant re-evaluations of already cached tasks. It should be possible to "construct" cases, in which this leads to more hash collisions, but those cases are very unlikely, unrealistic and should still be unproblematic, since a path-change without asig-change still means, we refer to the same path structure and content.Other changes
Some refactoring was necessary to streamline the configuration flow of the
$MILL_OUTpath. Before, theoutpath was constructed in many different locations, sometimes with slightly different logic. This is of course a minefield, since we have many entry points (Mill CLI, Mill Daemon, Mill BSP, RPC worker processes, Test Suites, Integration tests, ...) where those where set up. So this PR relies heavily on our programmatic test coverage.Other effects
The
showcommand can be used to show the result of a (cached) task. It now no longer shows local absolute paths for locations covered by a mapped root (which should be the most occurrences) but the paths with placeholders instead. This is IMHO less helpful, but not so easy to fix correctly.