Skip to content

Test overwrite_existing apparent symlink dereferencing #2008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented May 10, 2025

To investigate #2006, this explodes the occasionally failing test writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed into 1000 identical tests, using test_case::test_matrix.

When run in series, they all pass, but when run in parallel with cargo nextest, some failures always occur, at least when this is done on macOS. This is as predicted in #2006 (comment).

A more specific result in local testing on macOS 15 is that, while the number of failures is 0 with --test-threads=1, with 2 or more threads, the number of failures tends to decrease with the number of threads. The most failures occur with --test-threads=2, about 30 on the system tested, while --test-threads=16 has about 10 failures.

The tests were run with this command, with any --test-threads=<N> argument added:

cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast

See #2006 (comment) for substantial further details, including more up to date information covering reproduction on other platforms--it turns out not to be just macOS--and other relevant information and ideas.

(Maybe this PR can turn into a fix, once the cause is figured out. If not, it could be superseded by the fix and closed.)

@EliahKagan EliahKagan force-pushed the clash branch 2 times, most recently from a66c167 to 68fd47a Compare May 17, 2025 00:14
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request May 18, 2025
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
@EliahKagan
Copy link
Member Author

Regarding 16171b4, don't worry: I do not plan to merge a new 28-job CI test matrix to main. See EliahKagan#36 for details on these jobs. They surface the failure on GNU/Linux; see #2006 (comment).

Note that 16171b4 (EliahKagan#36) doesn't include any changes to how the test uses gix-worktree-state functionality. In particular, it doesn't adjust whether gix-features/parallel is used or the value of the num_threads parameter. That's the next step (#2006 (comment)). I wanted to add the GNU/Linux repro on CI first, to have greater confidence about the meaning of whatever is observed due to subsequent changes.

@EliahKagan
Copy link
Member Author

EliahKagan commented May 18, 2025

Changes related to num_threads, to cause the checkout operation under test to be single-threaded, is now done here in b333ef3 (squashed from EliahKagan#37). I believe this is sufficient to test the scenario anticipated in #2006 (comment). However, as can be seen on CI, this does not make the failure go away. See also #2006 (comment).

@Byron
Copy link
Member

Byron commented May 18, 2025

However, as can be seen on CI, this does not make the failure go away.

This is great!

That means the Rust side of things should be fully deterministic. Either it is not, or there is some filesystem involvement. The only of such reasons (of filesystem involvement) I know is the reliance on traversal order, or possibly relying on mtimes being 'in a certain way'. None of that should matter here though.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request May 19, 2025
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
To investigate GitoxideLabs#2006, this explodes the occasionally failing test
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
into 1000 identical tests, using `test_case::test_matrix`.

When run in series, they all pass, but when run in parallel with
`cargo nextest`, some failures always occur, at least when this is
done on macOS. This is as predicted in:
GitoxideLabs#2006 (comment)

A more specific result in local testing on macOS 15 is that, while
the number of failures is 0 with `--test-threads=1`, with 2 or more
threads, the number of failures tends to *decrease* with the number
of threads. The most failures occur with `--test-threads=2`, about
30 on the system tested, while `--test-threads=16` has about 10
failures.

The tests were run with this command, with any `--test-threads=<N>`
argument added:

    cargo nextest run -p gix-worktree-state-tests writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed --no-fail-fast
To see if this can sometimes reproduce GitoxideLabs#2006 even outside macOS.

The `fail-fast: false` added here can be removed after testing.
This (from #36) adds a `test-ext4-casefold` matrix job
definition to `ci.yml`, which is intended probably to be temporary,
to demonstrate for GitoxideLabs#2008 that GitoxideLabs#2006 affects GNU/Linux (in addition
to macOS and Windows), when the GNU/Linux system is using an ext4
volume with a case-insensitive directory tree, i.e., where the ext4
filesystem is created with `-O casefold` and operations are
performed in a directory within that filesystem where `chattr +F`
has been used to enable case-folding path equivalence.

The bug is nondeterministic and challenging to produce on
GNU/Linux, even when the case-folding precondition is satisfied (as
all the `test-ext4-casefold` jobs achieve). It is much harder to
produce on GNU/Linux than on macOS or Windows. Accordingly, the
number of duplicate test cases is increased 25-fold from 1000 to
25000 in those tests, and multiple jobs are created, some of them
equivalent.

As expected, the failures do not occur when the runner does not run
tests in parallel, and they do occur most of the time when the
runner runs tests in parallel with a maximum number of parallel
tests set to 2 or to 4. However, unexpectedly, the failures almost
never occur when the test are run in parallel with a maximum number
of parallel tests set to 3. It is unclear to what extent, if any,
this generalizes, but I have not observed these patterns when
testing locally, including when modifying the local procedure to be
more similar to the behavior here (such as by suppressing reporting
of non-failing tests and suppressing the progress bar). In local
testing, I am unable to produce failures on some GNU/Linux systems,
but on those on which I am able to produce them, they are easier to
produce with all values of the `--test-threads` operand, including
3.

Not all operating systems, versions, and kernel builds and options
support case-folding on ext4. But support is fairly common on
Linux-based systems, and the CI runners have no problem with that.
In contrast, although case-folding tmpfs also exists on Linux, it
is newer and not (yet) supported as widely, and the GitHub-hosted
runners do not support case-folding tmpfs. It is possible to mount
a tmpfs filesystem and create an ext4 image in it so that, in
principle, fewer operations need to access the disk. That is one of
the approaches that was tried, but it does not appear to make it
any easier to surface these failures.

This squashes multiple commits. (Mistakes, and less illuminating
experiments, have already been omitted; the commits being squashes
here are those that are expected to be of possible interest, but
whose details are almost but not quite important enough to justify
the cumbersome effect of having them individually in the history.)
The squashed commits' individual messages are shown below.

For full changes and CI results from each commit, see:
#36

---

* Add a case-folding GNU/Linux CI test job

This can likely be removed later. But it might be kept if it finds
anything interesting, especially if it helps test a bugfix. In that
case, it may make sense to have it run only tests that are of
special interest in connection with case-sensitivity.

* Case-fold `$TMPDIR` as well in CI casefold test

* Run only 2 parallel test processes in test-ext4-casefold

This seemed to make the bug *easier* to reproduce (compared to
larger values) locally, so maybe it will surface it on CI even in
GNU/Linux.

* Increase symlink experiment reps 10x in test-ext4-casefold

From 1000 (0..=999) to 10000 (0..=9999).

* Try even harder to reproduce the failure on GNU/Linux

By repeating `cargo nextext` on the affected test group, up to 20
times.

* Add `ubuntu-24.04-arm` to `test-ext4-casefold` job

In case the failure might somehow be easier to reproduce there.

* Use tmpfs casefold instead of ext4 casefold

* Use tmpfs-backed ext4 as a workaround

For when tmpfs does not suppport `-o casefold` in mounting.

This does require that ext4 support `-O casefold`, but that was
already working before.

Even though that worked before, this could fail when using an ext4
image on tmpfs, because the ext4 image is smaller than it was when
tmpfs was not used, in order to allow it to be created in memory.
While this allows it to be created, the tests may fail if build
artifacts and other files need more space than is available.

* Run only the test(s) of interest

In `test-linux-casefold`, this runs only the tests that are
duplicates of each other, where the failure has been observed,
rather than first running the whole test suite.

The reason is to try to build fewer crates and test executables,
since this is currenntly failing because it runs out of space on
the ext4 image whose size is itself constrained by the amount of
available memory for the tmpfs filesystem on which the image is
created.

* Don't attempt to use case-folding tmpfs

But check if it is reported as available and, if so, issue a GHA
warning that it should be used instead of the current way.

* Try to fail at least as often, with less overhead

This combines a few changes:

- Only report the status of tests that actually fail.
- Run the `cargo nextest` run command once, not 20 times.
- Multiply the number of test cases by another factor of 10.
- Stop after the first failing case.

* Move the ext4 image off tmpfs

This tries going back to not using tmpfs for I/O speedup, with the
idea that the increased complexity and decreased flexibility might
be possible to avoid now that other adjustments are made to surface
the failure more reliably.

* Try to fail more often, with less delay

The `test-ext4-casefold` jobs were still not surfacing the error
quite reliably enough. The wait was also longer than ideal. This
commit makes some changes that hope to improve on both.

- Decrease the number of duplicate test cases by a factor of 4.
  This has the effect that they are increased by 25x compared to
  the multiplier in the commited code, instead of 100x.

- Create 10 times as many CI jobs in the matrix, by adding a `num`
  variable whose values are those in the range 0..=9. This variable
  is not actually used, it just serves to produce more CI jobs.

- Don't use `Swatinem/rust-cache` in these jobs anymore. It is not
  obvious that it will make things work better overall, in terms of
  speedup, cache usage, and cache access, now that we are
  increasing the number of matrix jobs by a factor of 10.

Moving some of the repetition into separate jobs, which may run in
parallel, is not only, nor even primarily, to leverage runner
parallelism to do more test runs faster. Instead, it is because it
looks like some chaotic initial state may contribute to how likely
the failure is to occur. No definitive experiment on CI shows this
to be the case -- but in local testing, on some systems, I often
have runs that fail early over and over again, and then wait a
while, come back, and have many runs that don't surface the bug,
then come back later and it is easy to produce again, and so on.

* Further diversify runners

* Undiversify runners but vary test parallelism

Since varying `runs-on` didn't surface more errors, nor help smooth
out inconsistencies across runs.

The `*-arm` runners seem to have fewer errors with the particular
way the tests are running now, so this just uses `ubuntu-latest`.

This also adjusts `rep`, having it take on more values, so as to
more than make up for what would otherwise be a smaller number of
jobs.

* Test more values for test parallelism

This creates additional CI `test-ext4-casefold` jobs, for more
values of the `--test-threads` operand passed to `cargo nextest`.

The motivation is that the value of 3 curiously seems never to
surface the failure on CI. Unlike 1, 3 should is expected to
surface the failure comparably to 2 and 4, both of which do fail
more often than not. Yet the jobs with `--test-threads=3` keep
passing.

(This is unlikely to be due to case-sensitivity misdetection, since
the assertions for when the filesystem is case-sensitive would also
fail, if they fire in a job where the filesystem is case-folding.)

If these jobs are kept, then `rep` should probably be adjusted to
have fewer, perhaps 7, values. However, this adjusts `rep` to have
16 values (i.e. twice as many as before), since the higher operands
to `--test-threads` are expected to produce the failure less often.

* Test fewer test parallelism values and reps

- Test `--test-threads` operands of 1 to 4, rather than 1 to 7.
- Do 7 reps instead of 16.

In addition to the 1000x repetition in `checkout.rs` of the
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
test case -- which almost always surfaces GitoxideLabs#2006 on CI on macOS and
Windows without further modifications -- the `test-ext4-casefold`
jobs as they currenly stand seem like they would ber sufficient to
test whether a possible future solution would fix the bug on
GNU/Linux.

This looks like about the minmum amount we should reasonably verify
still fails (in some jobs) immediately before a GitoxideLabs#2006 fix, while no
longer failing *any* jobs afterwards. But the `test-ext4-casefold`
jobs are still much more than would be wanted on the main branch.

Assuming the cause of the failures is the same on all systems, and
the solution neither explicitly nor implicitly operates differently
across systems (when comparing how it works when the filesystem is
case-insensitive, that is), it may be enough on CI to regression
test only macOS and Windows for GitoxideLabs#2006.

Thus, this also adds a TODO comment atop the `test-ext4-casefold`
definition reminding that it should not be kept, or not in full.
This change (from #37), which is intended to be
temporary, sets `thread_limit: Some(1)` in the `opts_from_probe()`
test helper function in the `checkout.rs` test module, so that
tests including
`writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed`
will use single-threaded checkout. This applies no matter how the
tests are run.

In addition, for the `test-ext4-casefold` CI jobs (but not others),
this adds the `--no-default-features` option to the `cargo nextest`
command, so that the `gix-features-parallel` feature defined in
`gix-worktree-state/Cargo.toml`, which when enabled causes
`gix-features/parallel` to be enabled for the tests, is not
enabled.

The purpose of these changes it to examine if, when checkout
operations are themselves performed without multithreading, there
is any change to GitoxideLabs#2006.

This does not make the failures go away. They still occur on all
systems on which they had occurred before. It is unclear if it
makes any difference to the failures. It seems like they may be
slightly less likely to occur on GNU/Linux, but experiments so far
are insufficient to confirm this. Furthermore, even if there is an
effect, it may be that the effect is more subtle (such as possibly
shifting the number of parallel tests where failures peak, from 2
to some higher number).

The changes here should not be confused with the duplication of the
test case, nor with the argument to `--test-threads`, which
actually specifies the number of parallel test *processes*, since
`cargo nextest` is being used.

This squashes a few commits trying some minor variations. The
original messages are shown below. For full changes and CI results
from each commit, see:
#37

---

* Try to avoid parallel checkout in `test-ext4-casefold`

This attempts to turn off parallelism within individual runs of
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
by passing `--no-default-features` to turn off the
`gix-features-parallel` feature in `gix-worktree-state/Cargo.toml`
(which is how `gix-features/parallel` is enabled in those tests).

This is to check if the multithreading parallelism within the
checkout operations participates in GitoxideLabs#2006. See:
GitoxideLabs#2006 (comment)

* Patch the `thread_limit` argument to `Some(1)` too

In the `text-ext4-casefold` jobs.

* Modify the test code to use single-threaded checkout

This moves setting `thread_limit: Some(1)` out of being a step done
with `sed` in the `test-ext4-casefold` jobs, and into being an
actual change to the source code of the `checkout.rs` test module.

This change is intended to be temporary. The goal is the same as
before, but to observe the effect outside `text-ext4-casefold`.

That is, the goal here is to temporarily see if there is a change
in results in the other jobs where failure almost always occurs
due to GitoxideLabs#2006, i.e., the `test-fast` jobs on `macos-latest` and
`windows-latest`, and the `test-fixtures-windows` job.

Note that, as of this change, the `gix-features/parallel` feature
is only turned off in the `test-ext4-casefold` jobs. In others,
that feature is still turned on, just a parallelism of 1, i.e., a
single thread, is used for the checkout. But a parallelism of 1
seems usually to be special-cased to not use facilities related to
multithreading.

(This `thread_limit` argument change, as well as the immediately
preceding change of disabling the `parallel` feature flag in the
`test-ext4-casefold` test jobs, are entirely separate from how many
writes_through_symlinks_are_prevented_even_if_overwriting_is_allowed
duplicated test cases there are, and also entirely separate from
the operand to `--test-threads`, which actually controls the number
of test *processes* `cargo nextest` creates to run the tests.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants