bootstrap: retry `cargo` invocations if stderr contains a known pattern

## Why

Our CI auto builds sometimes fail for known reasons that are not related to the PRs we are trying to merge.

Most of the time, these errors are hard to understand and fix (or can't be fixed at all), which lowers the success rate of the auto builds for several weeks or months.
The impact is that we lose days of parallel compute time and hours of maintainers' time, because maintainers need to analyze the error message and reschedule the PRs in the merge queue.

## Feature

We want to list the stderr output of the known issues we are aware of in the `config.toml` file that bootstrap uses. These patterns can be expressed as regexes.
We want `bootstrap` to retry cargo invocations up to two times if stderr matches one of the listed patterns.

This would help reduce the failure rate of our CI because it would significantly reduce the percentage of jobs failing due to spurious errors.

The error messages need to be precise enough to avoid retrying cargo invocations over genuine problems.

Known error patterns can be found [here](https://github.com/rust-lang/rust/issues/133959). Not all of them can be listed.

As a start, we could have just one stderr string in the list (this one doesn't need to be a regex):

- `ranlib.exe: could not create temporary file whilst writing archive: no more archived files` which is discussed in https://github.com/rust-lang/rust/issues/108227
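To make the intended behavior concrete, here is a minimal sketch of the retry loop, not a proposal for the actual bootstrap code. It assumes plain substring matching (regex support would need an extra crate), and the names `Outcome`, `MAX_RETRIES`, `matches_known_pattern` and `run_with_retry` are all hypothetical. The invocation is passed in as a closure so the logic can be exercised without spawning `cargo`.

```rust
/// Hypothetical result of one command invocation (not a bootstrap type).
struct Outcome {
    success: bool,
    stderr: String,
}

/// "Retry up to two times" means at most three invocations in total.
const MAX_RETRIES: u32 = 2;

/// Returns true if stderr contains any of the known spurious-failure strings.
/// Assumption: substring matching only; real regexes would need the `regex` crate.
fn matches_known_pattern(stderr: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| stderr.contains(p))
}

/// Runs `invoke` and retries it only when it failed AND its stderr matches a
/// known pattern. Returns the last outcome and the number of retries used.
fn run_with_retry<F>(mut invoke: F, patterns: &[&str]) -> (Outcome, u32)
where
    F: FnMut() -> Outcome,
{
    let mut retries = 0;
    loop {
        let outcome = invoke();
        let spurious = !outcome.success && matches_known_pattern(&outcome.stderr, patterns);
        // Stop on success, on a genuine (unlisted) error, or when retries run out.
        if !spurious || retries >= MAX_RETRIES {
            return (outcome, retries);
        }
        retries += 1;
    }
}
```

A failure whose stderr matches no listed pattern is returned immediately, which is why the patterns need to be precise: a loose pattern would turn genuine errors into three slow failures instead of one.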

## Questions

- Is `config.toml` the right place to put the known stderr patterns? In Zulip, Jieyou proposed introducing another file: `retry-patterns.toml`. I'll leave it to the bootstrap team to decide.
- Which format do we use to write the stderr patterns in the `config.toml` file? For example, it can be an array of strings. It could also be an "object" if we want to customize how many times to retry per error message. I'll leave it to the bootstrap team to decide.
- How do we make sure these patterns are present in the `config.toml` used for CI? I'm not familiar with how the `config.toml` for the CI is generated.
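
As an illustration of the format question, the two shapes (plain string vs. "object" with a per-pattern retry count) could be modeled on the Rust side roughly as follows. This is only a sketch: `RetryPattern`, its variants and `DEFAULT_MAX_RETRIES` are hypothetical names, and how the values would be read from the config file is left out.

```rust
/// Default taken from the proposal above: retry up to two times.
const DEFAULT_MAX_RETRIES: u32 = 2;

/// Hypothetical in-memory form of one configured pattern.
enum RetryPattern {
    /// Array-of-strings form: only the pattern, default retry count.
    Plain(String),
    /// "Object" form: the pattern plus a custom retry count.
    Detailed { pattern: String, max_retries: u32 },
}

impl RetryPattern {
    /// The stderr text (or regex source) to look for.
    fn pattern(&self) -> &str {
        match self {
            RetryPattern::Plain(p) => p,
            RetryPattern::Detailed { pattern, .. } => pattern,
        }
    }

    /// How many times to retry when this pattern matches.
    fn max_retries(&self) -> u32 {
        match self {
            RetryPattern::Plain(_) => DEFAULT_MAX_RETRIES,
            RetryPattern::Detailed { max_retries, .. } => *max_retries,
        }
    }
}
```

Supporting both variants would let the list start as plain strings and gain per-pattern settings later without breaking existing entries.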

## Zulip links

- idea proposed [here](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/CI.20improvements/near/489445485)
- agreement reached [here](https://rust-lang.zulipchat.com/#narrow/channel/242791-t-infra/topic/CI.20improvements/near/489749580)


bootstrap: retry `cargo` invocations if stderr contains a known pattern #134472

Activity

jieyouxu commented on Dec 18, 2024

jieyouxu commented on Dec 18, 2024

marcoieni commented on Dec 18, 2024

jieyouxu commented on Dec 18, 2024

Kobzol commented on Dec 18, 2024

onur-ozkan commented on Dec 18, 2024

clubby789 commented on Dec 18, 2024

onur-ozkan commented on Dec 19, 2024

marcoieni commented on Dec 19, 2024

jieyouxu commented on Dec 19, 2024

marcoieni commented on Dec 19, 2024

Kobzol commented on Dec 19, 2024

jieyouxu commented on Dec 19, 2024

marcoieni commented on Dec 19, 2024
