Skip to content

Add tracing_chrome under "tracing" feature #4406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Stypox
Copy link
Contributor

@Stypox Stypox commented Jun 18, 2025

  • Added a "tracing" feature that enables Chrome traces, and sets Machine::TRACING_ENABLED to true
  • I ended up adding the tracing_chrome crate by copy-pasting this ~600 line file.
    • As discussed previously, depending on the tracing-chrome crate from crates.io is unfortunately not possible since it depends on tracing_core which conflicts with rustc_private's tracing_core (meaning it would not be possible to use the ChromeLayer in a context that expects a Layer from from rustc_private's tracing_core version).
    • I tried to use cargo's [patch] and [replace] sections, but although they would work for normal libraries, they don't seem to behave as expected when the crate to replace comes from rustc_private, see this Zulip comment for a list of experiments
    • Also see more dicussion in this Zulip thread
    • I am open to trying other stuff to avoid copy-pasting foreign code into Miri, but I didn't want to waste more time on this right now, as it's blocking any work on tracing.
    • Do I need to mention the author and license at the top of the copied file, or is a link (like I've done now) enough for attribution?
  • I moved the logger/tracing setup functions out of miri.rs so the file is less cluttered.
    • Should I put those new files under trace/ (like done now) or should I rather move them under bin/, since only bin/miri.rs uses those functions?
  • The ChromeLayer internally starts a thread to write data to file, and thus relies on a guard to properly flush the file and terminate the thread when dropped. Since std::process::exit() was being called in a few places, I had to restructure the code a bit to avoid exiting directly (which wouldn't call drop() on the guard).
    • As far as I understand rustc does not call exit(), but rather raises an unwinding panic which is then caught by rustc_driver::catch_fatal_errors(). After modifying run_compiler_then_exit to not exit directly, I could confirm that the tracing file was being flushed correctly in case of a compiler error by running RUSTC_LOG=1 ./miri run --features tracing FILE_WITH_SYNTAX_ERRORS.
    • I moved the call to init_early_loggers() after argument parsing, is this ok? I did not see any log/trace call during argument parsing anyway. This avoids having to refactor show_error!() to not exit() directly

@Stypox Stypox force-pushed the tracing branch 2 times, most recently from 16e2ce6 to d432b6b Compare June 19, 2025 10:31
@Stypox Stypox marked this pull request as ready for review June 19, 2025 10:32
@Stypox
Copy link
Contributor Author

Stypox commented Jun 19, 2025

I am not sure why tests fail, when I tried locally they failed even on the latest commit on master ( 2f4f9ac ). I did a git bisect and it told me that either ab135f0 or 42f66f4 are the first bad commit (the first one doesn't build, so I skipped it, while the second is the first where ./miri test failed). However this does not explain why the CI on 2f4f9ac passed.

@tiif
Copy link
Member

tiif commented Jun 19, 2025

I am on x86_64-unknown-linux-gnu, ./miri test for current master branch (2f4f9ac) passed locally for me.

May I know what os are you currently using? And is the failure the same as the one in the CI?

@Stypox
Copy link
Contributor Author

Stypox commented Jun 19, 2025

@tiif On my PC the tests that fail run into errors like "error: unsupported operation: extern static __rust_no_alloc_shim_is_unstable is not supported by Miri". Even just running ./miri run tests/pass/hello.rs triggerrs the same error. But the CI exits with code 143 without printing any error (I'm looking at ubuntu-latest):

2025-06-19T10:55:01.6057618Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2025-06-19T10:55:01.6062974Z ##[error]Process completed with exit code 143.
2025-06-19T10:55:01.6983400Z Cleaning up orphan processes

I am also on x86_64-unknown-linux-gnu, my OS is Manjaro Linux with kernel 5.15.182-1-MANJARO, I have a AMD Ryzen 7 5800H cpu, let me know if you need anything else.

@tiif
Copy link
Member

tiif commented Jun 19, 2025

I suspect the CI failure might be triggered by the change in this PR, as there were successful CI runs after this one in https://github.com/rust-lang/miri/actions. (I haven't read the code closely, so this is just a guess :)

About the unsupported error, I think maybe the toolchain is outdated. You can try to follow the steps here https://github.com/rust-lang/miri/blob/master/CONTRIBUTING.md#preparing-the-build-environment to update it if you haven't done so recently.

If the problem persists, feel free to ask for help in zulip.

Stypox added 3 commits June 25, 2025 11:29
The file was taken unmodified from the following link, except for file attributes at the top: https://github.com/thoren-d/tracing-chrome/blob/7e2625ab4aeeef2f0ef9bde9d6258dd181c04472/src/lib.rs
Depending on the tracing-chrome crate from crates.io is unfortunately not possible since it depends on `tracing_core` which conflicts with rustc_private's `tracing_core` (meaning it would not be possible to use the [ChromeLayer] in a context that expects a [Layer] from from rustc_private's `tracing_core` version)
The tracing_chrome Layer has a connected Guard that needs to be dropped before exiting the process, or else data may not be flushed to the tracing file. This required making a few changes to main() to ensure std::process::exit() is not called without making sure to first drop the guard.
@Stypox
Copy link
Contributor Author

Stypox commented Jun 25, 2025

The tests pass locally. I rebased on master and force pushed to make the CI rerun, but it still fails.

Both now and earlier, I think the cause for the CI failure was "memory allocation of 408 bytes failed" on windows-latest (see here and here). mac also fails due to "No space left on device (os error 28)" (see here). I don't understand why ubuntu fails with code 143 without printing any error.

I don't understand how changes in this PR can cause the tests to use more memory (assuming that's the culprit). The behavior of the code in this PR should be the same as before when no --tracing is passed.

Copy link
Contributor Author

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build was failing because the ci/ci.sh script passed CARGO_EXTRA_FLAGS=--all-features and thus enabled the tracing feature, which caused every test to collect and save tracing information to file which filled up the worker filesystem leading to the errors above. Now I changed ci/ci.sh to only include the features relevant for tests, i.e. all but "tracing", i.e. genmc, stack-cache, stack-cache-consistency-check.

src/bin/miri.rs Outdated
@@ -211,6 +219,10 @@ impl rustc_driver::Callbacks for MiriCompilerCalls {
if return_code != rustc_driver::EXIT_SUCCESS {
eprintln!("FAILING SEED: {seed}");
if !many_seeds.keep_going {
// drop the tracing guard before exiting, so tracing calls are flushed correctly
if let Ok(mut lock) = tracing_guard.try_lock() {
let _guard_being_dropped = (*lock).take();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An explicit call to drop seems more clear here.

But it's not great tha we apparently have to remember to do this in multiple places...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I also don't like that much, but it felt the safest option. An alternative would be to call libc::atexit() or to use a crate like https://crates.io/crates/ctor, though both options involve unsafe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an idea: move everything in fn after_analysis that is after init_late_loggers into a separate inherent method on MiriCompilerCalls that returns an ExitCode. Have the guard inside that separate method so it gets dropped when the method returns, and then there's just a single exit call in after_analysis afterwards.

Copy link
Contributor Author

@Stypox Stypox Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that but exit() is also used inside the par_for_each_in to exit immediately as soon as anything fails (when many_seeds.keep_going is false) and afaik there is no easy and safe way to stop executing threads to then drop the guard and return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... yeah true, par_for_each_in does not support early-return.

Well, then let's at least have a local exit closure inside this function to reduce the code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, then let's at least have a local exit closure inside this function to reduce the code duplication.

I don't think this has been done?
With the new API, what we probably want is a fn exit that calls deinit_loggers, and then use that instead of the one from std::process.

Comment on lines 54 to 58
guard = Some(TracingGuard {
#[cfg(feature = "tracing")]
_chrome: chrome_guard,
_no_construct: (),
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this guard have toe be created here? This then causes all sorts of complexity since you can't know whether it gets created in early-init or late-init.

It seems better to just create the guard separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by creating the guard separately? An alternative would be to pass a parameter around but then we'd have to do the .is_none() check anyway so I don't see better alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean to have init_*_logger store the guard somewhere, e.g. inside LOGGER_INITED, and then have a separate function that fetches the guard from there (and panics if it has already been fetched / not yet been placed there).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOGGER_INITED should probably become a OnceLock for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard needs to be fetched from main() and also from after_analysis. Having the guard in main() is needed to do the cleanup when the compiler exits. I pushed a commit that implements your suggestion and takes this observation into consideration. While the interface is nicer, it may lead to more confusion, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we even care about the contents of the trace file when there is an error?^^

But anyway, that last version works for me, thanks.

@Stypox
Copy link
Contributor Author

Stypox commented Jun 27, 2025

  • Removed $FEATURES from ci.sh's ./miri install
  • Added docs to TracingGuard, improved docs in tracing_chrome.rs and made other comments follow the "Abc def." style (instead of "abc def")
  • impl Drop for TracingGuard
  • Renamed run_compiler_and_exit to run_compiler_return_exit_code
  • Moved tracing_chrome and logger setup into bin/trace folder. Putting them under bin/ directly meant that each file would be interpreted as a binary as far as I understand, that's why I had to put them under an additional trace/ subfolder.

@RalfJung
Copy link
Member

@rustbot author

@rustbot
Copy link
Collaborator

rustbot commented Jun 27, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Jun 27, 2025
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jun 27, 2025
Add tracing to `InterpCx::layout_of()`

This PR adds tracing calls to `instantiate_from_frame_and_normalize_erasing_regions` and to `InterpCx::layout_of()`. The latter is done by shadowing `LayoutOf`'s trait method with an inherent method on `InterpCx`.

<details><summary>Previous attempt by overriding the `layout_of` query (includes downloadable `.diff` patch)</summary>

This PR is meant for Miri, but requires a few changes in `rustc` code, hence why it's here. It adds tracing capabilities to the `layout_of` function in `tcx` by overriding the `layout_of` query (under `local_providers`) with a wrapper that opens a tracing span and then calls the actual `layout_of`. To make this possible, I had to make `rustc_ty_utils::layout::layout_of` public. I added an assert to ensure the `providers.layout_of` value I am replacing is actually `rustc_ty_utils::layout::layout_of`, just in case.

I also considered taking the previous value in `providers.layout_of` and calling that one instead, to avoid making `layout_of` public. But then the closure would not be castable to a function pointer anymore (`providers.layout_of` is a function pointer), because it would depend on the local variable storing the previous value of `providers.layout_of`. Using a global variable would work but would rely on `unsafe` or on `Mutex`es, so I wanted to avoid it.

Here is some tracing output when Miri is run on `src/tools/miri/tests/pass/hello.rs`, visualizable in https://ui.perfetto.dev: [trace-1750338860374637.json](https://github.com/user-attachments/files/20820392/trace-1750338860374637.json)

Another place where I could have added tracing calls is to the `rustc_middle::ty::layout::LayoutCx` struct / `spanned_layout_of()` function, however there is no simple way to disable the tracing calls with compile-time boolean constants there (since `LayoutCx::new()` is used everywhere and referenced directly), and in any case it seems like `spanned_layout_of()` just calls `tcx.layout_of()` anyway. For completeness' sake, here is tracing output for when a tracing call is added to `spanned_layout_of()`: [trace-1750340887920584.json](https://github.com/user-attachments/files/20820609/trace-1750340887920584.json)

Patch to override `layout_of` query: [tracing-layout_of-query-override.diff.txt](https://github.com/user-attachments/files/20944497/tracing-layout_of-query-override.diff.txt)

</details>

**Note: obtaining tracing output depends on rust-lang/miri#4406, but this PR is standalone and can be merged without waiting for rust-lang/miri#4406

r? `@RalfJung`
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jun 27, 2025
Add tracing to `validate_operand`

This PR adds a tracing call to keep track of how much time is spent in `validate_operand` and `const_validate_operand`. Let me know if more fine-grained tracing is needed (e.g. adding tracing to `validate_operand_internal` too, which is just called from those two functions).

I also fixed the rustdoc of `validate_operand` and `const_validate_operand` since it was referencing an older name for the `val` parameter which was renamed in cbdcbf0.

Here is some tracing output when Miri is run on `src/tools/miri/tests/pass/hello.rs`, visualizable in [ui.perfetto.dev](https://ui.perfetto.dev/): [trace-1750932222218210.json](https://github.com/user-attachments/files/20924000/trace-1750932222218210.json)

**Note: obtaining tracing output depends on rust-lang/miri#4406, but this PR is standalone and can be merged without waiting for rust-lang/miri#4406

r? `@RalfJung`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: Waiting for the PR author to address review comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants