Add Debug Info section #2649
Walnut356 wants to merge 3 commits into rust-lang:main from Walnut356:debuginfo
+1,337
−1
Binary file not shown.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # Debugger Internals | ||
|
|
||
| It is the debugger's job to convert the debug info into an in-memory representation. Both the | ||
| interpretation of the debug info and the in-memory representation are arbitrary; anything will do | ||
| so long as meaningful information can be reconstructed while the program is running. The pipeline | ||
| from raw debug info to usable types can be quite complicated. | ||
|
|
||
| Once the information is in a workable format, the debugger front-end then must provide a way to | ||
| interpret and display the data, a way for users to interact with it, and an API for extensibility. | ||
|
|
||
| Debuggers are vast systems and cannot be covered completely here. This section will provide a brief | ||
| overview of the subsystems directly relevant to the Rust debugging experience. | ||
|
|
||
| Microsoft's debugging engine is closed source, so it will not be covered here. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Debugger Visualizers | ||
|
|
||
| These are typically the last step before the debugger displays the information, but the results may | ||
| be piped through a debug adapter such as an IDE's debugger API. | ||
|
|
||
| The term "Visualizer" is a bit of a misnomer. The real goal isn't just to prettify the output, but | ||
| to provide an interface for the user to interact with that is as useful as possible. In many cases | ||
| this means reconstructing the original type as closely as possible to its Rust representation, but | ||
| not always. | ||
|
|
||
| The visualizer interface allows generating "synthetic children" - fields that don't exist in the | ||
| debug info, but can be derived from invariants about the language and the type itself. A simple | ||
| example is allowing one to interact with the elements of a `Vec<T>` instead of just its `*mut u8` | ||
| heap pointer, length, and capacity. | ||
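
As a rough illustration of the idea (not the provider that ships with Rust), the sketch below derives one child per element from a pointer, a length, and a known element layout. The method names mirror LLDB's synthetic provider interface (`num_children`, `get_child_at_index`), but `FakeVec`, `VecSyntheticSketch`, and the flat `bytes` buffer standing in for debuggee memory are all hypothetical. A real provider would work with `SBValue` objects instead.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeVec:
    """Hypothetical stand-in for the raw fields the debug info exposes."""
    ptr: int  # offset of the heap buffer within `memory`
    len: int  # number of initialized elements
    cap: int  # allocated capacity, in elements


class VecSyntheticSketch:
    """Derives one child per element from (ptr, len) plus a known element layout."""

    __slots__ = ("memory", "vec", "elem_fmt", "elem_size")

    def __init__(self, memory: bytes, vec: FakeVec, elem_fmt: str = "<i"):
        self.memory = memory
        self.vec = vec
        self.elem_fmt = elem_fmt  # "<i" means a little-endian 32-bit integer
        self.elem_size = struct.calcsize(elem_fmt)

    def num_children(self) -> int:
        return self.vec.len

    def get_child_at_index(self, index: int):
        # Only `len` elements are initialized; spare capacity is never shown.
        if not 0 <= index < self.vec.len:
            return None
        offset = self.vec.ptr + index * self.elem_size
        (value,) = struct.unpack_from(self.elem_fmt, self.memory, offset)
        return (f"[{index}]", value)


# 8 bytes of unrelated data, then three i32 elements, then unused capacity.
memory = bytes(8) + struct.pack("<3i", 10, 20, 30) + bytes(4)
provider = VecSyntheticSketch(memory, FakeVec(ptr=8, len=3, cap=4))
children = [provider.get_child_at_index(i) for i in range(provider.num_children())]
```

None of the `[0]`, `[1]`, `[2]` children exist as fields in the debug info. They are reconstructed from the invariant that `ptr` points at `len` contiguous, initialized elements.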
|
|
||
| ## `rust-lldb`, `rust-gdb`, and `rust-windbg.cmd` | ||
|
|
||
| These support scripts are distributed with Rust toolchains. They locate the appropriate debugger and | ||
| the toolchain's visualizer scripts, then launch the debugger with the appropriate arguments to load | ||
| the visualizer scripts before a debuggee is launched/attached to. | ||
|
|
||
| ## `#![debugger_visualizer]` | ||
|
|
||
| [This attribute][dbg_vis_attr] allows Rust library authors to include pretty printers for their | ||
| types within the library itself. These pretty printers are of the same format as typical | ||
| visualizers, but are embedded directly into the compiled binary. These scripts are loaded | ||
| automatically by the debugger, allowing a seamless experience for users. This attribute currently | ||
| works for GDB and natvis scripts. | ||
|
|
||
| [dbg_vis_attr]: https://doc.rust-lang.org/reference/attributes/debugger.html#the-debugger_visualizer-attribute | ||
|
|
||
| GDB python scripts are embedded in the `.debug_gdb_scripts` section of the binary. More information | ||
| can be found [here](https://sourceware.org/gdb/current/onlinedocs/gdb.html/dotdebug_005fgdb_005fscripts-section.html). Rustc accomplishes this in [`rustc_codegen_llvm/src/debuginfo/gdb.rs`][gdb_rs]. | ||
|
|
||
| [gdb_rs]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_llvm/src/debuginfo/gdb.rs | ||
|
|
||
| Natvis files can be embedded in the PDB debug info using the [`/NATVIS` linker option][linker_opt], | ||
| and have the [highest priority][priority] when a type is resolving which visualizer to use. The | ||
| files specified by the attribute are collected into | ||
| [`CrateInfo::natvis_debugger_visualizers`][natvis] which are then added as linker arguments in | ||
| [`rustc_codegen_ssa/src/back/linker.rs`][linker_rs]. | ||
|
|
||
| [linker_opt]: https://learn.microsoft.com/en-us/cpp/build/reference/natvis-add-natvis-to-pdb?view=msvc-170 | ||
| [priority]: https://learn.microsoft.com/en-us/visualstudio/debugger/create-custom-views-of-native-objects?view=visualstudio#BKMK_natvis_location | ||
| [natvis]: https://github.com/rust-lang/rust/blob/e0e204f3e97ad5f79524b9c259dc38df606ed82c/compiler/rustc_codegen_ssa/src/lib.rs#L212 | ||
| [linker_rs]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_ssa/src/back/linker.rs#L1106 | ||
|
|
||
| LLDB is not currently supported, but there are a few methods that could potentially allow support in | ||
| the future. Officially, the intended method is via a [formatter bytecode][bytecode]. This was | ||
| created to offer a comparable experience to GDB's, but without the safety concerns associated with | ||
| embedding an entire python script. The opcodes are limited, but it works with `SBValue` and `SBType` | ||
| in roughly the same way as python visualizer scripts. Implementing this would require writing some | ||
| sort of DSL/mini compiler. | ||
|
|
||
| [bytecode]: https://lldb.llvm.org/resources/formatterbytecode.html | ||
|
|
||
| Alternatively, it might be possible to copy GDB's strategy entirely: create a bespoke section in the | ||
| binary and embed a python script in it. LLDB will not load it automatically, but the python API does | ||
| allow one to access the [raw sections of the debug info][SBSection]. With this, it may be possible | ||
| to extract the python script from our bespoke section and then load it in during the startup of | ||
| Rust's visualizer scripts. | ||
|
|
||
| [SBSection]: https://lldb.llvm.org/python_api/lldb.SBSection.html#sbsection | ||
|
|
||
| ## Performance | ||
|
|
||
| Before tackling the visualizers themselves, it's important to note that these are part of a | ||
| performance-sensitive system. Please excuse the break in formality, but: if I have to spend | ||
| significant time debugging, I'm annoyed. If I have to *wait on my debugger*, I'm pissed. | ||
|
|
||
| Every millisecond spent in these visualizers is a millisecond longer for the user to see output. | ||
| This can be especially painful for large stackframes that contain many/large container types. | ||
| Debugger GUIs such as VSCode will request the whole stack frame at once, and this can result in | ||
| delays of tens of seconds (or even minutes) before being able to interact with any variables in the | ||
| frame. | ||
|
|
||
| There is a tendency to balk at the idea of optimizing Python code, but it really can have a | ||
| substantial impact. Remember, there is no compiler to help keep the code fast. Even simple | ||
| transformations are not done for you. It can be difficult to find Python performance tips through | ||
| all the noise of people suggesting you don't bother optimizing Python, so here are some things to | ||
| keep in mind that are relevant to these scripts: | ||
|
|
||
| * Everything allocates, even `int` | ||
| * Use tuples when possible. `list` is effectively `Vec<Box<[Any]>>`, whereas tuples are equivalent | ||
| to `Box<[Any]>`. They have one less layer of indirection, don't carry extra capacity and can't | ||
| grow/shrink which can be advantageous in many cases. An additional benefit is that Python caches and | ||
| recycles the underlying allocations of all tuples up to size 20. | ||
| * Regexes are slow and should be avoided when simple string manipulation will do | ||
| * Strings are immutable, thus many string operations implicitly copy the contents. | ||
| * When concatenating large lists of strings, `"".join(iterable_of_strings)` is typically the fastest | ||
| way to do it. | ||
| * f-strings are generally the fastest way to do small, simple string transformations such as | ||
| surrounding a string with parentheses. | ||
| * The act of calling a function is somewhat slow (even if the function is completely empty). If the | ||
| code section is very hot, consider inlining the function manually. | ||
| * Local variable access is significantly faster than global and built-in function access | ||
| * Member/method access via the `.` operator is also slow, consider reassigning deeply nested values | ||
| to local variables to avoid this cost (e.g. `h = a.b.c.d.e.f.g.h`). | ||
| * Accessing inherited methods and fields is about 2x slower than base-class methods and fields. | ||
| Avoid inheritance whenever possible. | ||
| * Use [`__slots__`](https://wiki.python.org/moin/UsingSlots) wherever possible. `__slots__` is a way | ||
| to indicate to Python that your class's fields won't change and speeds up field access by a | ||
| noticeable amount. This does require you to name your fields in advance and initialize them in | ||
| `__init__`, but it's a small price to pay for the benefits. | ||
| * Match statements/if..elif..else are not optimized in any way. The conditions are checked in order, | ||
| one by one. If possible, use an alternative such as dictionary dispatch or a table of values. | ||
| * Compute lazily when possible | ||
| * List comprehensions are typically faster than loops, generator comprehensions are a bit slower | ||
| than list comprehensions, but use less memory. You can think of comprehensions as equivalent to | ||
| Rust's `iter.map()`. List comprehensions effectively call `collect::<Vec<_>>` at the end, whereas | ||
| generator comprehensions do not. | ||
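
A few of the tips above can be tried out with a small, self-contained sketch. Everything in it (`PlainPoint`, `SlottedPoint`, `classify_chain`, `classify_table`, `wrap_all`, `measure`) is a hypothetical name invented for illustration and is not part of Rust's visualizer scripts. It uses only the standard library's `timeit`. Timings vary by machine and Python version, so no expected numbers are given.

```python
import timeit


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")  # fixed field set, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


def classify_chain(kind):
    # Conditions are tested top to bottom until one matches.
    if kind == "u8":
        return 1
    elif kind == "u16":
        return 2
    elif kind == "u32":
        return 4
    elif kind == "u64":
        return 8
    return 0


SIZES = {"u8": 1, "u16": 2, "u32": 4, "u64": 8}


def classify_table(kind):
    # A single hash lookup, regardless of how many kinds exist.
    return SIZES.get(kind, 0)


def wrap_all(names):
    # f-string for the small per-item transformation, one join for the whole list.
    return ", ".join([f"({name})" for name in names])


def measure(stmt, number=200_000):
    """Return total seconds for `number` runs of `stmt`, using this module's globals."""
    return timeit.timeit(stmt, globals=globals(), number=number)


if __name__ == "__main__":
    plain, slotted = PlainPoint(1, 2), SlottedPoint(1, 2)
    for label, stmt in (
        ("if/elif chain", "classify_chain('u64')"),
        ("dict lookup", "classify_table('u64')"),
        ("plain attribute", "plain.x"),
        ("slotted attribute", "slotted.x"),
    ):
        print(f"{label:>18}: {measure(stmt):.4f}s")
```

Measuring on the interpreter that the debugger actually embeds is the safest way to confirm that a given change helps, since relative costs shift between Python versions.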
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| # (WIP) GDB Internals | ||
|
|
||
| GDB's Rust support lives at `gdb/rust-lang.h` and `gdb/rust-lang.c`. The expression parsing support | ||
| can be found in `gdb/rust-exp.h` and `gdb/rust-parse.c`. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| # (WIP) GDB - Python Providers | ||
|
|
||
| Below are links to relevant parts of the GDB documentation | ||
|
|
||
| * [Overview on writing a pretty printer](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Writing-a-Pretty_002dPrinter.html#Writing-a-Pretty_002dPrinter) | ||
| * [Pretty Printer API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Pretty-Printing-API.html#Pretty-Printing-API) (equivalent to LLDB's `SyntheticProvider`) | ||
| * [Value API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Values-From-Inferior.html#Values-From-Inferior) (equivalent to LLDB's `SBValue`) | ||
| * [Type API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Types-In-Python.html#Types-In-Python) (equivalent to LLDB's `SBType`) | ||
| * [Type Printing API](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Type-Printing-API.html#Type-Printing-API) (equivalent to LLDB's `SyntheticProvider.get_type_name`) |
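
To make the shape of the Pretty Printer API more concrete, here is a minimal sketch of a printer and its lookup function. The type name `demo::Pair` and its fields `a` and `b` are hypothetical and chosen only for illustration; this is not one of Rust's actual providers. The `gdb` module is only importable inside a GDB session, so registration is guarded and the classes themselves can be loaded anywhere.

```python
try:
    import gdb  # only importable inside a GDB session
except ImportError:
    gdb = None


class PairPrinter:
    """Pretty printer for a hypothetical `demo::Pair` struct with fields `a` and `b`."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        return "Pair"

    def children(self):
        # Each child is a (name, value) tuple; gdb.Value supports lookup by field name.
        yield "a", self.val["a"]
        yield "b", self.val["b"]

    def display_hint(self):
        return None  # no special hint; children are shown as named fields


def lookup(val):
    """Return a printer for values we recognize, or None to let GDB keep looking."""
    if val.type.strip_typedefs().name == "demo::Pair":
        return PairPrinter(val)
    return None


if gdb is not None:
    # GDB calls each registered lookup function with the value to be printed.
    gdb.pretty_printers.append(lookup)
```

A plain lookup function appended to `gdb.pretty_printers` is the simplest registration route; the GDB documentation linked above also describes per-objfile registration and the helpers in `gdb.printing`.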
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| # Debug Info | ||
|
|
||
| Debug info is a collection of information generated by the compiler that allows debuggers to | ||
| correctly interpret the state of a program while it is running. That includes things like mapping | ||
| instruction addresses to lines of code in the source file, and type layout information so that | ||
| bytes in memory can be read and displayed in a meaningful way. | ||
|
|
||
| Debug info can be a slightly overloaded term, covering all the layers between Rust MIR and the | ||
| end-user seeing the output of their debugger onscreen. In brief, the stack from beginning to end is | ||
| as follows: | ||
|
|
||
| 1. Rustc inspects the MIR and communicates the relevant source, symbol, and type information to LLVM | ||
| 2. LLVM translates this information into a target-specific debug info format during compilation | ||
| 3. A debugger reads and interprets the debug info, mapping source-lines and allowing the debuggee's | ||
| variables in memory to be located and read with the correct layout | ||
| 4. Built-in debugger formatting and styling is applied to variables | ||
| 5. User-defined scripts are run, formatting and styling the variables further | ||
| 6. The debugger frontend displays the variable to the user, possibly through the means of additional | ||
| API layers (e.g. VSCode extension by way of the | ||
| [Debug Adapter Protocol](https://microsoft.github.io/debug-adapter-protocol/)) | ||
|
|
||
|
|
||
| > NOTE: This subsection of the dev guide is perhaps more detailed than necessary. It aims to collect | ||
| > a large amount of scattered information into one place and equip the reader with as firm a grasp of | ||
| > the entire debug stack as possible. | ||
| > | ||
| > If you are only interested in working on the visualizer | ||
| > scripts, the information in the [debugger-visualizers](./debugger-visualizers.md) and | ||
| > [testing](./testing.md) sections will suffice. If you need to make changes to Rust's debug node generation, | ||
| > please see [rust-codegen](./rust-codegen.md). All other sections are supplementary, but can be | ||
| > vital to understanding some of the compromises the visualizers or codegen need to make. It can | ||
| > also be valuable to know when a problem might be better solved in LLVM or the debugger itself. | ||
| # DWARF | ||
|
|
||
| This is the primary debug info format for `*-gnu` targets. It is typically bundled in with the | ||
| binary, but it [can be generated as a separate file](https://gcc.gnu.org/wiki/DebugFission). The | ||
| DWARF standard is available [here](https://dwarfstd.org/). | ||
|
|
||
| > NOTE: To inspect DWARF debug info, [gimli](https://crates.io/crates/gimli) can be used | ||
| > programmatically. If you prefer a GUI, the author recommends [DWEX](https://github.com/sevaa/dwex) | ||
| # PDB/CodeView | ||
|
|
||
| The primary debug info format for `*-msvc` targets. PDB is a proprietary container format created by | ||
|
Member (review comment): Dunno if worth mentioning but PDB can be also created using for |
||
| Microsoft that, unfortunately, | ||
| [has multiple meanings](https://docs.rs/ms-pdb/0.1.10/ms_pdb/taster/enum.Flavor.html). | ||
| We are concerned with ordinary PDB files, as Portable PDB is used mainly for .NET applications. PDB | ||
| files are separate from the compiled binary and use the `.pdb` extension. | ||
|
|
||
| PDB files contain CodeView objects, equivalent to DWARF's tags. CodeView, the debugger that | ||
| consumed CodeView objects, was originally released in 1985. Its original intent was for C debugging, | ||
| and was later extended to support Visual C++. There are still minor alterations to the format to | ||
| support modern architectures and languages, but many of these changes are undocumented and/or | ||
| sparsely used. | ||
|
|
||
| It is important to keep this context in mind when working with CodeView objects. Due to its origins, | ||
| the "feature-set" of these objects is very limited, and focused around the core features of C. It | ||
| does not have many of the conveniences or features of modern DWARF standards. A fair number of | ||
| workarounds exist within the debug info stack to compensate for CodeView's shortcomings. | ||
|
|
||
| Due to its proprietary nature, it is very difficult to find information about PDB and CodeView. Many | ||
| of the sources were made at vastly different times and contain incomplete or somewhat contradictory | ||
| information. As such this page will aim to collect as many sources as possible. | ||
|
|
||
| * [CodeView 1.0 specification](./CodeView.pdf) | ||
| * LLVM | ||
| * [CodeView Overview](https://llvm.org/docs/SourceLevelDebugging.html#codeview-debug-info-format) | ||
| * [PDB Overview and technical details](https://llvm.org/docs/PDB/index.html) | ||
| * Microsoft | ||
| * [microsoft-pdb](https://github.com/microsoft/microsoft-pdb) - A C/C++ implementation of a PDB | ||
| reader. The implementation does not contain the full PDB or CodeView specification, but does | ||
| contain enough information for other PDB consumers to be written. At time of writing (Nov 2025), | ||
| this repo has been archived for several years. | ||
| * [pdb-rs](https://github.com/microsoft/pdb-rs/) - A Rust-based PDB reader and writer based on | ||
| other publicly-available information. Does not guarantee stability or spec compliance. Also | ||
| contains `pdbtool`, which can dump PDB files (`cargo install pdbtool`) | ||
| * [Debug Interface Access SDK](https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/getting-started-debug-interface-access-sdk). | ||
| While it does not document the PDB format directly, details can be gleaned from the interface | ||
| itself. | ||
|
|
||
| # Debuggers | ||
|
|
||
| Rust supports 3 major debuggers: GDB, LLDB, and CDB. Each has its own set of requirements, | ||
| limitations, and quirks. This unfortunately creates a large surface area to account for. | ||
|
|
||
| > NOTE: CDB is a proprietary debugger created by Microsoft. The underlying engine also powers | ||
| > WinDbg, KD, the Microsoft C/C++ extension for VSCode, and part of the Visual Studio Debugger. In | ||
| > these docs, it will be referred to as CDB for consistency. | ||
| While GDB and LLDB do offer facilities to natively support Rust's value layout, this isn't | ||
| completely necessary. Rust currently outputs debug info very similar to that of C++, allowing | ||
| debuggers without Rust support to work with a slightly degraded experience. More detail will be | ||
| included in later sections, but here is a quick reference for the capabilities of each debugger: | ||
|
|
||
| | Debugger | Debug Info Format | Native Rust support | Expression Style | Visualizer Scripts | | ||
| | --- | --- | --- | --- | --- | | ||
| | GDB | DWARF | Full | Rust | Python | | ||
| | LLDB | DWARF and PDB | Partial | C/C++ | Python | | ||
| | CDB | PDB | None | C/C++ | Natvis | | ||
|
|
||
| > IMPORTANT: CDB can be assumed to run only on Windows. No assumptions can be made about the OS | ||
| > running GDB or LLDB. | ||
| ## Unsupported | ||
|
|
||
| Below are several unsupported debuggers that are of particular note due to their potential impact | ||
| in the future. | ||
|
|
||
| * [BugStalker](https://github.com/godzie44/BugStalker) is an x86-64 Linux debugger written in Rust, | ||
| specifically to debug Rust programs. While promising, it is still in early development. | ||
| * [RAD Debugger](https://github.com/EpicGamesExt/raddebugger) is a Windows-only GUI debugger. It has | ||
| a custom debug info format that PDB is translated into. The project also includes a linker that can | ||
| generate their new debug info format during the linking phase. | ||
Suggestion: I think I'd be interested in a high-level section here about how `rust-lldb` is configured with the visualizers, as well as a brief overview of how the `#![debugger_visualizer]` attribute works.