Skip to content

DLPX-94248 address drgn merge conflict with upstream #74

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 248 commits into
base: develop
Choose a base branch
from

Conversation

sebroy
Copy link

@sebroy sebroy commented Jun 26, 2025

This resolves the conflict by restoring the .pre-commit-config.yaml file. This requires that we comment out the mypy check in the file, as that check doesn't pass for us due to missing dependencies in our environment.

The diff for that change is simply this:

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index c05bf5e2..86b14d7b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -14,12 +14,12 @@ repos:
     rev: 7.1.2
     hooks:
     -   id: flake8
--   repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.14.1
-    hooks:
-    -   id: mypy
-        args: [--show-error-codes, --strict, --no-warn-return-any]
-        files: ^(drgn/.*\.py|_drgn.pyi|_drgn_util/.*\.py|tools/.*\.py)$
+#-   repo: https://github.com/pre-commit/mirrors-mypy
+#    rev: v1.14.1
+#    hooks:
+#    -   id: mypy
+#        args: [--show-error-codes, --strict, --no-warn-return-any]
+#        files: ^(drgn/.*\.py|_drgn.pyi|_drgn_util/.*\.py|tools/.*\.py)$
 -   repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v5.0.0
     hooks:

ab-pre-push: https://selfservice-jenkins.eng-tools-prd.aws.delphixcloud.com/job/appliance-build-orchestrator-pre-push/11488/

osandov and others added 30 commits December 18, 2024 16:24
Ubuntu enables -Wformat-security by default, but upstream GCC doesn't
enable it even with -Wall. It caught some legitimate issues in the
module API branch, so let's enable it explicitly.

Signed-off-by: Omar Sandoval <[email protected]>
An upcoming commit needs to look up a few notes by name+type, so add a
common helper function for that.

Signed-off-by: Omar Sandoval <[email protected]>
This will be used for checking the CRC in .gnu_debuglink.

Signed-off-by: Omar Sandoval <[email protected]>
These will be used for GNU build IDs.

Signed-off-by: Omar Sandoval <[email protected]>
We are going to need to parse various structures that have different 64-
and 32-bit formats, may be byte-swapped, and may be unaligned (e.g.,
various ELF and link map structures). Provide a couple of convenience
macros for doing this.

Signed-off-by: Omar Sandoval <[email protected]>
This is a trivial wrapper to delete by entry instead of by key or
iterator.

Signed-off-by: Omar Sandoval <[email protected]>
This has been available since Linux kernel commit 0935288c6e00 ("kdump:
append kernel build-id string to VMCOREINFO") (in v5.9). Save it so we
can use it when loading debugging information.

Unfortunately, the build ID in VMCOREINFO was briefly broken in several
stable releases. 6.10 and 5.15 reached their end-of-life while broken
and so will remain broken forever. It feels like overkill to drop
support for those versions over this, so we work around it with a
version check.

Signed-off-by: Omar Sandoval <[email protected]>
No code changes other than moving it in the file to make upcoming diffs
cleaner.

Signed-off-by: Omar Sandoval <[email protected]>
We currently use debuginfod via libdwfl, but when we get rid of our
libdwfl dependency, we'll need to do the debuginfod calls ourselves. So,
let's add the scaffolding for using libdebuginfod. We provide three
choices at build time:

* No debuginfod (./configure --without-debuginfod).
* Soft dependency: load libdebuginfod with dlopen at runtime if
  available (./configure --with-debuginfod --enable-dlopen-debuginfod).
  This is the default and probably what we want distros to use.
* Hard dependency: link against libdebuginfod (./configure
  --with-debuginfod --disable-dlopen-debuginfod). This is intended for
  environments where dlopen can't be used (e.g., manylinux wheels).

The client handle will be created lazily, so for now this just sets up
some wrappers and doesn't do much with them.

Signed-off-by: Omar Sandoval <[email protected]>
We try to pick a good default, but it's not exactly the same as logging
so it needs extra flexibility. This will be used by upcoming debuginfod
integration.

Signed-off-by: Omar Sandoval <[email protected]>
drgn currently provides limited control over how debugging information
is found. drgn has hardcoded logic for where to search for debugging
information. The most the user can do is provide a list of files for
drgn to try in addition to the default locations (with the -s CLI option
or the drgn.Program.load_debug_info() method).

The implementation is also a mess. We use libdwfl, but its data model is
slightly different from what we want, so we have to work around it or
reimplement its functionality in several places: see commits
e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn:
debug_info: report userspace core dump debug info ourselves"), and
1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for
some examples. The mismatched combination of libdwfl and our own code is
difficult to maintain, and the lack of control over the whole debug info
pipeline has made it difficult to fix several longstanding issues.

The solution is a major rework removing our libdwfl dependency and
replacing it with our own model. This (huge) commit is that rework
comprising the following components:

- drgn.Module/struct drgn_module, a representation of a binary used by a
  program.
- Automatic discovery of the modules loaded in a program.
- Interfaces for manually creating and overriding modules.
- Automatic discovery of debugging information from the standard
  locations and debuginfod.
- Interfaces for custom debug info finders and for manually overriding
  debugging information.
- Tons of test cases.

A lot of care was taken to make these interfaces extremely flexible yet
cohesive. The existing interfaces are also reimplemented on top of the
new functionality to maintain backwards compatibility, with one
exception: drgn.Program.load_debug_info()/-s would previously accept
files that it didn't find loaded in the program. This turned out to be a
big footgun for users, so now this must be done explicitly (with
drgn.ExtraModule/--extra-symbols).

The API and implementation both owe a lot to libdwfl:

- The concepts of modules, module address ranges/section addresses, and
  file biases are heavily inspired by the libdwfl interfaces.
- Ideas for determining modules in userspace processes and core dumps
  were taken from libdwfl.
- Our implementation of ELF symbol table address lookups is based on
  dwfl_module_addrinfo().

drgn has taken these concepts and fine-tuned them based on lessons
learned.

Credit is also due to Stephen Brennan for early testing and feedback.

Closes #16, closes #25, closes osandov#332.

Signed-off-by: Omar Sandoval <[email protected]>
Overwriting a file that libelf has already mmap'd can confuse it and
cause it to crash. In particular, libelf/elf_begin.c::file_read_elf()
initializes Elf_Scn::rawdata_base and Elf_Scn::data_base from the mmap'd
file. libelf/elf_getdata.c::__libelf_set_rawdata_wrlock() also sets
Elf_Scn::rawdata_base from the mmap'd file. If the file changes between
those two events, then Elf_Scn::rawdata_base will change. Then, the
following line in libelf/elf_end.c::elf_end() will try to free an mmap'd
pointer:

	if (scn->data_base != scn->rawdata_base)
	  free (scn->data_base);

Stephen reported crashes like this from
test_gnu_debugaltlink_debug_directories() while testing a patch that
inadvertently caused debug info to be indexed on module creation.

Reported-by: Stephen Brennan <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Since commit 4e83130 ("Introduce module and debug info finder
APIs"), DWARF indexing is somewhat lazy. As a result, drgn_void_type()
may require DWARF indexing in order to determine the program's main
language. This is overkill for drgn_object_init(), which just needs to
initialize a valid dummy object that is usually reinitialized
immediately.

Signed-off-by: Omar Sandoval <[email protected]>
Fixes: 4e83130 ("Introduce module and debug info finder APIs")
Signed-off-by: Omar Sandoval <[email protected]>
Mention that `drgn` is shipped with Ubuntu except jammy, but it's an
older version

Signed-off-by: Michel Lind <[email protected]>
Since libkdumpfile commit 5b044292abe9 ("Clarify and fix attribute data
lifetime") changes the lifetime of attribute values retrieved with
kdump_attr_ref_get(), the extra reference would keep the PRSTATUS blob
around even after kdump_free().

However, the attribute hierarchy cannot change while iterating over the
PRSTATUS attributes, so it is not necessary to take an attribute reference
and we can use kdump_get_typed_attr().

The attribute blob itself should not change either, but it is a good idea
to keep its data pinned, because a raw pointer to it is stored in the
drgn_thread_set hash table. If some code tries to modify the PRSTATUS
attribute data, the attempt will fail with KDUMP_ERR_BUSY rather than leave
a dangling pointer in the hash table and possibly cause a UAF bug later.

The blob pin does not prevent freeing the blob when the blob reference
count reaches zero.

Signed-off-by: Petr Tesarik <[email protected]>
The kdump_get_typed_attr() function prototype changed in libkdumpfile
commit e182aeaf4d72 ("Make kdump_get_typed_attr() easier to use").

Signed-off-by: Petr Tesarik <[email protected]>
…on_name

Multiple people have lamented that StackFrame.name is None for functions
implemented in assembly or missing debug info for any other reason. With
DWARFless debugging, this will be way more common. My original hope was
that StackFrame.name would strictly be the function name from the
debugging information and that callers would fall back to getting the
symbol name themselves. However, the distinction isn't super meaningful
to users, so let's add the fallback directly to StackFrame.name and add
StackFrame.function_name with the old behavior of StackFrame.name.

Signed-off-by: Omar Sandoval <[email protected]>
This was previously used for testing internals via ctypes, but it's no
longer needed.

Fixes: 7d251fe ("Translate C lexer tests to C unit tests")
Signed-off-by: Omar Sandoval <[email protected]>
The x86-64 fallback unwinder currently has a special case for handling a
call to a NULL pointer. Other architectures need the same workaround. To
avoid code duplication, let's extract the null program counter check
into the generic stack tracing code and add a bad_call_unwind
architecture callback. This also gives us a centralized place to add
heuristics for detecting non-null bad calls.

Signed-off-by: Omar Sandoval <[email protected]>
This will mainly be useful for bad_unwind_call implementations that make
use of drgn_register_state_dup() and need to mark some registers as
unknown in the unwound frame.

Signed-off-by: Omar Sandoval <[email protected]>
Since AArch64 uses a link register rather than storing the return
address on the stack, this is a bit easier than on x86-64. Fixes osandov#462.

Signed-off-by: Omar Sandoval <[email protected]>
This is a shorter-term solution for anyone who can't run a version of
drgn with the previous fix.

Signed-off-by: Omar Sandoval <[email protected]>
PID 0 is not unique in the Linux kernel; there is a task with PID 0 for
each CPU. stack_trace(0) currently fails with a generic "task not found"
error message, which can be confusing; see osandov#462. Add a hint to use
idle_task() to the error message when the given PID is 0.

Signed-off-by: Omar Sandoval <[email protected]>
Alec Rivers noticed in osandov#461 that WITH_LIBKDUMPFILE is misspelled as
WITH_KDUMPFILE here. The whole ifdef block isn't actually needed, so
remove it.

Fixes: 4e330bb ("cli: indicate if drgn was compiled with libkdumpfile")
Signed-off-by: Omar Sandoval <[email protected]>
None of our setter functions handle deletions, which pass the value as
NULL. This results in a segfault when attempting to access the value.
Fix them all with a new convenience macro.

Fixes: 50e4ac8 ("libdrgn: allow overriding program default language")
Fixes: 4e83130 ("Introduce module and debug info finder APIs")
Signed-off-by: Omar Sandoval <[email protected]>
STRING_BUILDER has been really convenient, so let's do the same for
vectors and hash tables.

Signed-off-by: Omar Sandoval <[email protected]>
…lable

Since the module API was introduced, Program.load_debug_info() and the
drgn CLI's -s option match strictly based on build IDs. This fails when
the build ID is not available, specifically in the case of Linux kernel
core dumps without a usable build ID in VMCOREINFO (old versions and a
few buggy stable versions).

Before the module API, Program.load_debug_info() and -s used any vmlinux
file given to them. This caused confusion when the wrong file was given,
so we don't want to bring that behavior back. Instead, let's look for a
vmlinux file matching the Linux version from VMCOREINFO.

Fixes osandov#464.

Fixes: 4e83130 ("Introduce module and debug info finder APIs")
Signed-off-by: Omar Sandoval <[email protected]>
We do this so often I'm surprised I didn't add this sooner.

Signed-off-by: Omar Sandoval <[email protected]>
osandov and others added 20 commits June 20, 2025 15:51
It's finally time. Start by updating the build system and documentation.
Cleanups will be done as followups.

Closes osandov#467.

Signed-off-by: Omar Sandoval <[email protected]>
Now that we require at least Python 3.8, we can assume we have
importlib.metadata.

Signed-off-by: Omar Sandoval <[email protected]>
Now that we don't support Python 3.6 anymore, we can assume that
_PyObject_GenericGetAttrWithDict() exists in DrgnObject_getattro(). This
also allows us to clean up the error handling in that function.

Signed-off-by: Omar Sandoval <[email protected]>
Now that we don't support Python 3.6, we can always monkey patch
logger._cache.clear().

Signed-off-by: Omar Sandoval <[email protected]>
We can assume that PyMapping_Items() always returns a list since Python
3.7.

Signed-off-by: Omar Sandoval <[email protected]>
Now that we don't support Python 3.6, we never need this call.

Signed-off-by: Omar Sandoval <[email protected]>
… for Python 3.6

These were added in Python 3.7, so we don't need the fallback
definitions anymore.

Signed-off-by: Omar Sandoval <[email protected]>
Since Python 3.7, we don't need the hack to cast docstrings to char *.

Signed-off-by: Omar Sandoval <[email protected]>
Since Python 3.8, we don't need to replace
ast.{Num,Str,Bytes,Ellipsis,NameConstant} with ast.Constant.

Signed-off-by: Omar Sandoval <[email protected]>
Now that we don't support Python 3.6, we can finally get replace the
boilerplate for the various node types with dataclasses.

Signed-off-by: Omar Sandoval <[email protected]>
Since Python 3.7, we can have _BtrfsItemHandler inherit from both
NamedTuple and Generic. (I get a bunch of unrelated errors running mypy
on this file, so it doesn't really matter.)

Signed-off-by: Omar Sandoval <[email protected]>
Since Python 3.8, the standard library properly sets __signature__.

Signed-off-by: Omar Sandoval <[email protected]>
…n Python 3.6"

This reverts commit 2b93066. We don't
need this now that we've dropped support for Python 3.6.

Signed-off-by: Omar Sandoval <[email protected]>
We can now assume that we have io.open_code().

Signed-off-by: Omar Sandoval <[email protected]>
We can now assume we have typing.Literal.

Signed-off-by: Omar Sandoval <[email protected]>
unittest.TestCase.{addClassCleanup,doClassCleanups} were added in Python
3.8, so we can drop our "backport" and the @classCleanups hack.

Signed-off-by: Omar Sandoval <[email protected]>
Final and Protocol were added to typing in Python 3.8.

Signed-off-by: Omar Sandoval <[email protected]>
Now that we don't support Python 3.6 or 3.7, we can use
typing.SupportsIndex instead of defining our own copy of it. We still
export it as drgn.IntegerLike for backwards-compatibility and because
it's a more self-explanatory name.

Signed-off-by: Omar Sandoval <[email protected]>
Validators are listed in the "core concepts" section of the user guide,
but they never really caught on and aren't necessary to introduce to new
users. Move the documentation to ValidationError and the examples to
validate_list() and validate_list_for_each_entry().

Taking a step back, I think the current interface for validators is
wrong. It's fine if you know what data structure you want to validate,
but there's no way to do validation for higher-level helpers that use
list_for_each_entry(), etc. internally. A better interface would allow
enabling/disabling validation for *everything*, either with function
calls, context managers, or both. The tricky part is supporting that
without too much overhead.

Regardless, this is the interface we have for now, so let's keep it but
deemphasize it.

Signed-off-by: Omar Sandoval <[email protected]>
@sebroy sebroy force-pushed the dlpx/pr/sebroy/cf484e30-86d4-4277-91d9-c1444e52d4e3 branch from 30a20bf to 6938fa8 Compare June 26, 2025 23:47
@sebroy sebroy changed the title Merge remote-tracking branch 'origin/upstreams/develop' into develop DLPX-94248 address drgn merge conflict with upstream Jun 26, 2025
@sebroy sebroy marked this pull request as ready for review June 26, 2025 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants