Skip to content

Conversation

@pablogsal
Copy link
Member

@pablogsal pablogsal commented Dec 8, 2025

The function__entry and function__return probes stopped working in Python 3.11
when the interpreter was restructured around the new bytecode system. This change
restores these probes by adding DTRACE_FUNCTION_ENTRY() at the start_frame label
in bytecodes.c and DTRACE_FUNCTION_RETURN() in the RETURN_VALUE and YIELD_VALUE
instructions. The helper functions are defined in ceval.c and extract the
filename, function name, and line number from the frame before firing the probe.

This builds on the approach from #125019
but avoids modifying the JIT template since the JIT does not currently support
DTrace. The macros are conditionally compiled with WITH_DTRACE and are no-ops
otherwise. The tests have been updated to use modern opcode names (CALL, CALL_KW,
CALL_FUNCTION_EX) and a new bpftrace backend was added for Linux CI alongside
the existing SystemTap tests. Line probe tests were removed since that probe
was never restored after 3.11.

The function__entry and function__return probes stopped working in Python 3.11
when the interpreter was restructured around the new bytecode system. This change
restores these probes by adding DTRACE_FUNCTION_ENTRY() at the start_frame label
in bytecodes.c and DTRACE_FUNCTION_RETURN() in the RETURN_VALUE and YIELD_VALUE
instructions. The helper functions are defined in ceval.c and extract the
filename, function name, and line number from the frame before firing the probe.

This builds on the approach from python#125019
but avoids modifying the JIT template since the JIT does not currently support
DTrace. The macros are conditionally compiled with WITH_DTRACE and are no-ops
otherwise. The tests have been updated to use modern opcode names (CALL, CALL_KW,
CALL_FUNCTION_EX) and a new bpftrace backend was added for Linux CI alongside
the existing SystemTap tests. Line probe tests were removed since that probe
was never restored after 3.11.
@pablogsal
Copy link
Member Author

I plan to set up a buildbot after this is merged so we don't inadvertently break this again.

@pablogsal
Copy link
Member Author

Example usage:

$ sudo bpftrace -e '
  usdt:./python:python:function__entry { printf("-> %s\n", str(arg1)); }
  usdt:./python:python:function__return { printf("<- %s\n", str(arg1)); }
  ' -c './python -c "pass"'

Attached 2 probes
-> <module>
-> _List
<- _List
-> _WeakValueDictionary
<- _WeakValueDictionary
-> _BlockingOnManager
<- _BlockingOnManager
-> _DeadlockError
<- _DeadlockError
-> _ModuleLock
<- _ModuleLock
-> _DummyModuleLock
<- _DummyModuleLock
-> _ModuleLockManager
<- _ModuleLockManager
-> ModuleSpec
<- ModuleSpec
-> BuiltinImporter
-> _requires_builtin
-> _wrap
<- _wrap
<- _requires_builtin
-> _requires_builtin
<- _wrap
<- _requires_builtin
-> _requires_builtin
<- _wrap
<- _requires_builtin
<- BuiltinImporter
-> FrozenImporter
-> _requires_frozen
-> _wrap
<- _wrap
<- _requires_frozen
-> _requires_frozen
<- _wrap
<- _requires_frozen
-> _requires_frozen
<- _wrap
<- _requires_frozen
<- FrozenImporter
-> _ImportLockContext
<- _ImportLockContext
...

@markshannon
Copy link
Member

markshannon commented Dec 8, 2025

I'm opposed to this for the same reasons I was opposed to #125019: #125019 (comment).
As soon as the JIT kicks in, the results will be meaningless.

dtrace claims to be zero overhead when not in use, but thisPR isn't. The PyDTrace_FUNCTION_ENTRY/RETURN() maybe a nop, but the preparatory code is not. PyUnstable_InterpreterFrame_GetLine is linear in the size of the code object, so could be very slow and would be called even if tracing is off.

Also, this requires specially built binaries, which means it cannot be used unless you build your own Python.

How about using an approach like https://github.com/natoscott/pyusdt, which works on any version 3.12+?

Finally, do you know where I can find proper documentation for usdt probes? None of the docs I can find describe how executables should support probes, just explain how to write dtrace scripts.

@pablogsal
Copy link
Member Author

pablogsal commented Dec 8, 2025

I'm opposed to this for the same reasons I was opposed to #125019: #125019 (comment).
As soon as the JIT kicks in, the results will be meaningless.

Then the JIT cannot kick in, it's as simple as that. This was functionality that was there and was broken. We cannot just break things to accommodate the JIT if you plan to make it on by default. This is not how we treat features that we're working before.
#103083 didn't follow PEP 387 so you cannot just say that this is the status quo because this was removed wihtout consensus and against our backwards compatibility policy. This PR just restores functionality.

dtrace claims to be zero overhead when not in use, but thisPR isn't. The PyDTrace_FUNCTION_ENTRY/RETURN() maybe a nop, but the preparatory code is not. PyUnstable_InterpreterFrame_GetLine is linear in the size of the code object, so could be very slow and would be called even if tracing is off.

I don't understand this claim. The functions here only quick in when the probes are active. What preparatory work are you talking about?

Also, this requires specially built binaries, which means it cannot be used unless you build your own Python.

Redistributes that want to support this will build with the dtrace option, that's not a problem

How about using an approach like https://github.com/natoscott/pyusdt, which works on any version 3.12+?

That doesn't do the same thing. The idea of this approach is to activate the probes in a uncooperative Python interpreter without having to modify or inject anything. It's how dtrace support works in other binaries

@pablogsal
Copy link
Member Author

CC @brandtbucher

@pablogsal
Copy link
Member Author

pablogsal commented Dec 8, 2025

Finally, do you know where I can find proper documentation for usdt probes? None of the docs I can find describe how executables should support probes, just explain how to write dtrace scripts.

Here's a curated list of documentation sources:

Real-World Implementation Examples:

@stratakis
Copy link
Contributor

I plan to set up a buildbot after this is merged so we don't inadvertently break this again.

Fedora and RHEL have been using this option since like forever. As soon as this change lands (and hopefully backported) I'll enable it in the buildbots.

@markshannon
Copy link
Member

@pablogsal thanks for all the links.

I don't understand this claim. The functions here only quick in when the probes are active. What preparatory work are you talking about?

That was my misreading of the code. But do be aware that PyUnstable_InterpreterFrame_GetLine() is potentially very slow for what should be lightweight monitoring.

How about using an approach like https://github.com/natoscott/pyusdt, which works on any version 3.12+?

That doesn't do the same thing. The idea of this approach is to activate the probes in a uncooperative Python interpreter without having to modify or inject anything. It's how dtrace support works in other binaries

Why is it not doing the same thing? And isn't activating probes in a normal "uncooperative" Python interpreter want we want? No custom builds, it just works.

@pablogsal
Copy link
Member Author

Why is it not doing the same thing?

Because in one you need the application to set up everything and in the other one you can sample. This separates the decision of who decides to sample and trace from who makes the application. This is very important because most of the time these people are different groups and one has no control over what the other would do.

And isn't activating probes in a normal "uncooperative" Python interpreter want we want? No custom builds, it just works.

No, for example you can redistribute python with dtrace (like fedora does as @stratakis said) and then all users will benefit from it. At companies people will build CPython in a way that works with the internal ecosystem adding dtrace support. For the user of such interpreter tracing also "just works". The user here can be an application developer or the observabiliyu team.

I am sure @brandtbucher can give some pointers on this as well

@markshannon
Copy link
Member

markshannon commented Dec 9, 2025

Because in one you need the application to set up everything and in the other one you can sample.

Sampling doesn't happen magically. Dtrace, or some other tool, needs to make some very invasive changes to the CPython executable.
Why it that better than signalling to the VM to turn on instrumentation, like the remote debugger does?

We could even add if DTRACE_ENABLED() turnon_instrumentation() in a few places that aren't on the hot path of the interpreter, but are called frequently enough to be useful, so that insertion of probe points turns on instrumentation.

@markshannon
Copy link
Member

I think we should be doing this properly or not at all.

Doing it properly means it should:

  • work with the JIT
  • handle sends as well as calls
  • handle unwinding as well as returns and yields
  • Always be available, not just on custom builds.

Leveraging sys.monitoring seems like a robust way to do these things.

@pablogsal
Copy link
Member Author

pablogsal commented Dec 9, 2025

Sampling doesn't happen magically. Dtrace, or some other tool, needs to make some very invasive changes to the CPython executable. Why it that better than signalling to the VM to turn on instrumentation, like the remote debugger does?

Invasive? What are you talking about? This is just noops and is super simple to integrate. I mean I guess is subjective but I think this is really far for being invasive.

Why it that better than signalling to the VM to turn on instrumentation, like the remote debugger does?

Because it doesn't require the VM to execute code and cooperate and doesn't integrate with existing tracing frameworks. You may seem to not care about this and that's fine but please understand people do care a lot about that distinction.

I think we should be doing this properly or not at all.

Mark, I understand your point or view and I don't mind discussing this issues together and figure out things, but let me please insist that I am restoring functionality that was here before and it was broken without following proper procedure and without agreement.

I think you are approaching this as is a new feature that needs to contend with other proposals, but this feature existed, had users and deployments, it was broken without following PEP 387 and must be restored.

@pablogsal
Copy link
Member Author

work with the JIT

Why? Is the JIT that needs to prove that can coexist with other things that CPython does, not the other way around

@markshannon
Copy link
Member

Invasive? What are you talking about? This is just noops and is super simple to integrate. I mean I guess is subjective but I think this is really far for being invasive.

I mean at runtime, not the code changes. The tool needs to modify the machine code of the running executable. That's quite invasive.

Because it doesn't require the VM to execute code and cooperate and doesn't integrate with existing tracing frameworks.

It does require the VM to execute code, but that is unavoidable with a JIT. As long as we fire the right probes at the right time, it should integrate just fine.

@markshannon
Copy link
Member

Can we test this on CI?
Running a buildbot in your basement* will work for a while, but what happens when you no are no longer willing or able to maintain it?

*Rhetorical basement. I don' think Pablo has a basement.

Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've a duplicate entry probe point and you're missing a return probe point for unwinding.

_PyStackRef temp = retval;
DEAD(retval);
SAVE_STACK();
DTRACE_FUNCTION_RETURN();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've covered return and yield, but you'll need a probe point for unwinding as well.

PyCodeObject *code = _PyFrame_GetCode(frame);
filename = PyUnicode_AsUTF8(code->co_filename);
funcname = PyUnicode_AsUTF8(code->co_name);
lineno = PyUnstable_InterpreterFrame_GetLine(frame);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expensive. In sys.monitoring there is a cache of line numbers created for line events.
You could reuse that, but it does require allocating monitoring data for the code object.

Ultimately, I'd like to redesign the location table to use a series of frames. That way finding the location info could be done with a binary search followed by a short linear scan.

if (_Py_EnterRecursivePy(tstate)) {
goto early_exit;
}
DTRACE_FUNCTION_ENTRY();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates the probe point in start_frame:

@pablogsal
Copy link
Member Author

pablogsal commented Dec 9, 2025

Can we test this on CI?
Running a buildbot in your basement* will work for a while, but what happens when you no are no longer willing or able to maintain it?

Perhaps but my fear is that it needs sudo, it has the same problem as the perf integration where the access to bpd code is not exposed in the CI runners.

Running a buildbot in your basement*

Haha, well the full buildbot fleet is a collection of build bots in everyone basements but that won't stop them to block releases if they are Tier 1 ;)

but what happens when you no are no longer willing or able to maintain it?

Someone else will then (like fedora is doing now). Is not like this cannot be tested in normal machines with sudo (@stratakis tests this on fedora's CI for example). But this is not a question that cannot be applied to almost anything else that's exclusively tested on build bots so I don't think is a big concern :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants