Support for dynamically instrumenting hotpatchable functions

Background and purpose of this document

Orbit allows instrumenting functions in the target binary dynamically, i.e. one chooses a function from the debug symbols and Orbit modifies the binary such that every call to that function produces a time interval in the capture timeline.

This works by overwriting the first five bytes of the function with a jump into some carefully crafted code that backs up the current state of the calling thread, emits the tracing information and then continues the execution. This works fine in most cases, but some situations are challenging: somewhere in the instrumented function there can be a jump to an address within the first five bytes, the function can be shorter than five bytes, or the first instruction can be a call to another function.

Compilers offer options to make dynamic instrumentation easier. The way this works is that the compiler adds a few bytes of padding in front of every function and a two byte nop instruction at the function entry. One can then overwrite these first two bytes with a “jmp -7” which ends up in the padded bytes before the function. These padded bytes are overwritten with the actual jump to a 32 bit offset that leads to the instrumentation code. Details vary a bit from compiler to compiler - see the sections “Support in X” below.
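The two jumps described above can be sketched as follows. This is a minimal illustration, not Orbit's code: the function names EncodeShortJumpToPadding and EncodeJumpToTrampoline are hypothetical, and the sketch assumes x86-64 with exactly five bytes of padding in front of the function and a two byte nop at the function entry.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: the two bytes that replace the nop at the function entry.
// 0xEB is "jmp rel8". The offset is relative to the end of the two byte
// instruction, so -7 (0xF9) lands five bytes before the function entry.
std::array<uint8_t, 2> EncodeShortJumpToPadding() {
  return {0xEB, 0xF9};
}

// Hypothetical helper: the five bytes written into the padding in front of the
// function. 0xE9 is "jmp rel32". The instruction occupies the five bytes
// directly before the function entry, so its end coincides with
// `function_address` and the offset is measured from there.
std::array<uint8_t, 5> EncodeJumpToTrampoline(uint64_t function_address,
                                              uint64_t trampoline_address) {
  const int64_t offset = static_cast<int64_t>(trampoline_address) -
                         static_cast<int64_t>(function_address);
  // The trampoline must be reachable with a signed 32 bit offset.
  assert(offset >= INT32_MIN && offset <= INT32_MAX);
  const uint32_t rel32 = static_cast<uint32_t>(static_cast<int32_t>(offset));
  std::array<uint8_t, 5> bytes{0xE9, 0, 0, 0, 0};
  for (int i = 0; i < 4; ++i) {
    // x86-64 stores the offset in little-endian byte order.
    bytes[1 + i] = static_cast<uint8_t>(rel32 >> (8 * i));
  }
  return bytes;
}
```

For a function at 0x11d5 and an assumed trampoline at 0x2000, the padding would receive e9 2b 0e 00 00 and the function entry would receive eb f9.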

Until now we have ignored whether or not the target binary was compiled with support for “hotpatchable functions” (clang and gcc call them “patchable functions”, the msvc documentation uses the term “hotpatchable image”). The purpose of this document is to suggest a way to utilise the support from the compiler in case the binary was compiled with the relevant options.

Figure out whether or not a function is hotpatchable

In order to represent functions in Orbit we read information from various different inputs:

Elf files: debug symbols, dynsym, eh or debug frame entries
Coff files: coff symbol table, dwarf, export table, exception table
Pdb files parsed with dia: SymTagFunction, SymTagPublicSymbol
Pdb files parsed with llvm: ProcSym, public symbols

This is less colourful than it looks at first sight: basically we parse either debug info, exported functions or something derived from unwinding info. In the end all of these sources produce a ModuleSymbols, which consists of repeated SymbolInfo entries. We add a bool is_hotpatchable to SymbolInfo.

For now we only support elf binaries (created by clang or gcc). As shown below, both compilers produce the same __patchable_function_entries section in the elf file. If this section is present and a given symbol is listed there, we mark it as hotpatchable; otherwise we do not. In particular, we mark Windows code as not hotpatchable although msvc supports hotpatching.
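The lookup can be sketched as below. This is only an illustration under stated assumptions: IsHotpatchable is a hypothetical name, the section contents are assumed to be already decoded into a list of addresses, and each entry is assumed to point at the first nop, which lies padding_size bytes before the function entry (five bytes in the examples in the appendix).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if the function is listed in
// __patchable_function_entries. `patchable_entries` holds the addresses read
// from that section. Each entry is assumed to point at the first nop, i.e.
// `padding_size` bytes before the function entry.
bool IsHotpatchable(const std::vector<uint64_t>& patchable_entries,
                    uint64_t function_address, uint64_t padding_size) {
  assert(function_address >= padding_size);
  const uint64_t expected_entry = function_address - padding_size;
  return std::find(patchable_entries.begin(), patchable_entries.end(),
                   expected_entry) != patchable_entries.end();
}
```

With the entries from the clang example in the appendix (0x1080, 0x10c0, 0x11d0, 0x11f0) and a padding of five bytes, the function at 0x11d5 is found, whereas an address like 0x11c0 is not.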

In ModuleData::AddSymbolsInternal the SymbolInfo is translated into a FunctionInfo (src/ClientData/include/ClientData/FunctionInfo.h). So FunctionInfo gets an additional field “bool is_hotpatchable_;”.

In src/CaptureClient/CaptureClient.cpp ToGrpcCaptureOptions translates the FunctionInfo to an InstrumentedFunction (src/GrpcProtos/capture.proto), which also gets the is_hotpatchable field. With this the information is available in the service. Specifically, InstrumentedProcess::InstrumentFunctions has this information (since it gets the InstrumentedFunction’s in the capture options).

Size of padding and nop’s

The above only considered whether or not the binary was compiled with hotpatchable function support. But besides that the compilers offer an option to adjust the size of the padding and - in case of clang or gcc - also an option to adjust the size of the nop at the start. So a user could compile a binary with a padding that is too short or a nop that is not exactly two bytes in size. Both would lead to crashes if we ignore it.

We should either

add a capture option to explicitly activate the usage of hotpatchable function support. The explanation for the option should suggest the “-fpatchable-function-entry=7,5” parameter explicitly. This comes with the complication that subsequent captures could use different flavours of instrumentation and therefore the trampolines cannot be reused (putting it another way: we would need a hotpatchable trampoline and a regular trampoline).

or

detect the size of the padding and the nop and disable hotpatchable function support if the binary code looks unexpected.

or both of the above.

Only implementing the second option seems most straightforward. There is no reason for switching off the hotpatchable support if the binary is compiled with appropriate parameters.
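A byte level check for the second option could look like the sketch below. The function name and the constants are hypothetical. The sketch assumes that the padding consists of five single byte nops (0x90) and that the entry nop is either the two byte form emitted by clang (66 90) or two single byte nops as emitted by gcc (90 90), as seen in the appendix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check. `code` holds the bytes read from the target process,
// starting five bytes before the function entry.
bool LooksLikeExpectedPatchableEntry(const std::vector<uint8_t>& code) {
  constexpr size_t kPaddingSize = 5;   // room for a "jmp rel32"
  constexpr size_t kEntryNopSize = 2;  // room for a "jmp rel8"
  if (code.size() < kPaddingSize + kEntryNopSize) return false;

  // The padding in front of the function must consist of single byte nops.
  const bool padding_ok =
      std::all_of(code.begin(), code.begin() + kPaddingSize,
                  [](uint8_t byte) { return byte == 0x90; });

  const uint8_t first = code[kPaddingSize];
  const uint8_t second = code[kPaddingSize + 1];
  const bool single_two_byte_nop = first == 0x66 && second == 0x90;  // clang
  const bool two_one_byte_nops = first == 0x90 && second == 0x90;    // gcc

  return padding_ok && (single_two_byte_nop || two_one_byte_nops);
}
```

The byte pattern alone is only a heuristic, since alignment padding between functions can also consist of nops. It is meant to be combined with the entry from __patchable_function_entries, which gives the start of the padding.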

Handling instrumentation for hotpatchable functions

Some changes are needed in CreateTrampoline:

CreateTrampoline needs to know about is_hotpatchable
CheckForRelativeJumpIntoFirstFiveBytes is not required if is_hotpatchable
AppendRelocatedPrologueCode can be skipped - there is no need to relocate a nop
the address to jump back to after the trampoline has completed execution is always function_address+2
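The list above can be summarised in a small sketch. TrampolinePlan, PlanTrampoline and the parameter overwritten_prologue_size are hypothetical names that do not exist in Orbit; the regular (not hotpatchable) branch assumes that the trampoline continues right after the relocated instructions, which cover at least the five overwritten bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the decisions taken when creating a trampoline.
struct TrampolinePlan {
  bool check_for_jumps_into_first_bytes;
  bool relocate_prologue;
  uint64_t return_address;
};

// `overwritten_prologue_size` is the size of the whole instructions that cover
// the first five bytes of a regular function. It is ignored for hotpatchable
// functions.
TrampolinePlan PlanTrampoline(uint64_t function_address, bool is_hotpatchable,
                              uint64_t overwritten_prologue_size) {
  if (is_hotpatchable) {
    // Only the two byte nop is overwritten: nothing to check or relocate, and
    // execution continues directly after the nop.
    return {false, false, function_address + 2};
  }
  assert(overwritten_prologue_size >= 5);
  return {true, true, function_address + overwritten_prologue_size};
}
```

For the function at 0x11d5 from the clang example in the appendix, the hotpatchable plan returns to 0x11d7, which is the push %rbp instruction.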

InstrumentFunction needs to do something different.

UninstrumentFunctions works fine since it just restores the first 20 bytes of the function (so the two byte jump with its 8 bit relative offset is restored as well).

Current state

https://github.com/google/orbit/pull/4497 implements the things described here for Linux binaries produced by clang or gcc.

The test for the correct parameter setting described in “Size of padding and nop’s” is missing.

The information whether a function is hotpatchable is not yet used in dynamic instrumentation (as outlined in “Handling instrumentation for hotpatchable functions”).

Appendix

The different compilers handle hotpatchable functions slightly differently. The sections below describe the support in each compiler.

Support in Clang

https://clang.llvm.org/docs/AttributeReference.html#patchable-function-entry

#include <iostream>

int fun(int x) {
  return 2 * x;
}

int main(int argc, char **argv) {
  int x = 42;
  std::cout << fun(x) << "\n";
  return 0;
}

> clang++ main.cc -std=c++17 -fpatchable-function-entry=7,5

And then have a look at the binary

> objdump -D a.out

00000000000011c0 <frame_dummy>:
    11c0:       f3 0f 1e fa             endbr64
    11c4:       e9 77 ff ff ff          jmp    1140 <register_tm_clones>
    11c9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    11d0:       90                      nop
    11d1:       90                      nop
    11d2:       90                      nop
    11d3:       90                      nop
    11d4:       90                      nop

00000000000011d5 <_Z3funi>:
    11d5:       66 90                   xchg   %ax,%ax
    11d7:       55                      push   %rbp
    11d8:       48 89 e5                mov    %rsp,%rbp

There are 7 bytes of NOP’s (11d0 - 11d6). The function starts after the 5th byte. So there is a 7-5==2 byte NOP at the function entry.

> readelf -t a.out

Section Headers:
  [Nr] Name
       Type              Address          Offset            Link
       Size              EntSize          Info              Align
       Flags
…
[27] __patchable_function_entries
       PROGBITS         0000000000004030  0000000000003030  16
       0000000000000020 0000000000000000  0                 8
       [0000000000000083]: WRITE, ALLOC, LINK ORDER
…

> readelf --hex-dump=27 a.out

Hex dump of section '__patchable_function_entries':
  0x00004030 80100000 00000000 c0100000 00000000 ................
  0x00004040 d0110000 00000000 f0110000 00000000 ................

These are just the addresses of the patchable functions. The address given is the start of the first NOP (in this example 5 bytes before the function entry point).
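Decoding the section contents can be sketched as follows. DecodePatchableFunctionEntries is a hypothetical name, and the sketch assumes a 64 bit little-endian elf file in which the section stores one eight byte address per entry, as in the hex dump above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turns the raw bytes of __patchable_function_entries
// into a list of addresses. Each entry is assumed to be an eight byte
// little-endian address.
std::vector<uint64_t> DecodePatchableFunctionEntries(
    const std::vector<uint8_t>& section) {
  constexpr size_t kEntrySize = 8;
  assert(section.size() % kEntrySize == 0);
  std::vector<uint64_t> addresses;
  for (size_t i = 0; i + kEntrySize <= section.size(); i += kEntrySize) {
    uint64_t address = 0;
    for (size_t byte = 0; byte < kEntrySize; ++byte) {
      // The least significant byte comes first.
      address |= static_cast<uint64_t>(section[i + byte]) << (8 * byte);
    }
    addresses.push_back(address);
  }
  return addresses;
}
```

Applied to the 32 bytes of the hex dump above, this yields 0x1080, 0x10c0, 0x11d0 and 0x11f0. The third entry is five bytes before _Z3funi at 0x11d5.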

Support in gcc

Code as above. Compile with

> g++ main.cc -g -std=c++17 -fpatchable-function-entry=7,5

And then have a look at the binary

> objdump -D a.out

0000000000001160 <frame_dummy>:
    1160:       f3 0f 1e fa             endbr64
    1164:       e9 77 ff ff ff          jmp    10e0 <register_tm_clones>
    1169:       90                      nop
    116a:       90                      nop
    116b:       90                      nop
    116c:       90                      nop
    116d:       90                      nop

000000000000116e <_Z3funi>:
    116e:       90                      nop
    116f:       90                      nop
    1170:       55                      push   %rbp
    1171:       48 89 e5                mov    %rsp,%rbp

As with clang above there are 7 bytes of NOP’s (1169 - 116f). The function starts after the 5th byte. The two bytes of nops are individual instructions here (using g++-12) whereas clang above produced a single instruction of length two.

> readelf -t a.out

 [26] __patchable_function_entries
       PROGBITS         0000000000004030  0000000000003030  15
       0000000000000020 0000000000000000  0                 8
       [0000000000000083]: WRITE, ALLOC, LINK ORDER

> readelf --hex-dump=26 a.out

Hex dump of section '__patchable_function_entries':
  0x00004030 69110000 00000000 7e110000 00000000 i.......~.......
  0x00004040 d2110000 00000000 2b120000 00000000 ........+.......

So the __patchable_function_entries section looks exactly like the one produced by clang.

Support in msvc

https://learn.microsoft.com/en-us/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-160

https://learn.microsoft.com/en-us/cpp/build/reference/functionpadmin-create-hotpatchable-image?view=msvc-160

For x64 the first instruction after the function entry is at least two bytes long (so we are guaranteed that we only need to relocate one instruction). The padding can be adjusted; it defaults to six bytes for x64 (it would be interesting to know why, since we only need five).

It looks like msvc inserts 0xcc bytes as padding (at least six, rounded up so that the function entry is on a multiple of 16 - this is just a guess from looking at examples). As pointed out in the documentation, there is no nop at the beginning of the function; we need to relocate the first instruction (and only this one, because it is at least two bytes long).

It's unclear to me which functions are hotpatchable. Maybe all of them? Dumpbin does not show any section comparable to the one present in elf binaries.
