Skip to content

[PROF-11524] Upgrade libdatadog dependency to version 18.1 #4577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
May 30, 2025

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Apr 11, 2025

What does this PR do?

Updates the native bits in the gem (profiling, crashtracking) to be able to work with the latest libdatadog 18.

I'm opening it as a draft so we can use it to test the upcoming libdatadog 18.1 release. Once that release is out and on rubygems.org, I plan to:

  1. Push the actual gem dependency change on this PR
  2. Do a pass on the TODO/FIXME things that are still missing

All ready and in good shape!

Motivation:

There's a number of fixes and additions in libdatadog that we want to make use of.

Change log entry

Yes. Upgrade libdatadog dependency to 18.1

Additional Notes:

With this PR, I'm able to build both crashtracking AND profiling with libdatadog master as of 11th of April. I'm also able to get a green test suite for profiling, but not for crashtracking: I've reported the issue to the folks working on that, so they're aware.

All green!

Blockers for this PR:

  • Being able to update libdatadog crashtracking configration (endpoint, tags) Fixed!

How to test the change?

Once everything is in place, our existing test coverage will be enough to validate these changes. Until then it's expected that CI is going to be red.

@github-actions github-actions bot added the profiling Involves Datadog profiling label Apr 11, 2025
@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Apr 11, 2025

Datadog Report

Branch report: ivoanjo/prof-11524-upgrade-libdatadog17-try3
Commit report: 2cf6513
Test service: dd-trace-rb

✅ 0 Failed, 21496 Passed, 1299 Skipped, 4m 6.7s Total Time

ivoanjo added 11 commits May 15, 2025 10:09
…on every report

Reusing an `encoded_profile` object will no longer be possible with
libdatadog 17+.

This does change what the benchmark was covering, so comparing with
prior results is not particularly relevant.

I did consider removing this benchmark (we haven't used it for a
long time for anything in particular), but on the other hand these
microbenchmarks that exercise libdatadog have been useful sometimes so
why not.
This tag will be set automatically by libdatadog 17+ so we no longer
need to set it. See DataDog/libdatadog#954
for the change on the libdatadog side.
Still missing: cancellation token changes.
The removed spec for the invalid sample no longer triggers on
release builds of libdatadog (it's only for debug builds).
Still discussing with folks what the next steps are on this one.
With the move to handles, I'm unsure if our previous approach was
correct, so I've tweaked our usage and added a bit more validation to
catch any unexpected issues.
It took us a while to understand why crashtracking was not working
correctly with modern versions of libdatadog.

It turns out the problem was that we were setting a too low timeout
for Ruby crash reports.

I suspect this issue has been hidden for a while because before
libdatadog 17, crashtracking was almost always giving up on stack
symbolication ( DataDog/libdatadog#991 ),
which was a much faster path, and thus we didn't notice
that the timeout required adjusting until libdatadog 17 came
around and fixed symbolization (making it slower).

I first tried with the libdatadog default of 5 seconds and it
still wasn't enough on my machine. Then I bumped it slightly
higher and that seems to be enough.

I'm surprised symbolication is this expensive...
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-11524-upgrade-libdatadog17-try3 branch from e8bf7f4 to a9a715c Compare May 15, 2025 09:12
@github-actions github-actions bot added the core Involves Datadog core libraries label May 15, 2025
@ivoanjo ivoanjo changed the title [PROF-11524] Upgrade libdatadog dependency to 17.1 [PROF-11524] Upgrade libdatadog dependency to version 18 May 15, 2025
@ivoanjo ivoanjo changed the title [PROF-11524] Upgrade libdatadog dependency to version 18 [PROF-11524] Upgrade libdatadog dependency to version 18.1 May 15, 2025
This test is currently failing pending some work on the libdatadog side.
@ivoanjo
Copy link
Member Author

ivoanjo commented May 15, 2025

Update

I've:

  • rebased this PR on top of current master
  • validated the latest libdatadog master changes
  • updated it to mention the upgrade will be to 18.1 (or later)

We have one last blocker (which I've listed in the PR notes), and we should be good to go.

With the new disable/reconfigure APIs we can finally restore the
full correct behavior to the crashtracker.
ivoanjo added a commit to DataDog/libdatadog that referenced this pull request May 27, 2025
…#1066)

# What does this PR do?

This PR adds the following new APIs to the crashtracker:
* `ddog_crasht_enable`
* `ddog_crasht_disable`
* `ddog_crasht_reconfigure`

# Motivation

Prior to #1000, it was possible to:

* reconfigure the crashtracker by calling init again
* stop the crashtracker

Both features were removed as it could cause issues with signal handlers and chaining.

But, we still need to be able to update the configuration as for instance the following can change at run-time for Ruby and need to be updated:
    
* Endpoint
* Tags

This PR restores some of the behavior removed in #1000, without the unsafety of touching signal handlers:

* `enable`/`disable` just skip the crashtracker signal handling code, but never uninstall it
* `reconfigure` does the same as `init` BUT without touching the signal handlers

# Additional Notes

This PR supercedes #1017 .

# How to test the change?

I've validated this branch with DataDog/dd-trace-rb#4577 on the Ruby side, and I'm able to get a green test suite for the Ruby crashtracker.

---------

Co-authored-by: Daniel Schwartz-Narbonne <[email protected]>
ivoanjo added a commit to DataDog/libdatadog that referenced this pull request May 28, 2025
…ease

**What does this PR do?**

This PR bumps the libdatadog version from 18.0.0 to 18.1.0 in
preparation for release.

We're going from v18.0 to v18.1 as I'm not aware of any
backwards-incompatible changes being merged to master.

**Motivation:**

Ruby has been stuck on an old libdatadog release due to a few API
changes, and I want to finally move it to latest libdatadog.

**Additional Notes:**

I did a diff of the v18.0 to v18.1 headers to validate there's no
backwards-incompatible changes using the `difft` tool:

```
library-config.h --- C++
No changes.

blazesym.h --- C++
No changes.

profiling.h --- C++
No changes.

crashtracker.h --- 1/2 --- C++
57 57 extern "C" {
58 58 #endif // __cplusplus
.. 59
.. 60 /**
.. 61  * Disables the crashtracker.
.. 62  * Note that this does not restore the old signal handlers, but rather turns crash-tracking into a
.. 63  * no-op, and then chains the old handlers.  This means that handlers registered after the
.. 64  * crashtracker will continue to work as expected.
.. 65  *
.. 66  * # Preconditions
.. 67  *   None
.. 68  * # Safety
.. 69  *   None
.. 70  * # Atomicity
.. 71  *   This function is atomic and idempotent.  Calling it multiple times is allowed.
.. 72  */
.. 73 DDOG_CHECK_RETURN struct ddog_VoidResult ddog_crasht_disable(void);
.. 74
.. 75 /**
.. 76  * Enables the crashtracker, if it had been previously disabled.
.. 77  * If crashtracking has not been initialized, this function will have no effect.
.. 78  *
.. 79  * # Preconditions
.. 80  *   None
.. 81  * # Safety
.. 82  *   None
.. 83  * # Atomicity
.. 84  *   This function is atomic and idempotent.  Calling it multiple times is allowed.
.. 85  */
.. 86 DDOG_CHECK_RETURN struct ddog_VoidResult ddog_crasht_enable(void);
59 87
60 88 /**
61 89  * Reinitialize the crash-tracking infrastructure after a fork.

crashtracker.h --- 2/2 --- C++
 98 126                                         struct ddog_crasht_ReceiverConfig receiver_config,
 99 127                                         struct ddog_crasht_Metadata metadata);
 .. 128
 .. 129 /**
 .. 130  * Reconfigure the crashtracker and re-enables it.
 .. 131  * If the crashtracker has not been initialized, this function will have no effect.
 .. 132  *
 .. 133  * # Preconditions
 .. 134  *   None.
 .. 135  * # Safety
 .. 136  *   Crash-tracking functions are not reentrant.
 .. 137  *   No other crash-handler functions should be called concurrently.
 .. 138  * # Atomicity
 .. 139  *   This function is not atomic. A crash during its execution may lead to
 .. 140  *   unexpected crash-handling behaviour.
 .. 141  */
 .. 142 DDOG_CHECK_RETURN
 .. 143 struct ddog_VoidResult ddog_crasht_reconfigure(struct ddog_crasht_Config config,
 .. 144                                                struct ddog_crasht_ReceiverConfig receiver_config,
 .. 145                                                struct ddog_crasht_Metadata metadata);
100 146
101 147 /**
102 148  * Initialize the crash-tracking infrastructure without launching the receiver.

data-pipeline.h --- C++
No changes.

telemetry.h --- C++
No changes.

common.h --- C++
No changes.
```

**How to test the change?**

I've tested the libdatadog 18.1 release locally using the Ruby test
suite and DataDog/dd-trace-rb#4577 .
ivoanjo added a commit to DataDog/libdatadog that referenced this pull request May 28, 2025
…ease (#1075)

**What does this PR do?**

This PR bumps the libdatadog version from 18.0.0 to 18.1.0 in
preparation for release.

We're going from v18.0 to v18.1 as I'm not aware of any
backwards-incompatible changes being merged to master.

**Motivation:**

Ruby has been stuck on an old libdatadog release due to a few API
changes, and I want to finally move it to latest libdatadog.

**Additional Notes:**

I did a diff of the v18.0 to v18.1 headers to validate there's no
backwards-incompatible changes using the `difft` tool:

```
library-config.h --- C++
No changes.

blazesym.h --- C++
No changes.

profiling.h --- C++
No changes.

crashtracker.h --- 1/2 --- C++
57 57 extern "C" {
58 58 #endif // __cplusplus
.. 59
.. 60 /**
.. 61  * Disables the crashtracker.
.. 62  * Note that this does not restore the old signal handlers, but rather turns crash-tracking into a
.. 63  * no-op, and then chains the old handlers.  This means that handlers registered after the
.. 64  * crashtracker will continue to work as expected.
.. 65  *
.. 66  * # Preconditions
.. 67  *   None
.. 68  * # Safety
.. 69  *   None
.. 70  * # Atomicity
.. 71  *   This function is atomic and idempotent.  Calling it multiple times is allowed.
.. 72  */
.. 73 DDOG_CHECK_RETURN struct ddog_VoidResult ddog_crasht_disable(void);
.. 74
.. 75 /**
.. 76  * Enables the crashtracker, if it had been previously disabled.
.. 77  * If crashtracking has not been initialized, this function will have no effect.
.. 78  *
.. 79  * # Preconditions
.. 80  *   None
.. 81  * # Safety
.. 82  *   None
.. 83  * # Atomicity
.. 84  *   This function is atomic and idempotent.  Calling it multiple times is allowed.
.. 85  */
.. 86 DDOG_CHECK_RETURN struct ddog_VoidResult ddog_crasht_enable(void);
59 87
60 88 /**
61 89  * Reinitialize the crash-tracking infrastructure after a fork.

crashtracker.h --- 2/2 --- C++
 98 126                                         struct ddog_crasht_ReceiverConfig receiver_config,
 99 127                                         struct ddog_crasht_Metadata metadata);
 .. 128
 .. 129 /**
 .. 130  * Reconfigure the crashtracker and re-enables it.
 .. 131  * If the crashtracker has not been initialized, this function will have no effect.
 .. 132  *
 .. 133  * # Preconditions
 .. 134  *   None.
 .. 135  * # Safety
 .. 136  *   Crash-tracking functions are not reentrant.
 .. 137  *   No other crash-handler functions should be called concurrently.
 .. 138  * # Atomicity
 .. 139  *   This function is not atomic. A crash during its execution may lead to
 .. 140  *   unexpected crash-handling behaviour.
 .. 141  */
 .. 142 DDOG_CHECK_RETURN
 .. 143 struct ddog_VoidResult ddog_crasht_reconfigure(struct ddog_crasht_Config config,
 .. 144                                                struct ddog_crasht_ReceiverConfig receiver_config,
 .. 145                                                struct ddog_crasht_Metadata metadata);
100 146
101 147 /**
102 148  * Initialize the crash-tracking infrastructure without launching the receiver.

data-pipeline.h --- C++
No changes.

telemetry.h --- C++
No changes.

common.h --- C++
No changes.
```

**How to test the change?**

I've tested the libdatadog 18.1 release locally using the Ruby test
suite and DataDog/dd-trace-rb#4577 .
ivoanjo added 3 commits May 30, 2025 09:33
This test started failing when I added a sleep(1) so I think we're
very close to the limit (of 10 seconds), so let's create a bit of
headroom to avoid future flakiness.
ivoanjo added a commit to DataDog/libdatadog that referenced this pull request May 30, 2025
**What does this PR do?**

This PR includes the changes documented in the "Releasing a new version to rubygems.org" part of the README:
https://github.com/datadog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg

It also includes an updated to the `gem_packaging.rb` test to add the
new `INTERNED_EMPTY_STRING` public symbol added in #917 .

**Motivation:**

Enable Ruby to use libdatadog v18.1.0. This includes A LOT of changes
to profiling, crashtracking, configuration, process discovery etc
that have been pending for Ruby for too long.

**Additional Notes:**

The `INTERNED_EMPTY_STRING` new symbol seems ok to have (e.g. I don't
expect it to clash with other libraries), but just in case I will
discuss with @danielsn prefixing it with `ddog_`.

Also it's cool to see our tests working! A new symbol that's not
prefixed with the expected strings showed up, and we caught it :)

**How to test the change?**

I've tested this release locally using the changes in
DataDog/dd-trace-rb#4577 .

As a reminder, new libdatadog releases don't get automatically picked up
by dd-trace-rb, so the PR that bumps the dependency will also test
this release against all supported Ruby versions.
ivoanjo added a commit to DataDog/libdatadog that referenced this pull request May 30, 2025
**What does this PR do?**

This PR includes the changes documented in the "Releasing a new version to rubygems.org" part of the README:
https://github.com/datadog/libdatadog/tree/main/ruby#releasing-a-new-version-to-rubygemsorg

It also includes an updated to the `gem_packaging.rb` test to add the
new `INTERNED_EMPTY_STRING` public symbol added in #917 .

**Motivation:**

Enable Ruby to use libdatadog v18.1.0. This includes A LOT of changes
to profiling, crashtracking, configuration, process discovery etc
that have been pending for Ruby for too long.

**Additional Notes:**

The `INTERNED_EMPTY_STRING` new symbol seems ok to have (e.g. I don't
expect it to clash with other libraries), but just in case I will
discuss with @danielsn prefixing it with `ddog_`.

Also it's cool to see our tests working! A new symbol that's not
prefixed with the expected strings showed up, and we caught it :)

**How to test the change?**

I've tested this release locally using the changes in
DataDog/dd-trace-rb#4577 .

As a reminder, new libdatadog releases don't get automatically picked up
by dd-trace-rb, so the PR that bumps the dependency will also test
this release against all supported Ruby versions.
@ivoanjo ivoanjo marked this pull request as ready for review May 30, 2025 11:54
@ivoanjo ivoanjo requested review from a team as code owners May 30, 2025 11:54
@pr-commenter
Copy link

pr-commenter bot commented May 30, 2025

Benchmarks

Benchmark execution time: 2025-05-30 13:30:54

Comparing candidate commit 2cf6513 in PR branch ivoanjo/prof-11524-upgrade-libdatadog17-try3 with baseline commit 2361de6 in branch master.

Found 5 performance improvements and 0 performance regressions! Performance is the same for 35 metrics, 5 unstable metrics.

scenario:profiling - gvl benchmark samples

  • 🟩 throughput [+2760.373op/s; +2769.831op/s] or [+24.717%; +24.802%]

scenario:profiling - profiler gc

  • 🟩 throughput [+70382.915op/s; +70898.293op/s] or [+23.777%; +23.951%]

scenario:profiling - sample timeline=false

  • 🟩 throughput [+1.351op/s; +1.417op/s] or [+21.912%; +22.993%]

scenario:profiling - sample+serialize retain_every=10 heap_samples=false heap_size=false heap_sample_every=1 skip_end_gc=false

  • 🟩 throughput [+0.350op/s; +0.362op/s] or [+13.846%; +14.311%]

scenario:profiling - stack collector

  • 🟩 throughput [+414.383op/s; +416.104op/s] or [+14.312%; +14.372%]

@codecov-commenter
Copy link

codecov-commenter commented May 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.63%. Comparing base (2361de6) to head (2cf6513).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4577      +/-   ##
==========================================
- Coverage   97.64%   97.63%   -0.02%     
==========================================
  Files        1470     1470              
  Lines       87707    87711       +4     
  Branches     4547     4547              
==========================================
- Hits        85644    85636       -8     
- Misses       2063     2075      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@p-datadog p-datadog merged commit 06bde33 into master May 30, 2025
454 checks passed
@p-datadog p-datadog deleted the ivoanjo/prof-11524-upgrade-libdatadog17-try3 branch May 30, 2025 19:47
@github-actions github-actions bot added this to the 2.17.0 milestone May 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants