Inject trace context into EventBridge detail #7613

Merged: nhulston merged 10 commits into master from nicholas.hulston/lambda-eventbridge-tracing-fix on Oct 4, 2024

@nhulston nhulston commented Sep 12, 2024

What Does This Do

This PR adds a new instrumentation for EventBridge that intercepts PutEventsRequest and injects trace context into the detail field. This lets the agent combine spans from a distributed (serverless) architecture into a single trace.

This PR only injects trace context. I'm working on PR 1 (DataDog/datadog-agent#29414) and PR 2 (DataDog/datadog-agent#29551) to update the Lambda extension to use this trace context to create EventBridge spans.

I am working on similar PRs in dd-trace-dotnet and dd-trace-go.

Motivation

SNS and SQS are already supported: the tracer injects trace context into their message attributes. EventBridge was not supported, and this PR adds that support.

Additional Notes

Overall, AWS's EventBridge API lacks some features, so a few workarounds are needed:

  • SNS and SQS carry custom input in a messageAttributes field, while EventBridge carries it in detail.
  • Unlike SNS and SQS, the detail field is given as a raw string, so the tracer has to modify the detail string manually using StringBuilder.
  • The agent has no reliable way of getting the start time of the EventBridge span, so the tracer has to put the current time into detail[_datadog] as x-datadog-start-time.
  • The EventBridge API has no way of getting the EventBridge bus name, so the tracer has to put the bus name (which is used to create the span resource name) into detail[_datadog] as x-datadog-resource-name.
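
The string-level injection described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code added by this PR: the class name `DetailInjectionSketch`, the method `inject`, and its parameters are hypothetical, the start time is assumed to be in milliseconds, and keys and values are assumed to need no JSON escaping. It relies on detail being a JSON object string, so the closing brace can be dropped and a `_datadog` member appended.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the class added by this PR.
class DetailInjectionSketch {

  // Appends a "_datadog" member to a JSON-object string without parsing it.
  // Assumption: keys and values contain no characters that need JSON escaping.
  static String inject(String detail, Map<String, String> traceContext,
                       String busName, long startTimeMillis) {
    if (detail == null) {
      return null;
    }
    String trimmed = detail.trim();
    if (trimmed.length() < 2 || !trimmed.startsWith("{") || !trimmed.endsWith("}")) {
      return detail; // not a JSON object: leave the payload untouched
    }

    // Trace context first, then the two extra fields named in the PR description.
    Map<String, String> fields = new LinkedHashMap<>(traceContext);
    fields.put("x-datadog-start-time", Long.toString(startTimeMillis));
    fields.put("x-datadog-resource-name", busName);

    // Drop the closing brace, append the new member, then close the object again.
    StringBuilder sb = new StringBuilder(trimmed.substring(0, trimmed.length() - 1));
    boolean emptyObject = trimmed.substring(1, trimmed.length() - 1).trim().isEmpty();
    if (!emptyObject) {
      sb.append(',');
    }
    sb.append("\"_datadog\":{");
    boolean first = true;
    for (Map.Entry<String, String> e : fields.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
      first = false;
    }
    return sb.append("}}").toString();
  }
}
```

Under these assumptions, calling `inject` on `{"foo":"bar"}` keeps the original members and adds a trailing `_datadog` object holding the trace context, the start time, and the bus name. Payloads that are not JSON objects are returned unchanged, so the event is still sent even when nothing can be injected.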

Traces before these changes

Lambda --> EventBridge --> Lambda
Two separate traces; the second trace is missing an EventBridge span
Screenshot 2024-09-12 at 4 50 56 PM

Lambda --> EventBridge --> SQS --> Lambda
Missing EventBridge span
Screenshot 2024-09-12 at 4 51 14 PM

Lambda --> EventBridge --> SNS --> Lambda
Missing EventBridge span
Screenshot 2024-09-12 at 4 51 06 PM

Traces after these (and agent's) changes

Lambda --> EventBridge --> Lambda
Screenshot 2024-09-12 at 4 53 26 PM

Lambda --> EventBridge --> SQS --> Lambda
Screenshot 2024-09-12 at 4 53 18 PM

Lambda --> EventBridge --> SNS --> Lambda
Screenshot 2024-09-12 at 4 53 22 PM

Contributor Checklist

Jira ticket: SVLS-5666

pr-commenter bot commented Sep 13, 2024

Benchmarks

Startup

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | nicholas.hulston/lambda-eventbridge-tracing-fix |
| git_commit_date | 1727968048 | 1727977992 |
| git_commit_sha | 85b316b | 41eed19 |
| release_version | 1.41.0-SNAPSHOT~85b316b1c0 | 1.41.0-SNAPSHOT~41eed19728 |

Matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1727980294 | 1727980294 |
| ci_job_id | 660707120 | 660707120 |
| ci_pipeline_id | 45746714 | 45746714 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| module | Agent | Agent |
| parent | None | None |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 11 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.068 s) : 0, 1068430
Total [baseline] (8.596 s) : 0, 8595617
Agent [candidate] (1.067 s) : 0, 1066815
Total [candidate] (8.546 s) : 0, 8545761
section iast
Agent [baseline] (1.196 s) : 0, 1195874
Total [baseline] (9.1 s) : 0, 9099753
Agent [candidate] (1.194 s) : 0, 1194246
Total [candidate] (9.08 s) : 0, 9080195
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.196 s) : 0, 1196213
Total [baseline] (9.106 s) : 0, 9106375
Agent [candidate] (1.195 s) : 0, 1195181
Total [candidate] (9.128 s) : 0, 9128418
section iast_TELEMETRY_OFF
Agent [baseline] (1.194 s) : 0, 1193988
Total [baseline] (9.071 s) : 0, 9070844
Agent [candidate] (1.201 s) : 0, 1201008
Total [candidate] (9.102 s) : 0, 9102333
  • baseline results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.068 s | - |
| Agent | iast | 1.196 s | 127.444 ms (11.9%) |
| Agent | iast_HARDCODED_SECRET_DISABLED | 1.196 s | 127.784 ms (12.0%) |
| Agent | iast_TELEMETRY_OFF | 1.194 s | 125.558 ms (11.8%) |
| Total | tracing | 8.596 s | - |
| Total | iast | 9.1 s | 504.136 ms (5.9%) |
| Total | iast_HARDCODED_SECRET_DISABLED | 9.106 s | 510.758 ms (5.9%) |
| Total | iast_TELEMETRY_OFF | 9.071 s | 475.227 ms (5.5%) |

  • candidate results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.067 s | - |
| Agent | iast | 1.194 s | 127.432 ms (11.9%) |
| Agent | iast_HARDCODED_SECRET_DISABLED | 1.195 s | 128.366 ms (12.0%) |
| Agent | iast_TELEMETRY_OFF | 1.201 s | 134.193 ms (12.6%) |
| Total | tracing | 8.546 s | - |
| Total | iast | 9.08 s | 534.433 ms (6.3%) |
| Total | iast_HARDCODED_SECRET_DISABLED | 9.128 s | 582.656 ms (6.8%) |
| Total | iast_TELEMETRY_OFF | 9.102 s | 556.572 ms (6.5%) |

gantt
    title insecure-bank - break down per module: candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (680.728 ms) : 0, 680728
BytebuddyAgent [candidate] (681.261 ms) : 0, 681261
GlobalTracer [baseline] (311.182 ms) : 0, 311182
GlobalTracer [candidate] (309.953 ms) : 0, 309953
AppSec [baseline] (54.466 ms) : 0, 54466
AppSec [candidate] (53.632 ms) : 0, 53632
Remote Config [baseline] (681.174 µs) : 0, 681
Remote Config [candidate] (664.592 µs) : 0, 665
Telemetry [baseline] (7.752 ms) : 0, 7752
Telemetry [candidate] (7.65 ms) : 0, 7650
section iast
BytebuddyAgent [baseline] (796.802 ms) : 0, 796802
BytebuddyAgent [candidate] (795.468 ms) : 0, 795468
GlobalTracer [baseline] (299.425 ms) : 0, 299425
GlobalTracer [candidate] (299.257 ms) : 0, 299257
AppSec [baseline] (53.73 ms) : 0, 53730
AppSec [candidate] (54.575 ms) : 0, 54575
IAST [baseline] (24.54 ms) : 0, 24540
IAST [candidate] (23.601 ms) : 0, 23601
Remote Config [baseline] (623.725 µs) : 0, 624
Remote Config [candidate] (620.152 µs) : 0, 620
Telemetry [baseline] (7.079 ms) : 0, 7079
Telemetry [candidate] (7.048 ms) : 0, 7048
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (796.531 ms) : 0, 796531
BytebuddyAgent [candidate] (795.639 ms) : 0, 795639
GlobalTracer [baseline] (299.617 ms) : 0, 299617
GlobalTracer [candidate] (299.346 ms) : 0, 299346
AppSec [baseline] (54.261 ms) : 0, 54261
AppSec [candidate] (55.699 ms) : 0, 55699
IAST [baseline] (24.426 ms) : 0, 24426
IAST [candidate] (23.002 ms) : 0, 23002
Remote Config [baseline] (612.262 µs) : 0, 612
Remote Config [candidate] (621.951 µs) : 0, 622
Telemetry [baseline] (7.056 ms) : 0, 7056
Telemetry [candidate] (7.141 ms) : 0, 7141
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (795.094 ms) : 0, 795094
BytebuddyAgent [candidate] (799.986 ms) : 0, 799986
GlobalTracer [baseline] (298.893 ms) : 0, 298893
GlobalTracer [candidate] (301.783 ms) : 0, 301783
AppSec [baseline] (54.516 ms) : 0, 54516
AppSec [candidate] (53.672 ms) : 0, 53672
IAST [baseline] (24.208 ms) : 0, 24208
IAST [candidate] (24.158 ms) : 0, 24158
Remote Config [baseline] (614.171 µs) : 0, 614
Remote Config [candidate] (631.496 µs) : 0, 631
Telemetry [baseline] (6.921 ms) : 0, 6921
Telemetry [candidate] (6.937 ms) : 0, 6937
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.065 s) : 0, 1065240
Total [baseline] (10.403 s) : 0, 10402574
Agent [candidate] (1.075 s) : 0, 1074500
Total [candidate] (10.407 s) : 0, 10406623
section appsec
Agent [baseline] (1.207 s) : 0, 1207202
Total [baseline] (10.659 s) : 0, 10658688
Agent [candidate] (1.205 s) : 0, 1204802
Total [candidate] (10.67 s) : 0, 10670419
section iast
Agent [baseline] (1.199 s) : 0, 1199009
Total [baseline] (10.878 s) : 0, 10877782
Agent [candidate] (1.203 s) : 0, 1203318
Total [candidate] (10.872 s) : 0, 10871976
section profiling
Agent [baseline] (1.264 s) : 0, 1264318
Total [baseline] (10.589 s) : 0, 10588713
Agent [candidate] (1.269 s) : 0, 1269140
Total [candidate] (10.632 s) : 0, 10632324
  • baseline results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.065 s | - |
| Agent | appsec | 1.207 s | 141.963 ms (13.3%) |
| Agent | iast | 1.199 s | 133.769 ms (12.6%) |
| Agent | profiling | 1.264 s | 199.078 ms (18.7%) |
| Total | tracing | 10.403 s | - |
| Total | appsec | 10.659 s | 256.114 ms (2.5%) |
| Total | iast | 10.878 s | 475.208 ms (4.6%) |
| Total | profiling | 10.589 s | 186.139 ms (1.8%) |

  • candidate results

| Module | Variant | Duration | Δ tracing |
|---|---|---|---|
| Agent | tracing | 1.075 s | - |
| Agent | appsec | 1.205 s | 130.302 ms (12.1%) |
| Agent | iast | 1.203 s | 128.818 ms (12.0%) |
| Agent | profiling | 1.269 s | 194.639 ms (18.1%) |
| Total | tracing | 10.407 s | - |
| Total | appsec | 10.67 s | 263.796 ms (2.5%) |
| Total | iast | 10.872 s | 465.353 ms (4.5%) |
| Total | profiling | 10.632 s | 225.701 ms (2.2%) |

gantt
    title petclinic - break down per module: candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (680.386 ms) : 0, 680386
BytebuddyAgent [candidate] (686.181 ms) : 0, 686181
GlobalTracer [baseline] (309.253 ms) : 0, 309253
GlobalTracer [candidate] (312.246 ms) : 0, 312246
AppSec [baseline] (53.744 ms) : 0, 53744
AppSec [candidate] (54.081 ms) : 0, 54081
Remote Config [baseline] (669.392 µs) : 0, 669
Remote Config [candidate] (659.727 µs) : 0, 660
Telemetry [baseline] (7.597 ms) : 0, 7597
Telemetry [candidate] (7.616 ms) : 0, 7616
section appsec
BytebuddyAgent [baseline] (702.756 ms) : 0, 702756
BytebuddyAgent [candidate] (699.985 ms) : 0, 699985
GlobalTracer [baseline] (308.532 ms) : 0, 308532
GlobalTracer [candidate] (307.395 ms) : 0, 307395
AppSec [baseline] (162.89 ms) : 0, 162890
AppSec [candidate] (162.95 ms) : 0, 162950
Remote Config [baseline] (649.062 µs) : 0, 649
Remote Config [candidate] (649.841 µs) : 0, 650
Telemetry [baseline] (8.6 ms) : 0, 8600
Telemetry [candidate] (9.596 ms) : 0, 9596
IAST [baseline] (20.168 ms) : 0, 20168
IAST [candidate] (21.084 ms) : 0, 21084
section iast
BytebuddyAgent [baseline] (798.969 ms) : 0, 798969
BytebuddyAgent [candidate] (802.282 ms) : 0, 802282
GlobalTracer [baseline] (299.877 ms) : 0, 299877
GlobalTracer [candidate] (301.311 ms) : 0, 301311
AppSec [baseline] (54.484 ms) : 0, 54484
AppSec [candidate] (55.487 ms) : 0, 55487
Remote Config [baseline] (611.943 µs) : 0, 612
Remote Config [candidate] (605.689 µs) : 0, 606
Telemetry [baseline] (7.023 ms) : 0, 7023
Telemetry [candidate] (7.082 ms) : 0, 7082
IAST [baseline] (24.362 ms) : 0, 24362
IAST [candidate] (22.795 ms) : 0, 22795
section profiling
ProfilingAgent [baseline] (96.448 ms) : 0, 96448
ProfilingAgent [candidate] (96.848 ms) : 0, 96848
BytebuddyAgent [baseline] (673.994 ms) : 0, 673994
BytebuddyAgent [candidate] (677.174 ms) : 0, 677174
GlobalTracer [baseline] (392.816 ms) : 0, 392816
GlobalTracer [candidate] (393.651 ms) : 0, 393651
AppSec [baseline] (54.422 ms) : 0, 54422
AppSec [candidate] (54.695 ms) : 0, 54695
Remote Config [baseline] (669.302 µs) : 0, 669
Remote Config [candidate] (657.938 µs) : 0, 658
Telemetry [baseline] (7.429 ms) : 0, 7429
Telemetry [candidate] (7.469 ms) : 0, 7469
Profiling [baseline] (96.472 ms) : 0, 96472
Profiling [candidate] (96.872 ms) : 0, 96872

Load

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| end_time | 2024-10-03T18:02:02 | 2024-10-03T18:08:53 |
| git_branch | master | nicholas.hulston/lambda-eventbridge-tracing-fix |
| git_commit_date | 1727968048 | 1727977992 |
| git_commit_sha | 85b316b | 41eed19 |
| release_version | 1.41.0-SNAPSHOT~85b316b1c0 | 1.41.0-SNAPSHOT~41eed19728 |
| start_time | 2024-10-03T18:01:49 | 2024-10-03T18:08:40 |

Matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| application | insecure-bank | insecure-bank |
| ci_job_date | 1727979278 | 1727979278 |
| ci_job_id | 660707121 | 660707121 |
| ci_pipeline_id | 45746714 | 45746714 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | iast | iast |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 16 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0
    dateFormat X
    axisFormat %s
section baseline
no_agent (372.346 µs) : 352, 393
.   : milestone, 372,
iast (479.337 µs) : 458, 501
.   : milestone, 479,
iast_FULL (555.71 µs) : 535, 577
.   : milestone, 556,
iast_GLOBAL (504.205 µs) : 483, 525
.   : milestone, 504,
iast_HARDCODED_SECRET_DISABLED (485.47 µs) : 464, 507
.   : milestone, 485,
iast_INACTIVE (451.36 µs) : 430, 473
.   : milestone, 451,
iast_TELEMETRY_OFF (476.385 µs) : 455, 498
.   : milestone, 476,
tracing (442.133 µs) : 422, 463
.   : milestone, 442,
section candidate
no_agent (371.394 µs) : 351, 391
.   : milestone, 371,
iast (483.586 µs) : 462, 505
.   : milestone, 484,
iast_FULL (553.857 µs) : 533, 575
.   : milestone, 554,
iast_GLOBAL (501.108 µs) : 480, 523
.   : milestone, 501,
iast_HARDCODED_SECRET_DISABLED (486.174 µs) : 465, 507
.   : milestone, 486,
iast_INACTIVE (445.89 µs) : 425, 467
.   : milestone, 446,
iast_TELEMETRY_OFF (474.36 µs) : 453, 496
.   : milestone, 474,
tracing (442.225 µs) : 422, 462
.   : milestone, 442,
  • baseline results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 372.346 µs [351.881 µs, 392.811 µs] | - |
| iast | 479.337 µs [458.094 µs, 500.58 µs] | 106.991 µs (28.7%) |
| iast_FULL | 555.71 µs [534.5 µs, 576.921 µs] | 183.365 µs (49.2%) |
| iast_GLOBAL | 504.205 µs [483.106 µs, 525.304 µs] | 131.859 µs (35.4%) |
| iast_HARDCODED_SECRET_DISABLED | 485.47 µs [464.168 µs, 506.773 µs] | 113.124 µs (30.4%) |
| iast_INACTIVE | 451.36 µs [429.886 µs, 472.835 µs] | 79.015 µs (21.2%) |
| iast_TELEMETRY_OFF | 476.385 µs [454.631 µs, 498.138 µs] | 104.039 µs (27.9%) |
| tracing | 442.133 µs [421.681 µs, 462.586 µs] | 69.788 µs (18.7%) |

  • candidate results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 371.394 µs [351.33 µs, 391.459 µs] | - |
| iast | 483.586 µs [462.294 µs, 504.877 µs] | 112.191 µs (30.2%) |
| iast_FULL | 553.857 µs [532.718 µs, 574.996 µs] | 182.463 µs (49.1%) |
| iast_GLOBAL | 501.108 µs [479.682 µs, 522.534 µs] | 129.714 µs (34.9%) |
| iast_HARDCODED_SECRET_DISABLED | 486.174 µs [465.155 µs, 507.193 µs] | 114.78 µs (30.9%) |
| iast_INACTIVE | 445.89 µs [424.997 µs, 466.783 µs] | 74.496 µs (20.1%) |
| iast_TELEMETRY_OFF | 474.36 µs [453.164 µs, 495.556 µs] | 102.966 µs (27.7%) |
| tracing | 442.225 µs [422.03 µs, 462.419 µs] | 70.83 µs (19.1%) |

Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.349 ms) : 1329, 1369
.   : milestone, 1349,
appsec (1.704 ms) : 1681, 1726
.   : milestone, 1704,
appsec_no_iast (1.723 ms) : 1699, 1747
.   : milestone, 1723,
iast (1.478 ms) : 1456, 1501
.   : milestone, 1478,
profiling (1.503 ms) : 1477, 1529
.   : milestone, 1503,
tracing (1.465 ms) : 1440, 1489
.   : milestone, 1465,
section candidate
no_agent (1.336 ms) : 1317, 1354
.   : milestone, 1336,
appsec (1.717 ms) : 1693, 1741
.   : milestone, 1717,
appsec_no_iast (1.704 ms) : 1680, 1729
.   : milestone, 1704,
iast (1.465 ms) : 1442, 1487
.   : milestone, 1465,
profiling (1.467 ms) : 1443, 1490
.   : milestone, 1467,
tracing (1.468 ms) : 1443, 1493
.   : milestone, 1468,
  • baseline results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.349 ms [1.329 ms, 1.369 ms] | - |
| appsec | 1.704 ms [1.681 ms, 1.726 ms] | 354.757 µs (26.3%) |
| appsec_no_iast | 1.723 ms [1.699 ms, 1.747 ms] | 374.237 µs (27.7%) |
| iast | 1.478 ms [1.456 ms, 1.501 ms] | 129.325 µs (9.6%) |
| profiling | 1.503 ms [1.477 ms, 1.529 ms] | 154.518 µs (11.5%) |
| tracing | 1.465 ms [1.44 ms, 1.489 ms] | 115.679 µs (8.6%) |

  • candidate results

| Variant | Request duration [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.336 ms [1.317 ms, 1.354 ms] | - |
| appsec | 1.717 ms [1.693 ms, 1.741 ms] | 381.641 µs (28.6%) |
| appsec_no_iast | 1.704 ms [1.68 ms, 1.729 ms] | 368.851 µs (27.6%) |
| iast | 1.465 ms [1.442 ms, 1.487 ms] | 129.218 µs (9.7%) |
| profiling | 1.467 ms [1.443 ms, 1.49 ms] | 131.196 µs (9.8%) |
| tracing | 1.468 ms [1.443 ms, 1.493 ms] | 132.373 µs (9.9%) |

Dacapo

Parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| baseline_or_candidate | baseline | candidate |
| git_branch | master | nicholas.hulston/lambda-eventbridge-tracing-fix |
| git_commit_date | 1727968048 | 1727977992 |
| git_commit_sha | 85b316b | 41eed19 |
| release_version | 1.41.0-SNAPSHOT~85b316b1c0 | 1.41.0-SNAPSHOT~41eed19728 |

Matching parameters

| Parameter | Baseline | Candidate |
|---|---|---|
| application | biojava | biojava |
| ci_job_date | 1727979821 | 1727979821 |
| ci_job_id | 660707122 | 660707122 |
| ci_pipeline_id | 45746714 | 45746714 |
| cpu_model | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz |
| variant | appsec | appsec |

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.064 s) : 15064000, 15064000
.   : milestone, 15064000,
appsec (15.24 s) : 15240000, 15240000
.   : milestone, 15240000,
iast (18.927 s) : 18927000, 18927000
.   : milestone, 18927000,
iast_GLOBAL (17.899 s) : 17899000, 17899000
.   : milestone, 17899000,
profiling (15.34 s) : 15340000, 15340000
.   : milestone, 15340000,
tracing (15.359 s) : 15359000, 15359000
.   : milestone, 15359000,
section candidate
no_agent (15.099 s) : 15099000, 15099000
.   : milestone, 15099000,
appsec (15.023 s) : 15023000, 15023000
.   : milestone, 15023000,
iast (19.032 s) : 19032000, 19032000
.   : milestone, 19032000,
iast_GLOBAL (18.018 s) : 18018000, 18018000
.   : milestone, 18018000,
profiling (15.11 s) : 15110000, 15110000
.   : milestone, 15110000,
tracing (15.214 s) : 15214000, 15214000
.   : milestone, 15214000,
  • baseline results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 15.064 s [15.064 s, 15.064 s] | - |
| appsec | 15.24 s [15.24 s, 15.24 s] | 176.0 ms (1.2%) |
| iast | 18.927 s [18.927 s, 18.927 s] | 3.863 s (25.6%) |
| iast_GLOBAL | 17.899 s [17.899 s, 17.899 s] | 2.835 s (18.8%) |
| profiling | 15.34 s [15.34 s, 15.34 s] | 276.0 ms (1.8%) |
| tracing | 15.359 s [15.359 s, 15.359 s] | 295.0 ms (2.0%) |

  • candidate results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 15.099 s [15.099 s, 15.099 s] | - |
| appsec | 15.023 s [15.023 s, 15.023 s] | -76.0 ms (-0.5%) |
| iast | 19.032 s [19.032 s, 19.032 s] | 3.933 s (26.0%) |
| iast_GLOBAL | 18.018 s [18.018 s, 18.018 s] | 2.919 s (19.3%) |
| profiling | 15.11 s [15.11 s, 15.11 s] | 11.0 ms (0.1%) |
| tracing | 15.214 s [15.214 s, 15.214 s] | 115.0 ms (0.8%) |

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.41.0-SNAPSHOT~41eed19728, baseline=1.41.0-SNAPSHOT~85b316b1c0
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.465 ms) : 1453, 1476
.   : milestone, 1465,
appsec (2.316 ms) : 2275, 2356
.   : milestone, 2316,
iast (2.044 ms) : 1994, 2093
.   : milestone, 2044,
iast_GLOBAL (2.106 ms) : 2054, 2157
.   : milestone, 2106,
profiling (1.926 ms) : 1885, 1966
.   : milestone, 1926,
tracing (1.899 ms) : 1860, 1937
.   : milestone, 1899,
section candidate
no_agent (1.462 ms) : 1451, 1474
.   : milestone, 1462,
appsec (2.309 ms) : 2269, 2349
.   : milestone, 2309,
iast (2.071 ms) : 2020, 2122
.   : milestone, 2071,
iast_GLOBAL (2.103 ms) : 2052, 2155
.   : milestone, 2103,
profiling (2.382 ms) : 2199, 2566
.   : milestone, 2382,
tracing (1.915 ms) : 1875, 1954
.   : milestone, 1915,
  • baseline results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.465 ms [1.453 ms, 1.476 ms] | - |
| appsec | 2.316 ms [2.275 ms, 2.356 ms] | 850.837 µs (58.1%) |
| iast | 2.044 ms [1.994 ms, 2.093 ms] | 578.828 µs (39.5%) |
| iast_GLOBAL | 2.106 ms [2.054 ms, 2.157 ms] | 640.795 µs (43.7%) |
| profiling | 1.926 ms [1.885 ms, 1.966 ms] | 460.683 µs (31.4%) |
| tracing | 1.899 ms [1.86 ms, 1.937 ms] | 433.684 µs (29.6%) |

  • candidate results

| Variant | Execution Time [CI 0.99] | Δ no_agent |
|---|---|---|
| no_agent | 1.462 ms [1.451 ms, 1.474 ms] | - |
| appsec | 2.309 ms [2.269 ms, 2.349 ms] | 846.89 µs (57.9%) |
| iast | 2.071 ms [2.02 ms, 2.122 ms] | 608.855 µs (41.6%) |
| iast_GLOBAL | 2.103 ms [2.052 ms, 2.155 ms] | 640.838 µs (43.8%) |
| profiling | 2.382 ms [2.199 ms, 2.566 ms] | 920.103 µs (62.9%) |
| tracing | 1.915 ms [1.875 ms, 1.954 ms] | 452.439 µs (30.9%) |

Comment on lines +41 to +43
public static final String BUS_TAG = "bus";
private static final DDCache<String, String> BUS_TAG_CACHE = DDCaches.newFixedSizeCache(32);
private static final Function<String, String> BUS_TAG_PREFIX = new StringPrefix("bus:");

@nhulston nhulston Sep 13, 2024

I'm actually not 100% sure what this does, but I was copying what the implementation for SNS did. Are these tags to be used by users for querying purposes, or does it do something else?
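
For context on the three fields quoted above: they look like a common pattern in which a prefixed string (here "bus:" followed by a name) is built once per distinct name and then reused from a small bounded cache, instead of being concatenated on every request. The sketch below shows that pattern in plain Java only. `PrefixedTagCacheSketch` and `tagFor` are hypothetical names, and the code does not use `DDCache` or `StringPrefix`, whose actual behavior may differ. It also says nothing about where the resulting tag is consumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for illustration; not DDCache or StringPrefix.
class PrefixedTagCacheSketch {
  private static final int MAX_ENTRIES = 32;

  // Access-ordered LinkedHashMap that evicts the least recently used entry
  // once more than MAX_ENTRIES names have been cached.
  private final Map<String, String> cache =
      new LinkedHashMap<String, String>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  private final Function<String, String> prefixer;

  PrefixedTagCacheSketch(String prefix) {
    this.prefixer = name -> prefix + name;
  }

  // Returns the prefixed tag; the concatenation runs once per cached name.
  synchronized String tagFor(String name) {
    return cache.computeIfAbsent(name, prefixer);
  }
}
```

With a cache built as `new PrefixedTagCacheSketch("bus:")`, repeated calls to `tagFor` with the same bus name return the same cached string instance until that name is evicted.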

@nhulston nhulston changed the title [serverless] Inject trace context into EventBridge detail Inject trace context into EventBridge detail Sep 13, 2024
@nhulston nhulston added the tag: serverless, inst: eventbridge, and type: feature request labels Sep 13, 2024
@nhulston nhulston marked this pull request as ready for review September 13, 2024 20:54
@nhulston nhulston requested review from a team as code owners September 13, 2024 20:54
@nhulston nhulston requested a review from ygree September 13, 2024 20:54
@nhulston nhulston force-pushed the nicholas.hulston/lambda-eventbridge-tracing-fix branch from 00be0b8 to 0aa00d0 Compare September 24, 2024 18:33

@purple4reina purple4reina left a comment

This looks good to me!

@nhulston nhulston force-pushed the nicholas.hulston/lambda-eventbridge-tracing-fix branch from 739383e to a409a15 Compare September 25, 2024 20:41
@PerfectSlayer

Quick question about the new label: does EventBridge fall under AWS SDK?

@ygree ygree left a comment

LGTM

@nhulston

Quick question about the new label: does EventBridge fall under AWS SDK?

Yes it does

@nhulston nhulston merged commit 8be300e into master Oct 4, 2024
103 checks passed
@nhulston nhulston deleted the nicholas.hulston/lambda-eventbridge-tracing-fix branch October 4, 2024 15:01
@github-actions github-actions bot added this to the 1.41.0 milestone Oct 4, 2024
nhulston added a commit to DataDog/dd-trace-dotnet that referenced this pull request Oct 15, 2024
…ext (#6096)

## Summary of changes
This creates a new instrumentation for EventBridge and intercepts
`PutEvents` and `PutEventsAsync` to inject trace context. This allows
the agent to combine spans from a distributed (serverless) architecture
into a single trace.

This PR only injects trace context. I'm working on [PR
1](DataDog/datadog-agent#29414) and [PR
2](DataDog/datadog-agent#29551) to update the
Lambda extension to use this trace context to create EventBridge spans.

I am also working on a similar PR in
[dd-trace-java](DataDog/dd-trace-java#7613) and
dd-trace-go.

## Reason for change

SNS and SQS are already supported, and the tracer currently injects
trace context into message attributes fields for them. However,
EventBridge wasn't supported, and this PR aims to fix this problem.

## Implementation details

I followed the
[documentation](https://github.com/DataDog/dd-trace-dotnet/blob/master/docs/development/AutomaticInstrumentation.md)
to create an instrumentation. Much of the logic was mirrored from the
[existing
implementation](https://github.com/DataDog/dd-trace-dotnet/tree/master/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AWS/SNS)
of SNS, since EventBridge and SNS are extremely similar.

Overall, AWS's EventBridge API lacks some features, so a few workarounds
are needed:

- SNS and SQS carry custom input in a `messageAttributes` field, while
EventBridge carries it in `detail`.
- Unlike SNS and SQS, the `detail` field is given as a raw string, so we
have to modify the detail string manually using StringBuilder.
- The agent has no reliable way of getting the start time of the
EventBridge span, so the tracer has to put the current time into
`detail[_datadog]` under the header `x-datadog-start-time`.
- The EventBridge API has no way of getting the EventBridge bus name, so
the tracer has to put the bus name (which is used to create the span
resource name) into `detail[_datadog]` under the header
`x-datadog-resource-name`.


## Test coverage

I added system tests for SNS/SQS:
DataDog/system-tests#3204

I added [unit
tests](d05eb4c)
and [integration
tests](5ccd8b7).

Unit tests can be run with:
```
cd tracer
dotnet test ./test/Datadog.Trace.ClrProfiler.Managed.Tests
```

Integration tests can be run with these commands:
```
cd tracer

# Start LocalStack in Docker
docker run --rm -it -p 4566:4566 -p 4571:4571 -e SERVICES=events localstack/localstack

# Run integration tests
./build.sh BuildAndRunOSxIntegrationTests -buildConfiguration Debug -framework net6.0 -Filter AwsEventBridgeTests -SampleName Samples.AWS.EventBridge
```

I also did manual testing:
<img width="505" alt="Screenshot 2024-09-30 at 11 00 47 AM"
src="https://github.com/user-attachments/assets/bdf5d516-8b46-4138-ac25-c45d1822dc56">

## Other details

This PR changes many files. I recommend reviewing it commit by commit.
All the autogenerated files were added in a single commit, which should
make the review process less overwhelming.

---------

Co-authored-by: Steven Bouwkamp <[email protected]>