Spec adding span links for instrumentation of messaging systems (#616)
- Specify that span links should be used on "messaging" spans and
on "messaging" transactions that cover a batch of messages, when
messages include trace context.
- Drop special-casing of single-message SQS/SNS Lambda triggers.
- Specify a guard/limit of 1000 messages to check for trace-context.
- Start documenting message metadata mechanisms to use for propagating
trace-context, for those systems that provide a facility.
Closes: #606
Co-authored-by: Felix Barnsteiner <[email protected]>
specs/agents/tracing-instrumentation-aws-lambda.md (+31 -26)
@@ -133,31 +133,31 @@ In version 2.0, the `${event.requestContext.routeKey}` can have the format `GET
 If `use_path_as_transaction_name` is applicable and set to `true`, use `${event.requestContext.http.method} ${event.requestContext.http.path}` as the transaction name.

 ### SQS / SNS
-Lambda functions that are triggered by SQS (or SNS) accept an `event` input that may contain one or more SQS / SNS messages in the `event.records` array. All message-related context information (including the `traceparent`) is encoded in the individual message attributes (if at all). We cannot (automatically) wrap the processing of the individual messages that are sent as a batch of messages with a single `event`.

-Thus, in case that an SQS / SNS `event` contains **exactly one** SQS / SNS message, the agents must apply the following, messaging-specific retrieval of information. Otherwise, the agents should apply the [Generic Lambda Instrumentation](generic-lambda-instrumentation) as described above.
+Lambda functions that are triggered by SQS (or SNS) accept an `event` input that may contain one or more SQS / SNS messages in the `event.records` array. All message-related context information (including the `traceparent`) is encoded in the individual message attributes (if at all).

-With only one message in `event.records`, the agents can use the single SQS / SNS `record` to retrieve the `traceparent` and `tracestate` from message attributes and use it for starting the lambda transaction.
+#### SQS

-In addition the following fields should be set for Lambda functions triggered by SQS or SNS:
+Agents SHOULD check each record, [up to a maximum of 1000](tracing-instrumentation-messaging.md#receiving-trace-context),
+for a `traceparent` message attribute, and create a [span link](span-links.md)
+on the transaction for each message with trace-context.
+
+In addition to [the generic Lambda transaction fields](#generic-lambda-instrumentation)
+the following fields SHOULD be set. The use of `records[0]` below depends on the
+understanding from [AWS Lambda SQS docs](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html)
+that a trigger invocation can only include messages from *one* queue.

-#### SQS
 Field | Value | Description | Source
 --- | --- | --- | ---
 `type` | `messaging` | Transaction type: constant value for SQS. | -
-`name` | e.g. `RECEIVE SomeQueue` | Transaction name: Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple queue name can be derived from the 6th segment of `record.eventSourceArn`.
+`name` | e.g. `RECEIVE SomeQueue` | Transaction name: Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple queue name can be derived from the 6th segment of `records[0].eventSourceArn`.
 `faas.trigger.type` | `pubsub` | Constant value for message based triggers | -
-`faas.trigger.request_id` | e.g. `someMessageId` | SQS message ID. | `record.messageId`
-`context.service.origin.name` | e.g. `my-queue` | SQS queue name | Simple queue name can be derived from the 6th segment of `record.eventSourceArn`.
-`context.service.origin.id` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | SQS queue ARN. | `record.eventSourceArn`
+`context.service.origin.name` | e.g. `my-queue` | SQS queue name | Simple queue name can be derived from the 6th segment of `records[0].eventSourceArn`.
+`context.service.origin.id` | e.g. `arn:aws:sqs:us-east-2:123456789012:my-queue` | SQS queue ARN. | `records[0].eventSourceArn`
 `context.cloud.origin.service.name` | `sqs` | Fix value for SQS. | -
-`context.cloud.origin.region` | e.g. `us-east-1` | SQS queue region. | `record.awsRegion`
-`context.cloud.origin.account.id` | e.g. `12345678912` | Account ID of the SQS queue. | Parse account segment (5th) from `record.eventSourceArn`.
+`context.cloud.origin.region` | e.g. `us-east-1` | SQS queue region. | `records[0].awsRegion`
+`context.cloud.origin.account.id` | e.g. `12345678912` | Account ID of the SQS queue. | Parse account segment (5th) from `records[0].eventSourceArn`.
 `context.cloud.origin.provider` | `aws` | Use `aws` as fix value. | -
-`context.message.queue.name` | e.g. `my-queue` | The SQS queue name. | The 6th segment of `record.eventSourceArn`
-`context.message.age.ms` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `SentTimestamp`, if SentTimestamp is available. | Message attribute with key `SentTimestamp`.
-`context.message.body` | - | The message body. Should only be captured if body capturing is enabled in the configuration. | `record.body`
-`context.message.headers` | - | The message attributes. Should only be captured, if capturing headers is enabled in the configuration. | Use the `stringValue` of entries in `record.messageAttributes` with `dataType == "String" || dataType == "Number"`. [Other attribute types](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html#message-attribute-data-types) are ignored.

 An example SQS event:

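The following is a minimal Python sketch of the SQS rules above. It makes these assumptions:

* The handler receives the parsed Lambda event as a dict.
* The dict uses AWS's own key casing (`Records`, `eventSourceArn`, `awsRegion`, `messageAttributes`), where the spec text writes `records`.
* The helper names, and the flat dict keyed by the spec's dotted field names, are hypothetical and exist only for illustration. This is not an Elastic APM agent API.

The sketch checks at most 1000 records and emits one span link per valid `traceparent`. It reads the queue name (6th ARN segment) and account ID (5th segment) from `records[0]` only.

```python
"""Sketch: SQS-triggered Lambda transaction fields and span links.

Assumptions (hypothetical helpers, not an Elastic APM agent API): `event` is
the parsed Lambda event dict, and trace context is carried in a String-typed
`traceparent` message attribute.
"""

MAX_RECORDS_CHECKED = 1000  # guard on agent overhead, per the messaging spec


def parse_traceparent(value):
    """Return {"trace_id", "span_id"} for a W3C traceparent, or None."""
    parts = value.split("-")  # version-traceid-parentid-flags
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return {"trace_id": parts[1], "span_id": parts[2]}


def sqs_span_links(event):
    """One span link per record (among the first 1000) with trace context."""
    links = []
    for record in event.get("Records", [])[:MAX_RECORDS_CHECKED]:
        attr = (record.get("messageAttributes") or {}).get("traceparent")
        if attr and attr.get("dataType") == "String":
            link = parse_traceparent(attr.get("stringValue", ""))
            if link:
                links.append(link)
    return links


def sqs_transaction_fields(event):
    """Fields from records[0]: one trigger invocation covers one queue."""
    first = event["Records"][0]
    arn = first["eventSourceArn"]
    segments = arn.split(":")  # arn:aws:sqs:<region>:<account-id>:<queue>
    queue_name = segments[5]  # 6th segment
    return {
        "type": "messaging",
        "name": f"RECEIVE {queue_name}",
        "faas.trigger.type": "pubsub",
        "context.service.origin.name": queue_name,
        "context.service.origin.id": arn,
        "context.cloud.origin.service.name": "sqs",
        "context.cloud.origin.region": first["awsRegion"],
        "context.cloud.origin.account.id": segments[4],  # 5th segment
        "context.cloud.origin.provider": "aws",
    }
```

A real agent would also attach these links to the Lambda transaction and merge the fields with the generic Lambda ones. Both steps are omitted here.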
@@ -206,23 +206,28 @@ An example SQS event:

 #### SNS

+Agents SHOULD check each record, [up to a maximum of 1000](tracing-instrumentation-messaging.md#receiving-trace-context),
+for a `traceparent` message attribute (`Records.*.Sns.MessageAttributes`), and
+create a [span link](span-links.md) on the transaction for each message with
+trace-context.
+
+In addition to [the generic Lambda transaction fields](#generic-lambda-instrumentation)
+the following fields should be set. The use of `records[0]` is based on the
+understanding, from ["all notification messages will contain a single published
+message"](https://aws.amazon.com/sns/faqs/#Reliability), that an SNS trigger
+will only ever have a single record.
+
 Field | Value | Description | Source
 --- | --- | --- | ---
 `type` | `messaging` | Transaction type: constant value for SNS. | -
-`name` | e.g. `RECEIVE SomeTopic` | Transaction name: Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple topic name can be derived from the 6th segment of `record.sns.topicArn`.
+`name` | e.g. `RECEIVE SomeTopic` | Transaction name: Follow the [messaging spec](./tracing-instrumentation-messaging.md) for transaction naming. | Simple topic name can be derived from the 6th segment of `records[0].sns.topicArn`.
 `faas.trigger.type` | `pubsub` | Constant value for message based triggers | -
-`faas.trigger.reuqest_id` | e.g. `someMessageId` | SNS message ID. | `record.sns.messageId`
-`context.service.origin.name` | e.g. `my-topic` | SNS topic name | Simple topic name can be derived from the 6th segment of `record.sns.topicArn`.
-`context.service.origin.id` | e.g. `arn:aws:sns:us-east-2:123456789012:my-topic` | SNS topic ARN. | `record.sns.topicArn`
-`context.service.origin.version` | e.g. `2.1` | SNS event version | `record.eventVersion`
+`context.service.origin.name` | e.g. `my-topic` | SNS topic name | Simple topic name can be derived from the 6th segment of `records[0].sns.topicArn`.
+`context.service.origin.id` | e.g. `arn:aws:sns:us-east-2:123456789012:my-topic` | SNS topic ARN. | `records[0].sns.topicArn`
 `context.cloud.origin.service.name` | `sns` | Fix value for SNS. | -
-`context.cloud.origin.region` | e.g. `us-east-1` | SNS topic region. | Parse region segment (4th) from `record.sns.topicArn`.
-`context.cloud.origin.account.id` | e.g. `12345678912` | Account ID of the SNS topic. | Parse account segment (5th) from `record.sns.topicArn`.
+`context.cloud.origin.region` | e.g. `us-east-1` | SNS topic region. | Parse region segment (4th) from `records[0].sns.topicArn`.
+`context.cloud.origin.account.id` | e.g. `12345678912` | Account ID of the SNS topic. | Parse account segment (5th) from `records[0].sns.topicArn`.
 `context.cloud.origin.provider` | `aws` | Use `aws` as fix value. | -
-`context.message.queue.name` | e.g. `my-topic` | The SNS topic name. | The 6th segment of `record.sns.topicArn`
-`context.message.age.ms` | e.g. `3298` | Age of the message in milliseconds. `current_time` - `snsTimestamp`. | `record.sns.timestamp`
-`context.message.body` | - | The message body. Should only be captured if body capturing is enabled in the configuration. | `record.sns.message`
-`context.message.headers` | - | The message attributes. Should only be captured, if capturing headers is enabled in the configuration. | Use the `Value` of entries in `record.Sns.MessageAttributes` with `Type == "String"`. [Other attribute types](https://docs.aws.amazon.com/sns/latest/dg/sns-message-attributes.html#SNSMessageAttributes.DataTypes) are ignored. |
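The SNS variant differs mainly in where the data lives. This is a minimal sketch with these assumptions:

* The event is the parsed Lambda event dict.
* It uses AWS key casing: `Sns`, `TopicArn`, and `MessageAttributes` entries shaped as `Type`/`Value`.
* The helper names are hypothetical.

Following the SNS table, region and account ID are both parsed from the topic ARN (4th and 5th segments). Only `Records[0]` is used for the fields, per the single-record understanding quoted above.

```python
"""Sketch: SNS-triggered Lambda transaction fields and span links.

Assumptions (hypothetical helpers, not an agent API): `event` is the parsed
Lambda event dict; trace context is a String-typed `traceparent` entry in
Records[*].Sns.MessageAttributes shaped like {"Type": ..., "Value": ...}.
"""

MAX_RECORDS_CHECKED = 1000  # same guard as for SQS


def sns_span_links(event):
    """One span link per record (among the first 1000) with trace context."""
    links = []
    for record in event.get("Records", [])[:MAX_RECORDS_CHECKED]:
        attrs = record.get("Sns", {}).get("MessageAttributes") or {}
        attr = attrs.get("traceparent")
        if not attr or attr.get("Type") != "String":
            continue
        parts = attr.get("Value", "").split("-")  # version-traceid-parentid-flags
        if len(parts) == 4 and len(parts[1]) == 32 and len(parts[2]) == 16:
            links.append({"trace_id": parts[1], "span_id": parts[2]})
    return links


def sns_transaction_fields(event):
    """Fields from Records[0]; an SNS trigger is understood to hold one record."""
    topic_arn = event["Records"][0]["Sns"]["TopicArn"]
    segments = topic_arn.split(":")  # arn:aws:sns:<region>:<account-id>:<topic>
    topic = segments[5]  # 6th segment
    return {
        "type": "messaging",
        "name": f"RECEIVE {topic}",
        "faas.trigger.type": "pubsub",
        "context.service.origin.name": topic,
        "context.service.origin.id": topic_arn,
        "context.cloud.origin.service.name": "sns",
        "context.cloud.origin.region": segments[3],  # 4th segment, per the table
        "context.cloud.origin.account.id": segments[4],  # 5th segment
        "context.cloud.origin.provider": "aws",
    }
```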
specs/agents/tracing-instrumentation-messaging.md (+40 -12)
@@ -15,9 +15,9 @@ occurring within a traced transaction.



-### Trace Context
+### Sending Trace Context

-if the messaging system exposes an API for sending additional message properties/metadata, it
+If the messaging system exposes a mechanism for sending additional [message metadata](#message-metadata), it
 SHOULD be used to propagate the [Trace Context](https://www.w3.org/TR/trace-context/) of the
 `messaging` span, to continue the [distributed trace](tracing-distributed-tracing.md).

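The sending rule amounts to writing the `messaging` span's W3C `traceparent` (and `tracestate`, if any) into whatever metadata mechanism the system offers. This is a minimal sketch. It assumes a plain dict stands in for that mechanism and uses a hypothetical `inject_trace_context` helper. The `00-<trace-id>-<parent-id>-<flags>` layout is W3C Trace Context version `00`.

```python
"""Sketch: propagating a messaging span's trace context via message metadata.

Assumptions: `metadata` is a plain dict standing in for the messaging system's
headers/properties/attributes; `inject_trace_context` is a hypothetical name.
"""

import re

_HEX = re.compile(r"[0-9a-f]+")


def inject_trace_context(metadata, trace_id, span_id, sampled, tracestate=None):
    """Return a copy of `metadata` carrying the span's W3C trace context."""
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("trace_id must be 32 hex chars, span_id 16 hex chars")
    if not (_HEX.fullmatch(trace_id) and _HEX.fullmatch(span_id)):
        raise ValueError("ids must be lowercase hex")
    flags = "01" if sampled else "00"
    result = dict(metadata)  # leave the caller's message untouched
    # span_id is the id of the outgoing `messaging` span, so the receiver's
    # transaction (or span link) points back at the send operation.
    result["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    if tracestate:
        result["tracestate"] = tracestate
    return result
```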
@@ -68,19 +68,28 @@ If creating a transaction for the processing of each message in a batch is not p
 the agent SHOULD create a single `messaging` transaction for the processing of the batch
 of messages.

-### Trace Context
+### Receiving Trace Context

-When message reception is captured as a `messaging` transaction, if the messaging
-system exposes an API for sending additional message properties/metadata, it
-SHOULD be checked for the presence of [Trace Context](https://www.w3.org/TR/trace-context/).
-If Trace Context is present, it SHOULD be propagated to the `messaging` transaction
+This section applies to messaging systems that support [message metadata](#message-metadata).
+The instrumentation of message reception SHOULD check message metadata for the
+presence of [Trace Context](https://www.w3.org/TR/trace-context/).
+
+When single message reception is captured as a `messaging` transaction,
+and a Trace Context is present, it SHOULD be used as the parent of the `messaging` transaction
 to continue the [distributed trace](tracing-distributed-tracing.md).

-If a batch of messages is processed in a single `messaging` transaction, it may be
-possible that each message in the batch has its own Trace Context. In this
-scenario, it is not currently possible to propagate a Trace Context to the `messaging`
-transaction, since there a multiple contexts present. It may be possible to capture
-these in future through [span links](https://github.com/elastic/apm/issues/122).
+Otherwise (a single message being captured as a `messaging` span, or a batch
+of messages is processed in a single `messaging` transaction or span), a
+[span link](span-links.md) SHOULD be added for each message with Trace Context.
+This includes the case where the size of the batch of received messages is one.
+
+The number of events processed for trace context SHOULD be limited to a maximum
+of 1000, as a guard on agent overhead for extremely large batches of events.
+(For example, SQS's maximum batch size is 10000 messages. The maximum number of
+span links that could be sent for a single transaction/span to APM server with
+the default configuration is approximately 4000: 307200 bytes
+[APM server `max_event_size` default](https://www.elastic.co/guide/en/apm/server/current/configuration-process.html#max_event_size)
+/ 77 bytes per serialized span link.)

 ### Examples

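The receiving rules reduce to one decision. A single message received and captured as a transaction continues the trace. Everything else, including a batch of one, gets span links, with at most 1000 messages checked.

This is a minimal sketch. The names are hypothetical, and plain `traceparent` strings stand in for parsed trace context. It also reproduces the spec's estimate of approximately 4000 links using integer division.

```python
"""Sketch: choosing a parent vs. span links when receiving messages.

Assumptions (hypothetical names, not an agent API): `traceparents` holds the
traceparent metadata value of each received message, or None if absent;
`single_message_transaction` is True only when the instrumented API receives
one message and reception is captured as a `messaging` transaction.
"""

from typing import List, Optional, Tuple

MAX_MESSAGES_CHECKED = 1000  # guard on agent overhead for huge batches

# The spec's rough ceiling on links per event: default APM Server
# max_event_size (bytes) divided by ~77 bytes per serialized span link.
APPROX_MAX_SPAN_LINKS = 307200 // 77


def plan_trace_context(
    traceparents: List[Optional[str]], single_message_transaction: bool
) -> Tuple[Optional[str], List[str]]:
    """Return (parent_traceparent, span_link_traceparents)."""
    if single_message_transaction:
        # Single message reception as a transaction: continue the trace.
        return (traceparents[0] if traceparents else None), []
    # Messaging span, or a batch handled as one transaction/span (even a
    # batch of size one): link every checked message with trace context.
    checked = traceparents[:MAX_MESSAGES_CHECKED]
    return None, [tp for tp in checked if tp is not None]
```

Here `307200 // 77` evaluates to 3989, the "approximately 4000" figure above. The 1000-message guard therefore stays well below what a single event can carry with the default configuration.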
@@ -188,6 +197,25 @@ queues/topics/exchanges will be ignored.
 | Central config | `true` |


+### Message metadata
+
+To support distributed tracing with automatic instrumentation, the messaging
+system must provide a mechanism to add metadata/properties/attributes to
+individual messages, akin to HTTP headers. If an APM agent supports
+trace-context for a given messaging system, it MUST use the following mechanisms
+so that cross-language tracing works:
+
+| Messaging system | Mechanism |
+| ---------------------- | --------- |
+| Azure Queue | No mechanism |
+| Azure Service Bus | Possibly `Diagnostic-Id` [application property](https://docs.microsoft.com/en-us/dotnet/api/azure.messaging.servicebus.servicebusmessage.applicationproperties). See [this doc](https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-end-to-end-tracing). |
+| Java Messaging Service | [Message Properties](https://docs.oracle.com/javaee/7/api/javax/jms/Message.html) |
+| Apache Kafka | [Kafka Record headers](https://cwiki.apache.org/confluence/display/KAFKA/KIP-82%2B-%2BAdd%2BRecord%2BHeaders) using [binary trace context fields](tracing-distributed-tracing.md#binary-fields) |
+| RabbitMQ | [Message Attributes](https://www.rabbitmq.com/tutorials/amqp-concepts.html#messages) (a.k.a. `AMQP.BasicProperties` in [Java API](https://www.rabbitmq.com/api-guide.html)) |
+| AWS SQS | [SQS message attributes](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-message-metadata.html), if within message attribute limits. See [AWS instrumentation](tracing-instrumentation-aws.md). |
+| AWS SNS | [SNS message attributes](https://docs.aws.amazon.com/sns/latest/dg/sns-message-attributes.html), if within message attribute limits. See [AWS instrumentation](tracing-instrumentation-aws.md). |
+
+
 ### AWS messaging systems

 The instrumentation of [SQS](tracing-instrumentation-aws.md#sqs-simple-queue-service) and [SNS](tracing-instrumentation-aws.md#sns-aws-simple-notification-service) services generally follow this spec, with some nuances specified in the linked specs.
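The SQS and SNS rows apply trace context only "if within message attribute limits". Amazon SQS allows at most 10 message attributes per message. This is a minimal sketch of such a guard for SQS, built on two illustrative assumptions:

* When the limit would be exceeded, the agent skips injection rather than dropping user attributes.
* The helper is named `with_trace_context`, a hypothetical name.

The AWS instrumentation spec is authoritative on the actual policy. The attribute shape matches the `MessageAttributes` parameter of boto3's SQS `send_message`.

```python
"""Sketch: adding trace context to SQS message attributes within limits.

Assumptions: `attributes` uses the boto3 MessageAttributes shape
({"name": {"DataType": ..., "StringValue": ...}}); skipping injection when the
limit would be exceeded is an illustrative policy, not a quoted rule.
"""

SQS_MAX_MESSAGE_ATTRIBUTES = 10  # Amazon SQS per-message attribute limit


def with_trace_context(attributes, traceparent, tracestate=None):
    """Return a new attributes dict with trace context, or an unchanged copy."""
    wanted = {"traceparent": traceparent}
    if tracestate:
        wanted["tracestate"] = tracestate
    new_keys = [name for name in wanted if name not in attributes]
    result = dict(attributes)
    if len(attributes) + len(new_keys) > SQS_MAX_MESSAGE_ATTRIBUTES:
        # Would exceed the limit: send the message without trace context.
        return result
    for name, value in wanted.items():
        result[name] = {"DataType": "String", "StringValue": value}
    return result
```

Skipping injection keeps the send from failing on an over-limit message, at the cost of a broken trace for that one message.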