-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OTLP over message systems #157
base: main
Are you sure you want to change the base?
Changes from all commits
78c3eb7
dfd49e5
6cd071c
57451c6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# OTLP over messaging systems | ||
|
||
Best practices for transporting OTLP data over a message system. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we have any performance best practices for message systems based on the existing Collector Kafka implementation? For example should you prefer to batch into many small messages or into fewer large messages? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would not make it part of the spec, just like it is not part of the OTLP/gRPC spec. |
||
|
||
## Motivation | ||
|
||
This proposal tries to bring consistency and guidelines when transporting | ||
OTLP data over messaging systems. A non-exclusive list of examples | ||
of products in this category are: | ||
|
||
* Apache Kafka | ||
* Apache Pulsar | ||
* Google Pubsub | ||
* AWS SNS | ||
* RabbitMQ | ||
|
||
Using an intermediate messaging system is in some cases preferred compared to | ||
a direct network connection between the telemetry producer and consumer. | ||
Reasons for using such an intermediate medium could be: | ||
|
||
* Protecting against backend failure | ||
* Security Policies | ||
* Network Policies | ||
* Buffering | ||
|
||
An extra motivation to have a consistent definition of the payload is that | ||
it would be easy to transfer the OTLP data from one messaging system to | ||
another just by reading the payload from one system and writing it to | ||
another without the need for transformations. | ||
|
||
## Explanation | ||
|
||
Because the OTLP payload that is sent over messaging systems is | ||
well-defined, it’s easy to implement new systems consistently. | ||
An implementation should at least support `otlp_proto`, meaning that the | ||
payload is Protobuf serialized from `ExportTraceServiceRequest` for traces, | ||
`ExportMetricsServiceRequest` for metrics, or `ExportLogsServiceRequest` for | ||
logs. | ||
|
||
Optionally an implementation could support `otlp_json` or other alternative | ||
encodings like jaeger or zipkin payloads. Alternative encodings should be configured | ||
by an `encoding` field in the configuration file. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What configuration file is this referring to? Does this assume that every implementation of OTLP over messaging system must have a configuration file? It seems like a stretch to me. Can it be a in-memory config option? Or an API parameter of a library that implements this OTEP? |
||
|
||
For the user it’s beneficial that both an exporter, and a receiver are | ||
implemented in the OpenTelemetry Collector, but SDK developers could | ||
also implement a language specific exporter. | ||
|
||
For log data, implementors of a receiver are encouraged to also support | ||
receiving raw data from a topic. This enables scenarios where | ||
non-OpenTelemetry log producers can produce a stream of logs, either | ||
structured or unstructured, on that topic and the receiver wraps the log | ||
data in a configurable resource. | ||
|
||
## Internal details | ||
|
||
The default implementation must implement `otlp_proto`, meaning that the | ||
payload is Protobuf serialized from `ExportTraceServiceRequest` for traces, | ||
`ExportMetricsServiceRequest` for metrics, or `ExportLogsServiceRequest` for | ||
Comment on lines
+56
to
+58
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One thing that is not clear to me is if we want to include the signal type in the serialized payload so that the recipient can tell what signal is. Since this OTEP does not define it it means the recipient that works with multiple signals needs another way to figure it out (often ends up using the channel name). I think this is an important capability that adds value and has no obvious downsides. This is also an issue we want to solve for OTLP/File format: open-telemetry/opentelemetry-specification#1443 (comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't help with the file format at all, but Kafka and Pubsub (maybe others) have key-value headers/attributes which are mentioned in "Open questions". The signal type and encoding (json or proto) could go in there There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added a comment in the OTLP/File discussion about CloudEvents. Certainly, if you want to allow multiplexing logs, traces, and metrics over a single topic CloudEvents is a good fit and it would align with this OTLP as well as the spec defines bindings for different messaging systems: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @alexvanboxel I read the comment you left in the OTLP/File issue but I am a bit confused why it talks about the CloudEvents or Avro. My comment here is about how an OTLP payload that is encoded as a Protobuf or JSON should be recorded in messaging systems such that the signal type information is included. I think we should keep it as simple as possible, new any additional concepts and technology introduced (CloudEvents or Avro) IMO is unnecessary complication. If we want to record the type of the signal there are fairly ways to achieve it while staying with the existing set of technologies. For example we can add this message to OTLP Protobuf definitions: // This may be also named simply Request or Payload or something like that
message ExportServiceRequest {
oneof request {
ExportTraceServiceRequest traces = 1;
ExportMetricsServiceRequest metrics = 2;
ExportLogsServiceRequest logs = 3;
}
} ExportServiceRequest can be then serialized and become the payload in the messaging system. Recipients can then deserialize and will be able to use the right signal in the "request" field. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tigrannajaryan the consensus is that we want to signal the type of the message, so I'm already changing my application. Let me demonstrate what I mean by leveraging the CloudEvent spec (as it's a spec and not technology). In my Google Pubsub implementation, it describes what headers to use. So I could simple use: func (receiver *pubsubReceiver) detectEncoding(attributes map[string]string) (Encoding, error) {
ceType := attributes["ce-type"]
ceContentType := attributes["ce-datacontenttype"]
if strings.HasSuffix(ceContentType,"application/x-protobuf") {
switch ceType {
case "org.opentelemetry.otlp.traces.v1":
return OtlpProtoTrace, nil
case "org.opentelemetry.otlp.metrics.v1":
return OtlpProtoMetric, nil
case "org.opentelemetry.otlp.logs.v1":
return OtlpProtoLog, nil
}
} else if strings.HasSuffix(ceContentType,"application/json") {
switch ceType {
case "org.opentelemetry.otlp.traces.v1":
return OtlpJsonTrace, nil
case "org.opentelemetry.otlp.metrics.v1":
return OtlpJsonMetric, nil
case "org.opentelemetry.otlp.logs.v1":
return OtlpJsonLog, nil
}
}
return Unknown, nil
} Granted, it's heavier than a |
||
logs. If an implementation support other encodings an `encoding` field should | ||
be added to the configuration to make it switchable. | ||
|
||
| value | trace | metric | log | description | | ||
|---|---|---|---|---| | ||
| `otlp_proto` (default) | X | X | X | protobuf serialized from the `Export(Trace/Metric/Log)ServiceRequest` | | ||
| `otlp_json` | X | X | X | proto3 json representation of a `Export(Trace/Metric/Log)ServiceRequest` | | ||
| `jaeger_proto` | X | - | - | the payload is deserialized to a single Jaeger proto `Span` | | ||
| `jaeger_json` | X | - | - | the payload is deserialized to a single Jaeger JSON Span using `jsonpb` | | ||
| `zipkin_proto` | X | - | - | the payload is deserialized into a list of Zipkin proto spans | | ||
| `zipkin_json` | X | - | - | the payload is deserialized into a list of Zipkin V2 JSON spans | | ||
| `zipkin_thrift` | X | - | - | the payload is deserialized into a list of Zipkin Thrift spans | | ||
Comment on lines
+66
to
+70
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are unrelated to OTLP. I think the shouldn't be in this OTEP. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will remove it, but I'll leave it here to give other people a chance to comment.
Comment on lines
+66
to
+70
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm proposing changes to these in PR linked above. The types would be something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I'm going to remove the other encoding out of the spec as @tigrannajaryan suggests |
||
| `raw_string` | - | - | X | see `Log specific collector receiver` | | ||
| `raw_binary` | - | - | X | see `Log specific collector receiver` | | ||
|
||
Above you’ll find a non-exclusive list of possible encodings. Only `otlp_proto` | ||
is mandatory and the default. | ||
|
||
### Log specific collector receiver | ||
|
||
As the flow control is the hardest part of the implementations, adding a feature | ||
for reading raw log events avoids having an additional parallel | ||
implementation to do just that. | ||
|
||
The encoding field should be used to indicate that the data received is not the | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is meant by "field" here? |
||
default `otlp_proto` data, but raw log data. The receiver should construct a valid | ||
OTLP message for each raw message received. Valid encoding are `raw_string` | ||
and `raw_binary`, that will control the type that the data will have when set | ||
in the `body` of the OTLP message. | ||
|
||
As the raw log data don’t have a resource attached to them, the receiver should add | ||
a generic resource and instrumentation library message around the raw messages. The | ||
instrumentation library should be set to the name of the receiver and the version to | ||
that of the collector. Defining the exact resource should be done in the pipeline, | ||
using for example the `resourceprocessor`. | ||
|
||
## Prior art and alternatives | ||
|
||
The `kafkareceiver` and `kafkaexporter` already implement this OTLP as | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that I'm proposing some changes to these: open-telemetry/opentelemetry-collector#3044 Still a WIP. |
||
described in the OpenTelemetry Collector. The description in the OTEP | ||
makes both of them compliant without modification. Although it | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is good. But let's also be aware this is not a must-have requirement. If we believe what Collector does is not the best we can do then we should be able to propose something different in this OTEP and fix the Collector (it is still in Beta and breaking changes are allowed). |
||
doesn't implement the raw logging support but this could be added | ||
without conflicting with the implementation. | ||
|
||
## Open questions | ||
|
||
This proposal doesn't take advantage of attributes that some systems | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 I think it would be good to put encoding and signal type in there. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1. I think perhaps having a standard convention for knowing what signal is coming in (either force separate channels for each signal, OR a standard header of some sort to disambiguate) would be good. OTLP effectively has this in their GRPC definition via the separate services. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, gRPC can handle this over the same endpoint (a separate service is just a path in the HTTP/2 header). I'm leaning toward the CloudEvents proposal, where you can have multiple (at least have the possibility to) types over one topic. |
||
support. Should it? Would it be useful to look at the CloudEvents | ||
spec, to leverage the conventions for attributes? | ||
|
||
No guaranty of order could be given, because not all systems support order. | ||
Is that a problem? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should not be a problem since OTLP conceptually does not guarantee order of delivery in network protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure to rename your file to the PR # you've been assigned (0157).