Trace IDs with leading zeros are trimmed in the Search API response #4723

Open
kaustubhkurve opened this issue Feb 19, 2025 · 4 comments

@kaustubhkurve

Describe the bug

Summary
When a trace whose trace ID has leading zeros (e.g. 008efff798038103d269b633813fc703) is ingested into Tempo, the trace ID in the search API results is trimmed and the leading zeros are omitted.

Description
The TraceIDToHexString function is being used to generate the string representation for the trace ID and this removes the leading zeros in the trace ID. It looks like this function is being used in a few places including:

  1. the parquet module to generate TraceID Text (the trace ID bytestring in the corresponding column in parquet contains the leading zeros)
  2. the traceql engine module to generate the trace search metadata.

We have observed that for trace IDs with leading zeros, the search API response includes the trimmed trace ID instead of the full trace ID (a 32-character hex string representing 16 bytes). The trace ID API succeeds with either the trimmed trace ID or the zero-prefixed trace ID.

Because of the trimming of leading zeros, it's not straightforward to correlate trace IDs with external sources as the API consumer has to left pad the trimmed IDs to make them compatible with the trace context spec.

We noticed that a similar fix was rolled out for span IDs.

To Reproduce
Steps to reproduce the behavior:

  1. Ingest a trace to Tempo with leading zeros in trace ID
{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "my.service"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {
            "name": "my.library",
            "version": "1.0.0",
            "attributes": [
              {
                "key": "my.scope.attribute",
                "value": {
                  "stringValue": "some scope attribute"
                }
              }
            ]
          },
          "spans": [
            {
              "traceId": "008efff798038103d269b633813fc703",
              "spanId": "eee19b7ec3c1b100",
              "name": "I am another span!",
              "startTimeUnixNano": 1689969302000000000,
              "endTimeUnixNano": 1689970000000000000,
              "kind": 2,
              "attributes": [
                {
                  "key": "my.span.attr",
                  "value": {
                    "stringValue": "some value"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
  2. Use the search API to retrieve the trace
{
  "traces": [
    {
      "traceID": "8efff798038103d269b633813fc703",
      "rootServiceName": "my.service",
      "rootTraceName": "I am another span!",
      "startTimeUnixNano": "1689969302000000000",
      "durationMs": 698000
    }
  ],
  "metrics": {
    "inspectedTraces": 1,
    "inspectedBytes": "17227",
    "completedJobs": 1,
    "totalJobs": 1
  }
}

Expected behavior
Tempo should return the full 32-character hex string (16 bytes) for the trace ID in the search API response.

Environment:

  • Infrastructure: Kubernetes / laptop
  • Deployment tool: helm / docker-compose

Additional Context

@joe-elliott
Member

I'm not sure I see the issue here. Left padding a string with 0s doesn't feel like it adds any value? The numerical values are equivalent.

@kaustubhkurve
Author

kaustubhkurve commented Feb 20, 2025

Hi @joe-elliott, to elaborate further on the issue here:

  1. On the instrumentation layer, we collect additional metadata for a trace ID, and we store this metadata along with the traceID in a different store.
  2. When the traceID generated by the instrumentation layer has leading zeros, there is a mismatch in the traceID returned by tempo search API and what is stored in the metadata sink (ex: 8efff798038103d269b633813fc703 returned by tempo vs 008efff798038103d269b633813fc703 in the external store).
  3. Since leading zeros are removed in the search API response, the traceID cannot be easily correlated with external sources (direct lookups).

In such cases, the search API consumer would need to pad the trace ID to be able to correlate it with other sources.

Note: A similar issue was raised for span IDs and a fix was made to pad the span IDs to make sure returned span IDs are 16 hex characters long (relevant comment).

Can we have the same approach for trace IDs as well so they are formatted uniformly according to the spec (32 hex character long)?

@joe-elliott
Member

We are going to leave this the way it is for now. Both leading 0s and not are equivalent and correct responses. If we add leading 0s it's possible someone will file an issue asking to remove them to match the particular needs of their client. In the case of spans we added leading 0s b/c it solved an issue with Grafana.

Thanks for bringing this need to our attention, but we will leave it alone.

@kaustubhkurve
Author

kaustubhkurve commented Feb 21, 2025

@joe-elliott I am trying to understand how this is being considered a client need and not an issue of conforming to a standard/spec.

The trace context spec, in its section on interoperability with non-compliant systems, calls for left-padding trace IDs with 0s to make them compliant with the standard:

When a system creates an outbound message and needs to generate a fully compliant 16 bytes trace-id from a shorter identifier, it SHOULD left pad the original identifier with zeroes. For example, the identifier 53ce929d0e0e4736, SHOULD be converted to trace-id value 000000000000000053ce929d0e0e4736.

The OpenTelemetry Tracing API spec (https://opentelemetry.io/docs/specs/otel/trace/api/#retrieving-the-traceid-and-spanid) also states the following:

The API MUST allow retrieving the TraceId and SpanId in the following forms:
Hex - returns the lowercase hex encoded TraceId (result MUST be a 32-hex-character lowercase string) or SpanId (result MUST be a 16-hex-character lowercase string).
Binary - returns the binary representation of the TraceId (result MUST be a 16-byte array) or SpanId (result MUST be an 8-byte array).

In Tempo's case, a fully compliant trace ID is being sent to tempo, however it's not being returned in the search API to spec.

From the comment on the spanID issue:

SpanIDs are 8 bytes/64 bits of information, but this new method should handle cases where the slice has a shorter length. The output result should always be 16 hex characters long.

Trace IDs are 16-byte arrays or 32-character hex strings (https://www.w3.org/TR/trace-context/#trace-id), and I am wondering why they can't be held to the same standard that span IDs are.

Both leading 0s and not are equivalent

On the numerical equivalence of two values, the API is returning a string representation (not a numerical representation), so I am not sure if they can be considered equivalent.

if we add leading 0s it's possible someone will file an issue asking to remove them to match the particular needs of their client

If this does happen, they can be redirected to the trace context spec, indicating that trace IDs are 32 character hex strings.

I am trying to understand how a spec can be enforced for spanIDs (because it solves an issue with grafana), and not for traceIDs.

I am open to working on a PR if there is consensus.

Thanks again for following up on the issue, appreciate your help.
