Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly handle both v1 and v2 span-ids. #1

Merged
merged 1 commit into from
Oct 4, 2024
Merged

Properly handle both v1 and v2 span-ids. #1

merged 1 commit into from
Oct 4, 2024

Conversation

csilvers
Copy link
Member

Summary:

The SPAN_ID is a decimal number in v1 of the trace-context API,
but a (16-digit) hexadecimal number in v2:
https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v1.types.TraceSpan
https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v2.types.Span

Stackdriver uses only v2 these days:
https://issuetracker.google.com/issues/338634230?pli=1

Unfortunately, the X-Cloud-Trace-Context may still be using the
v1 API. We try to detect that situation and convert it to a v2
compatible trace-id by formatting the number in hex.

We cannot perfectly detect whether we're v1 or v2, because
there are some decimal numbers that look like 16-digit hex
numbers. But not very many; we'd expect the logic below to
make a mistake about 1 in every 2,000 log entries. In such
cases, we will use the wrong span-id, making it harder to
associate log-messages with the requestlog that spawned it.

Issue: https://console.cloud.google.com/support/cases/detail/v2/53379415?project=khan-academy

Test plan:

Fingers crossed, it can't be any worse than what we have now!

The SPAN_ID is a decimal number in v1 of the trace-context API,
but a (16-digit) hexadecimal number in v2:
   https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v1.types.TraceSpan
   https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v2.types.Span

Stackdriver uses only v2 these days:
   https://issuetracker.google.com/issues/338634230?pli=1

Unfortunately, the X-Cloud-Trace-Context may still be using the
v1 API.  We try to detect that situation and convert it to a v2
compatible trace-id by formatting the number in hex.

We cannot perfectly detect whether we're v1 or v2, because
there are some decimal numbers that look like 16-digit hex
numbers.  But not very many; we'd expect the logic below to
make a mistake about 1 in every 2,000 log entries.  In such
cases, we will use the wrong span-id, making it harder to
associate log-messages with the requestlog that spawned it.

Issue: https://console.cloud.google.com/support/cases/detail/v2/53379415?project=khan-academy

Test plan:
Fingers crossed, it can't be any worse than what we have now!
@csilvers csilvers self-assigned this Sep 19, 2024
@csilvers csilvers requested a review from a team September 19, 2024 23:01
func maybeFixSpanID(s string) string {
// We say it's a v2 span-id if it's the right length and is
// obviously a hex value. Otherwise we assume it's a v1 span id.
// About 0.0005 of v2 span-ids will fail this check (because the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you determine 1 in every 2,000 log entries?

Copy link
Member

@MiguelCastillo MiguelCastillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!! I wonder what the long term plan is.

Should we send a PR to https://github.com/andyfusniak/stackdriver-gae-logrus-plugin?

And is google taking any action for them to fix this? Like should we have a TODO somewhere?

@csilvers
Copy link
Member Author

Should we send a PR to https://github.com/andyfusniak/stackdriver-gae-logrus-plugin?

Already done: andyfusniak#2

And is google taking any action for them to fix this? Like should we have a TODO somewhere?

I'm still following up with them. Last I heard, at least some people believe the current behavior is a bug -- perhaps specific to us -- and they're looking into it. I'm going to wait to hear back from them before landing this.

@csilvers
Copy link
Member Author

Google now has an official bug entry: googleapis/google-cloud-go#10910. Will wait to see how that goes before landing this.

@MiguelCastillo
Copy link
Member

@csilvers im happy to land this and we can leave a TODO to clean up once the dust has settled. If anything, it would be a short term fix. Up to you tho. Sadly we dont have a sync with google this week.

@csilvers csilvers merged commit f3a78e6 into master Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants