-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly handle both v1 and v2 span-ids. #1
Conversation
The SPAN_ID is a decimal number in v1 of the trace-context API, but a (16-digit) hexadecimal number in v2: https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v1.types.TraceSpan https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v2.types.Span Stackdriver uses only v2 these days: https://issuetracker.google.com/issues/338634230?pli=1 Unfortunately, the X-Cloud-Trace-Context may still be using the v1 API. We try to detect that situation and convert it to a v2 compatible trace-id by formatting the number in hex. We cannot perfectly detect whether we're v1 or v2, because there are some decimal numbers that look like 16-digit hex numbers. But not very many; we'd expect the logic below to make a mistake about 1 in every 2,000 log entries. In such cases, we will use the wrong span-id, making it harder to associate log-messages with the requestlog that spawned it. Issue: https://console.cloud.google.com/support/cases/detail/v2/53379415?project=khan-academy Test plan: Fingers crossed, it can't be any worse than what we have now!
func maybeFixSpanID(s string) string { | ||
// We say it's a v2 span-id if it's the right length and is | ||
// obviously a hex value. Otherwise we assume it's a v1 span id. | ||
// About 0.0005 of v2 span-ids will fail this check (because the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you determine 1 in every 2,000 log entries
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!! I wonder what the long term plan is.
Should we send a PR to https://github.com/andyfusniak/stackdriver-gae-logrus-plugin?
And is google taking any action for them to fix this? Like should we have a TODO somewhere?
Already done: andyfusniak#2
I'm still following up with them. Last I heard, at least some people believe the current behavior is a bug -- perhaps specific to us -- and they're looking into it. I'm going to wait to hear back from them before landing this. |
Google now has an official bug entry: googleapis/google-cloud-go#10910. Will wait to see how that goes before landing this. |
@csilvers im happy to land this and we can leave a TODO to clean up once the dust has settled. If anything, it would be a short term fix. Up to you tho. Sadly we dont have a sync with google this week. |
Summary:
The SPAN_ID is a decimal number in v1 of the trace-context API,
but a (16-digit) hexadecimal number in v2:
https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v1.types.TraceSpan
https://cloud.google.com/python/docs/reference/cloudtrace/1.2.0/google.cloud.trace_v2.types.Span
Stackdriver uses only v2 these days:
https://issuetracker.google.com/issues/338634230?pli=1
Unfortunately, the X-Cloud-Trace-Context may still be using the
v1 API. We try to detect that situation and convert it to a v2
compatible trace-id by formatting the number in hex.
We cannot perfectly detect whether we're v1 or v2, because
there are some decimal numbers that look like 16-digit hex
numbers. But not very many; we'd expect the logic below to
make a mistake about 1 in every 2,000 log entries. In such
cases, we will use the wrong span-id, making it harder to
associate log-messages with the requestlog that spawned it.
Issue: https://console.cloud.google.com/support/cases/detail/v2/53379415?project=khan-academy
Test plan:
Fingers crossed, it can't be any worse than what we have now!