Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specific URL query string values should be redacted #971

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

trask
Copy link
Member

@trask trask commented Apr 26, 2024

Related to #860

Alternative proposal to #961

Changes

Specific URL query string values should now be redacted by default due to concerns around leaking credentials.

The list of problematic query string keys is documented in the semantic conventions and is subject to change over time as we learn about additional common usages of credentials found in query string values.

@austinlparker
Copy link
Member

Should we also add a few 'obvious' ones? e.g., 'password', 'secret', 'private', etc.?

@trask
Copy link
Member Author

trask commented Apr 26, 2024

here's the default list used by Elastic APM: https://www.elastic.co/guide/en/apm/agent/java/1.x/config-core.html#config-sanitize-field-names

Copy link
Contributor

@lmolkova lmolkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to let users/sdks/distros to expand the blocklist via configuration, but this would be a purely incremental change.

@austinlparker
Copy link
Member

here's the default list used by Elastic APM: https://www.elastic.co/guide/en/apm/agent/java/1.x/config-core.html#config-sanitize-field-names

I do find it ironic that there's a big "query string values are considered non-sensitive" info box in that section.

But yeah, that's a good list of additions too.

Copy link
Member

@tylerbenson tylerbenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be easily configured in a standard way so users can add more to it.

I'm also surprised that this list is as short as it is.

@gregkalapos
Copy link
Member

I'd definitely prefer this over #961, since this is less intrusive for users who already capture query strings and unlike #961 this really handles values that are problematic from security point of view. With #961 the only chance for people to capture query strings is to turn it back on, but that again would capture problematic values as well. So I definitely prefer an approach that tries to handle problematic values over an all or nothing config.

Since configuration came up: I think this list could be generalized and applied to multiple other things (e.g. Db query parameters, Elastic also applies this to x-form encoded request bodies (not yet part of SemConv, but could be), and there are probably many other attribtues). So the config could be a list of keys that are applied for sanitization to every attribute that can potentially contain sensitive information, not only for query strings. That could be also introduced incrementally; for now I think what the PR proposes is ok.

@reyang
Copy link
Member

reyang commented Apr 26, 2024

I'm also surprised that this list is as short as it is.

If we look at what the user provided in the original report, here are some keys:

It is important to ensure that sensitive query parameters are scrubbed by default. While this list is not exhaustive, the following parameters should be scrubbed as a starting point: "sig", "code", "key", "access_token", "client_secret", "password", "appkey", "app_key", "apikey", "apiKey", "api_key", "accesskey", "access_key", and "token".

image

@trask
Copy link
Member Author

trask commented Apr 26, 2024

There are many potential candidates for additional query string keys to include.

From initial CVE report against OpenTelemetry .Net:

  • code
  • key
  • access_token
  • client_secret
  • password
  • appkey
  • app_key
  • apikey
  • apiKey
  • api_key
  • accesskey
  • access_key
  • token

From https://www.elastic.co/guide/en/apm/agent/java/1.x/config-core.html#config-sanitize-field-names:

  • password
  • passwd
  • pwd
  • secret
  • *key
  • *token*
  • *session*
  • *credit*
  • *card*
  • *auth*
  • *principal*
  • set-cookie

From a process perspective of how to pick and choose (both now and in the future), it might be nice to say something along the lines of

our default redaction list only includes known query string keys used to pass credentials to known public rest APIs

* [`sig`](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#sas-token)
* [`X-Goog-Signature`](https://cloud.google.com/storage/docs/access-control/signed-urls)

This list is subject to grow over time, but once a key is added to the list, removing it will be considered a
Copy link
Member

@reyang reyang Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would these be considered as breaking changes?
For example, items in the current list have links to the official doc, one can imagine over years some of these technologies will be abandoned and docs would be retired.

@svrnm
Copy link
Member

svrnm commented Apr 29, 2024

I prefer this solution over #961, 👍

Question: In my comment and in @austinlparker's comment and in @reyang's comment (in chronological order), we discussed the idea of having generalized SDK capabilities for redaction, is this still something we want? If so I am happy to work towards a proposal for that.

@pyohannes
Copy link
Contributor

we discussed the idea of having generalized SDK capabilities for redaction, is this still something we want? If so I am happy to work towards a proposal for that.

I'd definitely answer that with a yes. Users might want to have redaction capabilities they can use for data that doesn't conform to OTel semantic conventions.

There have been related previous attempts:

@TylerHelmuth
Copy link
Member

our default redaction list only includes known query string keys used to pass credentials to known public rest APIs

I agree with adding this verbiage as it helps clarify how the list was decided.

@svrnm
Copy link
Member

svrnm commented Apr 29, 2024

As discussed during the SemConv SIG meeting I am going to provide a proposal for the "general problem", I will try to have a draft by the end of this week to share

@svrnm
Copy link
Member

svrnm commented Apr 30, 2024

As discussed during the SemConv SIG meeting I am going to provide a proposal for the "general problem", I will try to have a draft by the end of this week to share

Here's the proposal: open-telemetry/oteps#255

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 16, 2024
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this May 24, 2024
@trask trask reopened this May 24, 2024
@trask trask removed Stale labels May 24, 2024
Copy link

github-actions bot commented Jun 9, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 9, 2024
@lmolkova lmolkova removed the Stale label Jun 9, 2024
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 25, 2024
@joaopgrassi joaopgrassi added never stale PRs marked with this label will be never staled and automatically closed and removed Stale labels Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
never stale PRs marked with this label will be never staled and automatically closed
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.