From 798e42705e4117e1684a628952f07a26d2554fdc Mon Sep 17 00:00:00 2001 From: svrnm Date: Tue, 30 Apr 2024 13:00:41 +0200 Subject: [PATCH 1/8] Sensitive Data Redaction Signed-off-by: svrnm --- text/0000-sensitive-data-redaction.md | 269 ++++++++++++++++++++++++++ 1 file changed, 269 insertions(+) create mode 100644 text/0000-sensitive-data-redaction.md diff --git a/text/0000-sensitive-data-redaction.md b/text/0000-sensitive-data-redaction.md new file mode 100644 index 000000000..ba77050b9 --- /dev/null +++ b/text/0000-sensitive-data-redaction.md @@ -0,0 +1,269 @@ +# Sensitive Data Redaction + +This is a proposal for adding treatment of sensitive data to the OpenTelemetry (OTel) Specification and semantic conventions. + +## Motivation + +When collecting data from an application, there is always the possibility, that the data contains information that shouldn’t be collected, because it is either leaking (parts of) credentials (passwords, tokens, usernames, credit card information) or can be used to uniquely identify a person (name, IP, email, credit card information) which may be protected through certain regulations. By adding OTel to a library or instrumentation an end-user of OTel is facing exactly this challenge: values of an [Attribute](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute) may carry such sensitive data. + +While it’s ultimately the responsibility of the legal entity operation an application to protect sensitive data, end-users of OpenTelemetry (developers, operators working for that entity) are turning to the authors of OpenTelemetry – or to those of libraries that implement OpenTelemetry, like the Azure SDK – to have means in place to redact/filter sensitive data. Without that capability provided, they will raise security issues and/or will drop OpenTelemetry eventually due to it not meeting their security/legal requirements. 
+ +In this OTEP you will find a proposal for adding treatment of sensitive data to OpenTelemetry. + +By adding the proposed features, OpenTelemetry will provide its end-users the tooling needed to make sure that sensitive data is treated according to their requirements. + +This will make sure that these end-users can use OpenTelemetry within their secure and legal requirements and that OpenTelemetry (and implementing libraries) are able to avoid vulnerabilities. + +## Explanation + +### Overview + +This proposal aims to provide the following features: + +- methods for consumers of the OpenTelemetry API to enrich collected Attributes with sensitivity information and hooks to apply different levels of redaction. +- related to those methods a similar way to enrich attributes in the semantic conventions with sensitivity information and ways of redaction. +- a consistent way to configure sensitivity requirements for end-users of the OpenTelemetry SDK and in instrumentation libraries, including predefined “configuration profiles” and ways for fine grain configuration. +- A redactor implements the logic to apply redaction and that owns predefined helpers for redaction in the SDK (URLParams filtering, Zeroing IPs, etc.). + +The following limitations apply: + +- Only Attributes are treated, although it is possible that sensitive data is contained in other data generated by OpenTelemetry as well (e.g. the span name, or the instrumentation scope name could contain sensitive data) + +### Annotate attributes with sensitivity information + +As a first building block the OpenTelemetry API needs to provide capability to enrich collected attributes with sensitivity details and with potential hooks to redact those. Those API changes then also can be used to describe sensitivity information in the semantic conventions. 
+ +#### API + +Every API that sets an attribute consisting of a key and a value, needs to be enhanced by an additional functionality that allows to add details about the potential sensitivity of this data and a hooks how it may be redacted. This can be an additional set of parameters for an existing method or a method that can be called after the attribute has been set, if adding parameters to a method signature would lead to a breaking change. + +An additional method for setting the sensitivity information for a span attribute might look like the following: + +``` +span.setAttribute("url.query", url.toString(), ); +``` + +Or, if the setAttribute method can not be extended: + +``` +span.setAttribute("url.query", url.toString()); +span.redactAttribute(“url.query”, ); +``` + +The content of `` may look like the following example: + +``` +{ + : +} +``` + +The key `` is one of multiple available pre-defined levels of redaction requirements that an end-user may choose, e.g. + +- `DEFAULT`: What should be the default redaction applied to this value +- `STRICT`: What should be a basic redaction that is applied to this value +- `STRICTER`: What should be an advanced redaction that is applied to this value +- … + +The value `` can be one of the following: + +- A regular expression (or sed expression?) which when applied turns parts of a string into their redacted version,e.g. `s/([0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\10/` applied on an IP address will replace the last octet with 0: `1.2.3.4` becomes `1.2.3.0` +- A constant that represents pre-defined redaction methods (see below for Redaction helpers), e.g. `REDACT_INSECURE_URL_PARAM_VALUES` will apply what is required for [#971](https://github.com/open-telemetry/semantic-conventions/pull/971) +- A callback that will call a function on that value and apply advanced redaction + +It should be recommended to either use the expression or the constant and only in rare circumstances a callback function should be applied. 
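As a rough illustration of the expression-style rule described above, the following Python sketch applies the IP address example. It is not part of the proposal: `apply_sed_rule` is a hypothetical helper that only understands the simple `s/pattern/replacement/` form. One detail is worth calling out: sed reads `\10` as "group 1 followed by a literal 0", while Python's `re` module would read it as group 10, so the helper rewrites backreferences into the unambiguous `\g<1>` form first.

```python
import re


def apply_sed_rule(rule: str, value: str) -> str:
    """Apply a simple sed-style `s/pattern/replacement/` rule to `value`.

    Hypothetical helper for illustration only. It assumes '/' is the
    delimiter and never appears inside the pattern or the replacement.
    """
    if not rule.startswith("s/") or not rule.endswith("/"):
        raise ValueError(f"unsupported rule: {rule!r}")
    pattern, replacement = rule[2:-1].split("/")
    # sed only knows \1..\9, so "\10" means group 1 followed by "0".
    # Python would parse "\10" as group 10, hence the explicit \g<N> form.
    replacement = re.sub(r"\\([1-9])", r"\\g<\1>", replacement)
    return re.sub(pattern, replacement, value)


# The rule quoted in the text above: zero the last octet of an IPv4 address.
STRICT_IP_RULE = r"s/([0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\10/"
redacted = apply_sed_rule(STRICT_IP_RULE, "1.2.3.4")
print(redacted)
```

Whether an SDK would accept sed syntax, a plain regular expression plus replacement, or something else is left open by the text; the sketch only shows that the quoted rule produces the documented result once the backreference ambiguity is handled.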
+ +Note that those API calls are no-op and will be implemented by the SDK (as we do it with other API methods as well), this way (almost?) no additional overhead will be created by introducing those annotations. + +### Semantic Conventions + +With the definitions above values in the semantic conventions can be annotated with `` as outlined above, with the exception that no callback functions can be supplied as redaction rules. + +Example: + +| Attribute | ... existing columns ... | sensitivity details | +|-----------|--------------------------|---------------------| +| `url.query`| | Rationale: Some verbatim wording why this is the way it is below
Type: `mixed`
`DEFAULT`: `REDACT_INSECURE_URL_PARAM_VALUES`
`STRICT`: `REDACT_ALL_URL_VALUES`
`STRICTER`: `DROP` | +| `client.address`| | Rationale: some reasons why dropping octets may be required
Type: `maybe_pii`
`DEFAULT`: `NONE`
`STRICT`: `'s/([0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\10/'`
`STRICTER`: `'s/([0-9]+\.[0-9]+\.)[0-9]+\.[0-9]+/\10.0/'` | +| `enduser.creditCardNumber`**[1]** | | Rationale: ...
Type: `always_pii`
`DEFAULT`: `EXTRACT_IIN`
`STRICT`: `DROP`| + +**[1]**: _This is a made-up example for demonstration purpose, it’s not part of the current semantic conventions. It gives a more nuanced example, e.g. that extracting the IIN might be an option over dropping the number completely. It also demonstrated the value of “type”, which can enable Data lineage use cases_ + +The `DROP` keyword means that the value is replaced with `REDACTED` (or set to null, …). + +It is the responsibility of the OpenTelemetry SDK to implement those sensitivity details provided by the semantic conventions. This means that an instrumentation library does not need to add `` when calling an `setAttribute` method or does not need to call the additional `redactAttribute` method as outlined above. An instrumentation library may choose to apply additional redactions (leveraging the OpenTelemetry APIs or doing it before calling `setAttribute` in their own business logic) + +### SDK Configuration of sensitivity requirements + +The annotations and APIs as outlined above will allow SDK users to provide their sensitive requirements as configuration (here: environment variable, but can be encoded in future config files as well), e.g. + +``` +env OTEL_SENSITIVE_DATA_PROFILE=”STRICT” ./myInstrumentedApp +``` + +This call will make sure that redactions applied follow the `STRICT` profile. If not set the `DEFAULT` will be used. Additionally there are 2 levels that can not be used in sensitivity details: + +- `NEVER`: No redaction is applied, SDKs may choose to log a warning that this is a risky choice +- `ALWAYS`: All values with sensitivity details will be dropped + +Additionally SDK end users can provide advanced configuration (through code, configuration file, probably not environment variable) to add specific needs: + +``` +{ + => +} +``` + +e.g. 
+ +``` +{ + "url.query" => { DEFAULT: REDACT_ALL_URL_VALUES }, + "com.example.customer.name" => { DEFAULT: x => { /* do something in this callback * } }, + "com.example.customer.email" => { ‘s/^[^@]*@/REDACTED@/’ }, +} +``` + +### Redactor + +To accomplish redaction the SDK needs a component (similar to a sampler) that inspects attributes when they are set and applies the required redactions: + +- `Redactor::setup(profile)` will setup the redactor with the given profile, maybe a constructor? depends on the language +- `Redactor::redact(value, )` will return the redacted value using the provided profile + +The redactor will also have all the methods to apply predefined redactions (`REDACT_ALL_URL_VALUES`, `REDACT_INSECURE_URL_PARAM_VALUES`, `DROP`, etc.). If a method is not implemented (either by the SDK or by the end-user choosing one that does not exist), it will default to apply `DROP` to avoid leakage of any sensitive data. + +## Internal details + +**tbd** + +## Additional context + +Treating sensitive data properly is a very complex multi-dimensional topic. Below you will find some additional context on that subject. + +### Types of sensitive data + +There are different kinds of “sensitivity” that may apply to data. The ones most relevant in this proposal are “security” and “privacy”. They may overlap but we distinguish them as follows: + +- security-relevant sensitive data: any information that when exposed [weakens the overall security of the device/system](https://en.wikipedia.org/wiki/Vulnerability_(computing)). +- privacy-relevant sensitive data: any information that when exposed [can adversely affect the privacy or welfare of an individual](https://en.wikipedia.org/wiki/Information_sensitivity). + +Note, that there are other kinds of sensitive data (business information, classified information), which are not covered extensively by this proposal. 
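The Redactor and the profile selection described earlier can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the proposed API: the class layout, the `helpers` table, the `REDACTED` placeholder, and the fallback to `DROP` when the active profile has no entry for an attribute are all illustrative choices. Only the profile names, the `OTEL_SENSITIVE_DATA_PROFILE` variable, and the "unknown method defaults to `DROP`" behavior come from the text. Expression rules are left out to keep the sketch short.

```python
import os
import re
from typing import Callable, Dict, Optional, Union

DROPPED = "REDACTED"  # assumption: the text leaves the exact DROP value open

Rule = Union[str, Callable[[str], str]]


def drop(_value: str) -> str:
    """Replace the whole value with the placeholder."""
    return DROPPED


def redact_all_url_values(query: str) -> str:
    """Keep query parameter names, replace every parameter value."""
    return re.sub(r"=[^&]*", "=" + DROPPED, query)


class Redactor:
    """Hypothetical redactor; names and signatures are assumptions."""

    def __init__(self, profile: Optional[str] = None) -> None:
        # If no profile is configured, DEFAULT is used.
        self.profile = profile or os.environ.get(
            "OTEL_SENSITIVE_DATA_PROFILE", "DEFAULT"
        )
        self.helpers: Dict[str, Callable[[str], str]] = {
            "NONE": lambda value: value,
            "DROP": drop,
            "REDACT_ALL_URL_VALUES": redact_all_url_values,
        }

    def redact(self, value: str, details: Dict[str, Rule]) -> str:
        if not details or self.profile == "NEVER":
            return value  # nothing known about this attribute, or opted out
        if self.profile == "ALWAYS":
            return drop(value)
        # Assumption: a profile without an entry falls back to DROP.
        rule = details.get(self.profile, "DROP")
        if callable(rule):
            return rule(value)
        # Unknown or unimplemented helper names default to DROP.
        return self.helpers.get(rule, drop)(value)


url_query_details: Dict[str, Rule] = {
    "DEFAULT": "REDACT_INSECURE_URL_PARAM_VALUES",  # not implemented in this sketch
    "STRICT": "REDACT_ALL_URL_VALUES",
    "STRICTER": "DROP",
}
print(Redactor("STRICT").redact("user=alice&token=abc", url_query_details))
```

Because `REDACT_INSECURE_URL_PARAM_VALUES` has no helper here, the `DEFAULT` profile drops the value entirely, which mirrors the rule that a missing method must not leak data.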
+ +### Level of Sensitivity + +The level of sensitivity of an information can also be different and that sensitivity can be contextual, e.g. + +- The password of a user without privileges is less sensitive than the password of an administrator +- The "client IP" in a server-to-server communication is less sensitive than the "client IP" in an client-to-server communication, where the client can be linked to a human. +- API tokens of a demo system are less sensitive than API tokens for a production system +- The license plate of an individual’s car is less sensitive than their social security number +- The full name of a user in a social network is less sensitive than the full name of a user in a medical research database + +Depending on the sensitivity data an end-user of an observability system may weigh up if collecting this data is worth it. + +### Regulatory and other requirements + +Due to the negative effects that the exposure of sensitive data can have (see above in "Types of sensitive data"), different entities have created regulations for the collection of sensitive data, among them: + +- GDPR +- CPRA +- PIPEDA +- HIPAA +- [more…](https://en.wikipedia.org/wiki/Information_privacy) + +Additionally the entities operating the applications who leverage OpenTelemetry may have their own requirements for treating certain sensitive data. + +Finally end-users may want to apply recommendations for [Data Minimization](https://en.wikipedia.org/wiki/Data_minimization), to avoid "unnecessary risks for the data subject". + +**Note 1**: it is not (and can not be) the responsibility of the OpenTelemetry project to provide compliance with any of those regulations, this is a responsibility of the OTel end-user. OTel can only facilitate parts of those requirements. + +**Note 2**: Those requirements are subject of change and outside of the control of the OpenTelemetry community. 
+ +## Trade-offs and mitigations + +### Performance Impact + +By adding an extra layer of processing every time an attribute value gets set, might have an impact on the performance. There might be ways to reduce that overhead, e.g. by only redacting values which are finalized and ready to exported such that updated values or sampled data does not need to be handled. + +## Prior art and alternatives + +### Spec & SemConv Issues + +This problem has been discussed multiple times, here is a list of existing issues in the OpenTelemetry repositories: + +- [Http client and server span default collection behavior for url.full and url.query attributes](https://github.com/open-telemetry/semantic-conventions/issues/860) +- [URL query string values should be redacted by default](https://github.com/open-telemetry/semantic-conventions/pull/961) +- [Specific URL query string values should be redacted](https://github.com/open-telemetry/semantic-conventions/pull/971) +- [Allow url.path sanitization](https://github.com/open-telemetry/semantic-conventions/pull/676) +- [Guidance requested: static SQL queries may contain sensitive values](https://github.com/open-telemetry/semantic-conventions/issues/436) +- [Semantic conventions vs GDPR](https://github.com/open-telemetry/semantic-conventions/issues/128) +- [Guidelines for redacting sensitive information](https://github.com/open-telemetry/semantic-conventions/issues/877) +- [DB sanitization uniform format](https://github.com/open-telemetry/semantic-conventions/issues/717) +- [Add db.statement sanitization/masking examples](https://github.com/open-telemetry/semantic-conventions/issues/708) +- [TC Feedback Request: View attribute filter definition in Go](https://github.com/open-telemetry/opentelemetry-specification/issues/3664) + +### SemConv Pages + +The semantic conventions already contains notes around treating sensitive data (search for "sensitive" on the linked pages if not stated otherwise): + +- [gRPC 
SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/rpc/grpc.md) +- [Sensitive Information in URL SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/url/url.md#sensitive-information) +- [GraphQL Spans SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/graphql/graphql-spans.md) +- [Container Resource SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/resource/container.md) +- [Database Spans SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/database-spans.md) +- [HTTP Spans SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md) +- [General Attributes SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md) +- [LLM Spans SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/llm-spans.md) +- [ElasticSearch SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/elasticsearch.md) +- [Redis SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/redis.md) +- [Connect RPC SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/rpc/connect-rpc.md) +- [Device SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/resource/device.md) (search for GDPR) + +### Existing Solutions within OpenTelemetry + +The following solutions for OpenTelemetry already exist: + +- [MrAlias/redact](https://github.com/MrAlias/redact) for OpenTelemetry Go +- Collector processors, including + - [Redaction Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/redactionprocessor) + - [Transform Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor) + - [Filter 
Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/filterprocessor) + +### Alternative 0: Do nothing + +It’s always good to analyze this option. If we do nothing end users will still need to satisfy their requirements for treating sensitive data accordingly: + +- Instrumentation library authors are required to manage redaction before using the OpenTelemetry API +- Application developers will do the same or are forced to join the instrumented application with a collector for redaction/filtering +- Third party solutions will be implemented. + +### Alternative 1: OpenTelemetry Collector + +As listed above there are multiple processors available that can be used to redact or filter sensitive data with the OpenTelemetry collector. The challenge with that is that it is unknown to an application (owner) if data is processed in the collector as expected. Also, the data leaving the application might already be a risk (non-encrypted or compromised network) or may not be allowed (collector is hosted in a different country, which may conflict with a regulation) + +Ideally a combination is used. + +### Alternative 2: Backend + +The backend consuming the OpenTelemetry data can provide processing for filtering and redaction as well. The same objection as for the collector apply. 
+ +### Existing Solutions outside OpenTelemetry + +There are many solutions outside OpenTelemetry that help to filter or redact sensitive data based on security and privacy requirements: + +- [sanitize_field_name in Elastic Java](https://www.elastic.co/guide/en/apm/agent/java/1.x/config-core.html#config-sanitize-field-names) +- [Filter sensitive data in AppDynamics Java Agent](https://docs.appdynamics.com/appd/24.x/24.4/en/application-monitoring/install-app-server-agents/java-agent/administer-the-java-agent/filter-sensitive-data) +- [GA4 data redaction](https://support.google.com/analytics/answer/13544947?sjid=3336918779004544977-EU) +- [Configure Privacy Settings in Matamo](https://matomo.org/faq/general/configure-privacy-settings-in-matomo/) +- [DataDog Sensitive Data Redaction](https://docs.datadoghq.com/observability_pipelines/sensitive_data_redaction/) + +## Open questions + +- **Question 1**: Should sensitivity details for an attribute in the semantic conventions be excluded from the stability guarantees? This means, updating them for a **stable** attribute is not a breaking change. The idea behind excluding them from the stability guarantees is that the requirements are subject of change due to changes in technology (see [#971](https://github.com/open-telemetry/semantic-conventions/pull/971), the list of query string values will evolve over time) or changes in regulatory requirements or both. + +## Future possibilities + +Attributes are most likely to carry sensitive information, but as stated in the overview section of the explanation other user-set properties may carry sensitive information as well. In a later iteration we might want to review them as well. + +The proposal puts the configuration of sensitivity requirements into the hands of the person operating an application. 
In a future iteration we can look into providing end-users of instrumented applications to provide their _consent_ of which and how data related to them is tracked, see [Do Not Track](https://en.wikipedia.org/wiki/Do_Not_Track), [Global Privacy Control](https://privacycg.github.io/gpc-spec/) and the requirements of certain local regulations. From 885bbb9556179db7ba04512ff1c3b91f7daad96f Mon Sep 17 00:00:00 2001 From: svrnm Date: Thu, 2 May 2024 15:33:24 +0200 Subject: [PATCH 2/8] Add OTEPs, add more details on 'automatic' redaction Signed-off-by: svrnm --- text/0000-sensitive-data-redaction.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/text/0000-sensitive-data-redaction.md b/text/0000-sensitive-data-redaction.md index ba77050b9..c7839bd97 100644 --- a/text/0000-sensitive-data-redaction.md +++ b/text/0000-sensitive-data-redaction.md @@ -75,6 +75,8 @@ It should be recommended to either use the expression or the constant and only i Note that those API calls are no-op and will be implemented by the SDK (as we do it with other API methods as well), this way (almost?) no additional overhead will be created by introducing those annotations. +Below you will find details on how the `` are included in the definitions of attributes in the Semantic Conventions. By pre-loading the details for known attributes in the SDK configuration a call to `span.setAttribute("url.query", url.toString());` can apply the redaction internally. An end user may append/overwrite those details. + ### Semantic Conventions With the definitions above values in the semantic conventions can be annotated with `` as outlined above, with the exception that no callback functions can be supplied as redaction rules. @@ -91,7 +93,7 @@ Example: The `DROP` keyword means that the value is replaced with `REDACTED` (or set to null, …). -It is the responsibility of the OpenTelemetry SDK to implement those sensitivity details provided by the semantic conventions. 
This means that an instrumentation library does not need to add `` when calling an `setAttribute` method or does not need to call the additional `redactAttribute` method as outlined above. An instrumentation library may choose to apply additional redactions (leveraging the OpenTelemetry APIs or doing it before calling `setAttribute` in their own business logic) +It is the responsibility of the OpenTelemetry SDK to implement those sensitivity details provided by the semantic conventions. This means that an instrumentation library does not need to add `` when calling an `setAttribute` method or does not need to call the additional `redactAttribute` method as outlined above. An instrumentation library may choose to apply additional redactions (leveraging the OpenTelemetry APIs or doing it before calling `setAttribute` in their own business logic). ### SDK Configuration of sensitivity requirements @@ -188,6 +190,11 @@ By adding an extra layer of processing every time an attribute value gets set, m ## Prior art and alternatives +### OTEPS + +- [OTEP 100 - Sensitive Data Handling](https://github.com/open-telemetry/oteps/pull/100) +- [OTEP 187 - Data Classification for resources and attributes](https://github.com/open-telemetry/oteps/pull/187) + ### Spec & SemConv Issues This problem has been discussed multiple times, here is a list of existing issues in the OpenTelemetry repositories: From 87fe972d452918feaf693ae97b7061e4d46f0276 Mon Sep 17 00:00:00 2001 From: Severin Neumann Date: Fri, 3 May 2024 09:32:12 +0200 Subject: [PATCH 3/8] Apply suggestions from code review Co-authored-by: Reiley Yang Co-authored-by: Yuri Shkuro --- text/0000-sensitive-data-redaction.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-sensitive-data-redaction.md b/text/0000-sensitive-data-redaction.md index c7839bd97..a63351be7 100644 --- a/text/0000-sensitive-data-redaction.md +++ b/text/0000-sensitive-data-redaction.md @@ -4,9 +4,9 @@ This is a proposal for 
adding treatment of sensitive data to the OpenTelemetry ( ## Motivation -When collecting data from an application, there is always the possibility, that the data contains information that shouldn’t be collected, because it is either leaking (parts of) credentials (passwords, tokens, usernames, credit card information) or can be used to uniquely identify a person (name, IP, email, credit card information) which may be protected through certain regulations. By adding OTel to a library or instrumentation an end-user of OTel is facing exactly this challenge: values of an [Attribute](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute) may carry such sensitive data. +When collecting data from an application, there is always the possibility, that the data contains information that shouldn't be collected, because it is either leaking (parts of) credentials (passwords, tokens, usernames, credit card information) or can be used to uniquely identify a person (name, IP, email, credit card information) which may be protected through certain regulations. By adding OTel to a library or instrumentation an end-user of OTel is facing exactly this challenge: values of an [Attribute](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute) may carry such sensitive data. -While it’s ultimately the responsibility of the legal entity operation an application to protect sensitive data, end-users of OpenTelemetry (developers, operators working for that entity) are turning to the authors of OpenTelemetry – or to those of libraries that implement OpenTelemetry, like the Azure SDK – to have means in place to redact/filter sensitive data. Without that capability provided, they will raise security issues and/or will drop OpenTelemetry eventually due to it not meeting their security/legal requirements. 
+While it’s ultimately the responsibility of the legal entity operating an application to protect sensitive data, end-users of OpenTelemetry (developers, operators working for that entity) are turning to the authors of OpenTelemetry – or to those of libraries that implement OpenTelemetry, like the Azure SDK – to have means in place to redact/filter sensitive data. Without that capability provided, they will raise security issues and/or will drop OpenTelemetry eventually due to it not meeting their security/legal requirements. In this OTEP you will find a proposal for adding treatment of sensitive data to OpenTelemetry. From 281d6b8ecbbffed8227e6bf7df1e92e6e0a86c4f Mon Sep 17 00:00:00 2001 From: Armin Ruech <7052238+arminru@users.noreply.github.com> Date: Tue, 7 May 2024 12:13:48 +0200 Subject: [PATCH 4/8] Rename 0000-sensitive-data-redaction.md to 0255-sensitive-data-redaction.md --- ...nsitive-data-redaction.md => 0255-sensitive-data-redaction.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-sensitive-data-redaction.md => 0255-sensitive-data-redaction.md} (100%) diff --git a/text/0000-sensitive-data-redaction.md b/text/0255-sensitive-data-redaction.md similarity index 100% rename from text/0000-sensitive-data-redaction.md rename to text/0255-sensitive-data-redaction.md From 32f62b140a1305c8fabdf0865ecbb4dffe5171bc Mon Sep 17 00:00:00 2001 From: Severin Neumann Date: Wed, 8 May 2024 11:07:34 +0200 Subject: [PATCH 5/8] Apply suggestions from code review Co-authored-by: Joao Grassi <5938087+joaopgrassi@users.noreply.github.com> --- text/0255-sensitive-data-redaction.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/0255-sensitive-data-redaction.md b/text/0255-sensitive-data-redaction.md index a63351be7..5e6921889 100644 --- a/text/0255-sensitive-data-redaction.md +++ b/text/0255-sensitive-data-redaction.md @@ -35,7 +35,7 @@ As a first building block the OpenTelemetry API needs to provide capability to e #### API 
-Every API that sets an attribute consisting of a key and a value, needs to be enhanced by an additional functionality that allows to add details about the potential sensitivity of this data and a hooks how it may be redacted. This can be an additional set of parameters for an existing method or a method that can be called after the attribute has been set, if adding parameters to a method signature would lead to a breaking change. +Every API that sets an attribute consisting of a key and a value, needs to be enhanced by an additional functionality that allows to add details about the potential sensitivity of this data and hooks for how it may be redacted. This can be an additional set of parameters for an existing method or a method that can be called after the attribute has been set, if adding parameters to a method signature would lead to a breaking change. An additional method for setting the sensitivity information for a span attribute might look like the following: @@ -264,6 +264,7 @@ There are many solutions outside OpenTelemetry that help to filter or redact sen - [GA4 data redaction](https://support.google.com/analytics/answer/13544947?sjid=3336918779004544977-EU) - [Configure Privacy Settings in Matamo](https://matomo.org/faq/general/configure-privacy-settings-in-matomo/) - [DataDog Sensitive Data Redaction](https://docs.datadoghq.com/observability_pipelines/sensitive_data_redaction/) +- [Dynatrace data privacy and security configuration](https://docs.dynatrace.com/docs/shortlink/global-privacy-settings) ## Open questions From 95c1252e263bbcf3be61cb2fb0e5ccac0a7d4e06 Mon Sep 17 00:00:00 2001 From: svrnm Date: Wed, 8 May 2024 14:03:29 +0200 Subject: [PATCH 6/8] Rework document to address feedback Signed-off-by: svrnm --- text/0255-sensitive-data-redaction.md | 344 +++++++++++++++++--------- 1 file changed, 232 insertions(+), 112 deletions(-) diff --git a/text/0255-sensitive-data-redaction.md b/text/0255-sensitive-data-redaction.md index 5e6921889..484ff0438 
100644 --- a/text/0255-sensitive-data-redaction.md +++ b/text/0255-sensitive-data-redaction.md @@ -1,18 +1,39 @@ # Sensitive Data Redaction -This is a proposal for adding treatment of sensitive data to the OpenTelemetry (OTel) Specification and semantic conventions. +This is a proposal for adding treatment of sensitive data to the OpenTelemetry +(OTel) Specification and semantic conventions. ## Motivation -When collecting data from an application, there is always the possibility, that the data contains information that shouldn't be collected, because it is either leaking (parts of) credentials (passwords, tokens, usernames, credit card information) or can be used to uniquely identify a person (name, IP, email, credit card information) which may be protected through certain regulations. By adding OTel to a library or instrumentation an end-user of OTel is facing exactly this challenge: values of an [Attribute](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute) may carry such sensitive data. - -While it’s ultimately the responsibility of the legal entity operating an application to protect sensitive data, end-users of OpenTelemetry (developers, operators working for that entity) are turning to the authors of OpenTelemetry – or to those of libraries that implement OpenTelemetry, like the Azure SDK – to have means in place to redact/filter sensitive data. Without that capability provided, they will raise security issues and/or will drop OpenTelemetry eventually due to it not meeting their security/legal requirements. - -In this OTEP you will find a proposal for adding treatment of sensitive data to OpenTelemetry. - -By adding the proposed features, OpenTelemetry will provide its end-users the tooling needed to make sure that sensitive data is treated according to their requirements. 
- -This will make sure that these end-users can use OpenTelemetry within their secure and legal requirements and that OpenTelemetry (and implementing libraries) are able to avoid vulnerabilities. +When collecting data from an application, there is always the possibility, that +the data contains information that shouldn't be collected, because it is either +leaking (parts of) credentials (passwords, tokens, usernames, credit card information), +can be used to uniquely identify a person (name, IP, email, credit card information) +which may be protected through certain regulations. By adding OTel to a library or +instrumentation an end-user of OTel is facing exactly this challenge: emitted +telemetry may carry such sensitive data. + +While it’s ultimately the responsibility of the legal entity operating an application +to protect sensitive data, end-users of OpenTelemetry (developers, operators working +for that entity) are turning to the authors of OpenTelemetry – or to those of +libraries that implement OpenTelemetry, like the Azure SDK – to have means in +place to redact/filter sensitive data. Without that capability provided, they +will raise security issues and/or will drop OpenTelemetry eventually due to it +not meeting their security/legal requirements. + +In this OTEP you will find a proposal for adding treatment of sensitive data to +OpenTelemetry. + +By adding the proposed features, OpenTelemetry will introduce the following +principles for treating sensitive data: + +* OpenTelemetry MUST allow the end-user to meet with their security/privacy/compliance + requirements regarding the data being collected. +* OpenTelemetry MUST avoid redaction offerings that lead to bigger security issues + such as (e.g. the redaction logic is poorly + implemented, so a hacker could forge certain input to DDoS the redaction engine itself). +* OpenTelemetry SHOULD allow the telemetry data to apply different redaction + logic per telemetry pipeline/exporter in a single process. 
## Explanation @@ -20,66 +41,112 @@ This will make sure that these end-users can use OpenTelemetry within their secu This proposal aims to provide the following features: -- methods for consumers of the OpenTelemetry API to enrich collected Attributes with sensitivity information and hooks to apply different levels of redaction. -- related to those methods a similar way to enrich attributes in the semantic conventions with sensitivity information and ways of redaction. -- a consistent way to configure sensitivity requirements for end-users of the OpenTelemetry SDK and in instrumentation libraries, including predefined “configuration profiles” and ways for fine grain configuration. -- A redactor implements the logic to apply redaction and that owns predefined helpers for redaction in the SDK (URLParams filtering, Zeroing IPs, etc.). +- a consistent way to configure sensitivity requirements for end-users of the + OpenTelemetry SDK and in instrumentation libraries, including predefined + “configuration profiles” and ways for fine grain configuration. +- related to those a way to enrich attributes in the semantic conventions with + sensitivity information and ways of redaction. +- methods for consumers of the OpenTelemetry API to enrich collected Attributes + with sensitivity information and hooks to apply different levels of redaction. +- A redactor implements the logic to apply redaction and that owns predefined helpers + for redaction in the SDK (URLParams filtering, Zeroing IPs, etc.). The following limitations apply: -- Only Attributes are treated, although it is possible that sensitive data is contained in other data generated by OpenTelemetry as well (e.g. the span name, or the instrumentation scope name could contain sensitive data) - -### Annotate attributes with sensitivity information - -As a first building block the OpenTelemetry API needs to provide capability to enrich collected attributes with sensitivity details and with potential hooks to redact those. 
Those API changes then also can be used to describe sensitivity information in the semantic conventions. - -#### API - -Every API that sets an attribute consisting of a key and a value, needs to be enhanced by an additional functionality that allows to add details about the potential sensitivity of this data and hooks for how it may be redacted. This can be an additional set of parameters for an existing method or a method that can be called after the attribute has been set, if adding parameters to a method signature would lead to a breaking change. - -An additional method for setting the sensitivity information for a span attribute might look like the following: - -``` -span.setAttribute("url.query", url.toString(), ); -``` - -Or, if the setAttribute method can not be extended: - -``` -span.setAttribute("url.query", url.toString()); -span.redactAttribute(“url.query”, ); -``` - -The content of `` may look like the following example: - -``` -{ - : -} +- Although it is possible that sensitive data is contained in any data generated + by OpenTelemetry (e.g. the span name, or the instrumentation scope name could + contain sensitive data), only the following is treated in the scope of this PR: + [Attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute) + and payload fields like [log bodies](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#field-body). + +### Configuration-driven redaction + +As stated in the motivation, the responsibility for applying redaction lies with +the operator of an application, since they are the only ones who know the specific +requirements for their environment. Because of that, this proposal suggests that +OpenTelemetry should provide a solution to end users that allows them to provide +a configuration for their redaction requirements. 
This configuration consists of +two pieces: + +* Redaction requirement "configuration profiles", that end user use to express + their overall redaction requirements. +* Fine grained configuration that can be used to express requirements that are + unique to their environment. + +**Example**: An end user may configure an application with "stricter" requirements + and some custom requirements like the following: + +* They provide a profile name `STRICTER` via an environment variable, e.g. + OTEL_SENSITIVE_DATA_PROFILE="STRICTER" +* They add custom requirements in a + [configuration file](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/file-configuration.md), e.g. + + ```yaml + sensitiveData: + - attribute: url.query + redaction: REDACT_ALL_URL_VALUES + - attribute: com.example.customer.name + redaction: DROP + - attribute: com.example.customer.email + redaction: 's/^[^@]*@/REDACTED@/' + ``` + +The above is just an example for how such a configuration may look like, different +formats are discussed below. + +#### Redaction requirement profiles + +An end user should be able to pick from a set of redaction requirement profiles +that cover common configurations: + +- `NONE` or `NEVER`: No redaction is applied. This is useful if an end user + chooses to run a collector nearby the application that will take care of + redaction out of process. If this profile is selected the SDK must make sure + that (almost) no overhead remains from the redaction methods. +- `DEFAULT`: A set of rules that especially make sure that no security-sensitive + data is exposed, only limited PII-related redactions are applied. +- `STRICT`: A set that builds on top of `DEFAULT` and also applies commonly required + PII-related redactions (anonymize emails, names, IPs, etc) + `STRICTER`: A set that builds on top of `STRICT` and applies even more redactions, + e.g. drop all query attribute values, drop PII-related data, etc. 
+- `DROP` or `ALWAYS`: All values with (potentially) sensitivity details will be dropped + +Those profiles will be defined through the semantic conventions (see below). + +#### Fine-grained configuration + +On top of the redaction requirements profiles an end user should have capabilities +to add their own local requirements, either through configuration or through code. + +The configuration is a list of rules that the SDK can parse and apply. An item +in this list will consist of: + +- Conditions when a redaction should be applied +- Instructions for the redaction that will be applied when the conditions are met + +In an example configuration file a list could look like the following: + +```yaml + sensitiveData: + - attribute: url.query + redaction: REDACT_ALL_URL_VALUES + - attribute: com.example.customer.name + redaction: DROP + - attribute: com.example.customer.email + redaction: 's/^[^@]*@/REDACTED@/' ``` -The key `` is one of multiple available pre-defined levels of redaction requirements that an end-user may choose, e.g. +In this example the condition is expressed via `attribute` and the instructions +via `redaction`, where either a constant is selected that runs a predefined helper +method, or a sed-like expression that uses a regular expression to apply redaction. -- `DEFAULT`: What should be the default redaction applied to this value -- `STRICT`: What should be a basic redaction that is applied to this value -- `STRICTER`: What should be an advanced redaction that is applied to this value -- … - -The value `` can be one of the following: - -- A regular expression (or sed expression?) which when applied turns parts of a string into their redacted version,e.g. `s/([0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\10/` applied on an IP address will replace the last octet with 0: `1.2.3.4` becomes `1.2.3.0` -- A constant that represents pre-defined redaction methods (see below for Redaction helpers), e.g. 
`REDACT_INSECURE_URL_PARAM_VALUES` will apply what is required for [#971](https://github.com/open-telemetry/semantic-conventions/pull/971) -- A callback that will call a function on that value and apply advanced redaction - -It should be recommended to either use the expression or the constant and only in rare circumstances a callback function should be applied. - -Note that those API calls are no-op and will be implemented by the SDK (as we do it with other API methods as well), this way (almost?) no additional overhead will be created by introducing those annotations. - -Below you will find details on how the `` are included in the definitions of attributes in the Semantic Conventions. By pre-loading the details for known attributes in the SDK configuration a call to `span.setAttribute("url.query", url.toString());` can apply the redaction internally. An end user may append/overwrite those details. +Another possibility is to re-use a language like +[OTTL](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/ottl/README.md). ### Semantic Conventions -With the definitions above values in the semantic conventions can be annotated with `` as outlined above, with the exception that no callback functions can be supplied as redaction rules. +To enable the redaction requirement profiles the semantic conventions need to be +annotated with sensitivity details. Example: @@ -93,80 +160,101 @@ Example: The `DROP` keyword means that the value is replaced with `REDACTED` (or set to null, …). -It is the responsibility of the OpenTelemetry SDK to implement those sensitivity details provided by the semantic conventions. This means that an instrumentation library does not need to add `` when calling an `setAttribute` method or does not need to call the additional `redactAttribute` method as outlined above. 
An instrumentation library may choose to apply additional redactions (leveraging the OpenTelemetry APIs or doing it before calling `setAttribute` in their own business logic). +It is the responsibility of the OpenTelemetry SDK to implement those sensitivity details provided by the semantic conventions. -### SDK Configuration of sensitivity requirements +### Annotate attributes with sensitivity information + +As an additional building block the OpenTelemetry API should provide capabilities +to library (and application) authors to apply in-place redaction -The annotations and APIs as outlined above will allow SDK users to provide their sensitive requirements as configuration (here: environment variable, but can be encoded in future config files as well), e.g. +One option is to add additional attributes to methods like `setAttribute`: ``` -env OTEL_SENSITIVE_DATA_PROFILE=”STRICT” ./myInstrumentedApp +span.setAttribute("url.query", url.toString(), ); ``` -This call will make sure that redactions applied follow the `STRICT` profile. If not set the `DEFAULT` will be used. Additionally there are 2 levels that can not be used in sensitivity details: - -- `NEVER`: No redaction is applied, SDKs may choose to log a warning that this is a risky choice -- `ALWAYS`: All values with sensitivity details will be dropped - -Additionally SDK end users can provide advanced configuration (through code, configuration file, probably not environment variable) to add specific needs: +Or, if the method can not be extended, an additional method can be called: ``` -{ - => -} +span.setAttribute("url.query", OpenTelemetry.redact(url.toString(), ); ``` -e.g. +Note that those API calls are no-op and will be implemented by the SDK (as we do it with other API methods as well), this way (almost?) no additional overhead will be created by introducing those annotations. 
+ +The `` should look like the following: ``` { - "url.query" => { DEFAULT: REDACT_ALL_URL_VALUES }, - "com.example.customer.name" => { DEFAULT: x => { /* do something in this callback * } }, - "com.example.customer.email" => { ‘s/^[^@]*@/REDACTED@/’ }, + `` => ``, + ... } ``` -### Redactor +There may be multiple redaction profiles (like `DEFAULT`, `STRICT`, etc.), and +for each one a configuration (as outlined above) may be applied. -To accomplish redaction the SDK needs a component (similar to a sampler) that inspects attributes when they are set and applies the required redactions: -- `Redactor::setup(profile)` will setup the redactor with the given profile, maybe a constructor? depends on the language -- `Redactor::redact(value, )` will return the redacted value using the provided profile +## Internal details -The redactor will also have all the methods to apply predefined redactions (`REDACT_ALL_URL_VALUES`, `REDACT_INSECURE_URL_PARAM_VALUES`, `DROP`, etc.). If a method is not implemented (either by the SDK or by the end-user choosing one that does not exist), it will default to apply `DROP` to avoid leakage of any sensitive data. +### Redactor -## Internal details +To accomplish redaction the SDK needs a component (similar to a sampler) that inspects attributes when they are set and applies the required redactions: + +- `Redactor::setup(profile)` will set up the redactor with the given profile, + maybe a constructor? depends on the language +- `Redactor::redact(value, )` will return the redacted value + using the provided profile -**tbd** +The redactor will also have all the methods to apply predefined redactions +(`REDACT_ALL_URL_VALUES`, `REDACT_INSECURE_URL_PARAM_VALUES`, `DROP`, etc.). 
+If a method is not implemented (either by the SDK or by the end-user choosing +one that does not exist), it will default to apply `DROP` to avoid leakage of +any sensitive data. ## Additional context -Treating sensitive data properly is a very complex multi-dimensional topic. Below you will find some additional context on that subject. +Treating sensitive data properly is a very complex multi-dimensional topic. +Below you will find some additional context on that subject. ### Types of sensitive data -There are different kinds of “sensitivity” that may apply to data. The ones most relevant in this proposal are “security” and “privacy”. They may overlap but we distinguish them as follows: +There are different kinds of “sensitivity” that may apply to data. The ones most +relevant in this proposal are “security” and “privacy”. They may overlap but we +distinguish them as follows: -- security-relevant sensitive data: any information that when exposed [weakens the overall security of the device/system](https://en.wikipedia.org/wiki/Vulnerability_(computing)). -- privacy-relevant sensitive data: any information that when exposed [can adversely affect the privacy or welfare of an individual](https://en.wikipedia.org/wiki/Information_sensitivity). +- security-relevant sensitive data: any information that when exposed + [weakens the overall security of the device/system](https://en.wikipedia.org/wiki/Vulnerability_(computing)). +- privacy-relevant sensitive data: any information that when exposed + [can adversely affect the privacy or welfare of an individual](https://en.wikipedia.org/wiki/Information_sensitivity). -Note, that there are other kinds of sensitive data (business information, classified information), which are not covered extensively by this proposal. +Note, that there are other kinds of sensitive data (business information, +classified information), which are not covered extensively by this proposal. 
### Level of Sensitivity -The level of sensitivity of an information can also be different and that sensitivity can be contextual, e.g. +The level of sensitivity of an information can also be different and that +sensitivity can be contextual, e.g. -- The password of a user without privileges is less sensitive than the password of an administrator -- The "client IP" in a server-to-server communication is less sensitive than the "client IP" in an client-to-server communication, where the client can be linked to a human. -- API tokens of a demo system are less sensitive than API tokens for a production system -- The license plate of an individual’s car is less sensitive than their social security number -- The full name of a user in a social network is less sensitive than the full name of a user in a medical research database +- The password of a user without privileges is less sensitive than the password + of an administrator +- The "client IP" in a server-to-server communication is less sensitive than the + "client IP" in an client-to-server communication, where the client can be + linked to a human. +- API tokens of a demo system are less sensitive than API tokens for a production + system +- The license plate of an individual’s car is less sensitive than their social + security number +- The full name of a user in a social network is less sensitive than the full + name of a user in a medical research database -Depending on the sensitivity data an end-user of an observability system may weigh up if collecting this data is worth it. +Depending on the sensitivity data an end-user of an observability system may +weigh up if collecting this data is worth it. 
### Regulatory and other requirements -Due to the negative effects that the exposure of sensitive data can have (see above in "Types of sensitive data"), different entities have created regulations for the collection of sensitive data, among them: +Due to the negative effects that the exposure of sensitive data can have (see +above in "Types of sensitive data"), different entities have created regulations +for the collection of sensitive data, among them: - GDPR - CPRA @@ -174,19 +262,29 @@ Due to the negative effects that the exposure of sensitive data can have (see ab - HIPAA - [more…](https://en.wikipedia.org/wiki/Information_privacy) -Additionally the entities operating the applications who leverage OpenTelemetry may have their own requirements for treating certain sensitive data. +Additionally the entities operating the applications who leverage OpenTelemetry +may have their own requirements for treating certain sensitive data. -Finally end-users may want to apply recommendations for [Data Minimization](https://en.wikipedia.org/wiki/Data_minimization), to avoid "unnecessary risks for the data subject". +Finally end-users may want to apply recommendations for +[Data Minimization](https://en.wikipedia.org/wiki/Data_minimization), to avoid +"unnecessary risks for the data subject". -**Note 1**: it is not (and can not be) the responsibility of the OpenTelemetry project to provide compliance with any of those regulations, this is a responsibility of the OTel end-user. OTel can only facilitate parts of those requirements. +**Note 1**: it is not (and can not be) the responsibility of the OpenTelemetry +project to provide compliance with any of those regulations, this is a +responsibility of the OTel end-user. OTel can only facilitate parts of those +requirements. -**Note 2**: Those requirements are subject of change and outside of the control of the OpenTelemetry community. 
+**Note 2**: Those requirements are subject of change and outside of the control +of the OpenTelemetry community. ## Trade-offs and mitigations ### Performance Impact -By adding an extra layer of processing every time an attribute value gets set, might have an impact on the performance. There might be ways to reduce that overhead, e.g. by only redacting values which are finalized and ready to exported such that updated values or sampled data does not need to be handled. +By adding an extra layer of processing every time an attribute value gets set, +might have an impact on the performance. There might be ways to reduce that +overhead, e.g. by only redacting values which are finalized and ready to exported +such that updated values or sampled data does not need to be handled. ## Prior art and alternatives @@ -197,7 +295,8 @@ By adding an extra layer of processing every time an attribute value gets set, m ### Spec & SemConv Issues -This problem has been discussed multiple times, here is a list of existing issues in the OpenTelemetry repositories: +This problem has been discussed multiple times, here is a list of existing issues +in the OpenTelemetry repositories: - [Http client and server span default collection behavior for url.full and url.query attributes](https://github.com/open-telemetry/semantic-conventions/issues/860) - [URL query string values should be redacted by default](https://github.com/open-telemetry/semantic-conventions/pull/961) @@ -212,7 +311,8 @@ This problem has been discussed multiple times, here is a list of existing issue ### SemConv Pages -The semantic conventions already contains notes around treating sensitive data (search for "sensitive" on the linked pages if not stated otherwise): +The semantic conventions already contains notes around treating sensitive data +(search for "sensitive" on the linked pages if not stated otherwise): - [gRPC SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/rpc/grpc.md) - 
[Sensitive Information in URL SemConv](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/url/url.md#sensitive-information) @@ -239,7 +339,8 @@ The following solutions for OpenTelemetry already exist: ### Alternative 0: Do nothing -It’s always good to analyze this option. If we do nothing end users will still need to satisfy their requirements for treating sensitive data accordingly: +It’s always good to analyze this option. If we do nothing end users will still +need to satisfy their requirements for treating sensitive data accordingly: - Instrumentation library authors are required to manage redaction before using the OpenTelemetry API - Application developers will do the same or are forced to join the instrumented application with a collector for redaction/filtering @@ -247,17 +348,24 @@ It’s always good to analyze this option. If we do nothing end users will still ### Alternative 1: OpenTelemetry Collector -As listed above there are multiple processors available that can be used to redact or filter sensitive data with the OpenTelemetry collector. The challenge with that is that it is unknown to an application (owner) if data is processed in the collector as expected. Also, the data leaving the application might already be a risk (non-encrypted or compromised network) or may not be allowed (collector is hosted in a different country, which may conflict with a regulation) +As listed above there are multiple processors available that can be used to +redact or filter sensitive data with the OpenTelemetry collector. The challenge +with that is that it is unknown to an application (owner) if data is processed +in the collector as expected. Also, the data leaving the application might +already be a risk (non-encrypted or compromised network) or may not be allowed +(collector is hosted in a different country, which may conflict with a regulation) Ideally a combination is used. 
### Alternative 2: Backend -The backend consuming the OpenTelemetry data can provide processing for filtering and redaction as well. The same objection as for the collector apply. +The backend consuming the OpenTelemetry data can provide processing for filtering +and redaction as well. The same objection as for the collector apply. ### Existing Solutions outside OpenTelemetry -There are many solutions outside OpenTelemetry that help to filter or redact sensitive data based on security and privacy requirements: +There are many solutions outside OpenTelemetry that help to filter or redact +sensitive data based on security and privacy requirements: - [sanitize_field_name in Elastic Java](https://www.elastic.co/guide/en/apm/agent/java/1.x/config-core.html#config-sanitize-field-names) - [Filter sensitive data in AppDynamics Java Agent](https://docs.appdynamics.com/appd/24.x/24.4/en/application-monitoring/install-app-server-agents/java-agent/administer-the-java-agent/filter-sensitive-data) @@ -268,10 +376,22 @@ There are many solutions outside OpenTelemetry that help to filter or redact sen ## Open questions -- **Question 1**: Should sensitivity details for an attribute in the semantic conventions be excluded from the stability guarantees? This means, updating them for a **stable** attribute is not a breaking change. The idea behind excluding them from the stability guarantees is that the requirements are subject of change due to changes in technology (see [#971](https://github.com/open-telemetry/semantic-conventions/pull/971), the list of query string values will evolve over time) or changes in regulatory requirements or both. +- **Question 1**: Should sensitivity details for an attribute in the semantic + conventions be excluded from the stability guarantees? This means, updating + them for a **stable** attribute is not a breaking change. 
The idea behind + excluding them from the stability guarantees is that the requirements are + subject of change due to changes in technology (see [#971](https://github.com/open-telemetry/semantic-conventions/pull/971), the list of query string values will evolve + over time) or changes in regulatory requirements or both. ## Future possibilities -Attributes are most likely to carry sensitive information, but as stated in the overview section of the explanation other user-set properties may carry sensitive information as well. In a later iteration we might want to review them as well. +Attributes and payload data are most likely to carry sensitive information, but +as stated in the overview section of the explanation other user-set properties may +carry sensitive information as well. In a later iteration we might want to review +them as well. -The proposal puts the configuration of sensitivity requirements into the hands of the person operating an application. In a future iteration we can look into providing end-users of instrumented applications to provide their _consent_ of which and how data related to them is tracked, see [Do Not Track](https://en.wikipedia.org/wiki/Do_Not_Track), [Global Privacy Control](https://privacycg.github.io/gpc-spec/) and the requirements of certain local regulations. +The proposal puts the configuration of sensitivity requirements into the hands of +the person operating an application. In a future iteration we can look into +providing end-users of instrumented applications to provide their _consent_ of +which and how data related to them is tracked, see [Do Not Track](https://en.wikipedia.org/wiki/Do_Not_Track), [Global Privacy Control](https://privacycg.github.io/gpc-spec/) and the +requirements of certain local regulations. 
From 2d508f8fab5dff63762714edb7f5e571bb95c9de Mon Sep 17 00:00:00 2001 From: svrnm Date: Wed, 8 May 2024 14:06:58 +0200 Subject: [PATCH 7/8] fixes Signed-off-by: svrnm --- text/0255-sensitive-data-redaction.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0255-sensitive-data-redaction.md b/text/0255-sensitive-data-redaction.md index 484ff0438..bc4a807ce 100644 --- a/text/0255-sensitive-data-redaction.md +++ b/text/0255-sensitive-data-redaction.md @@ -70,7 +70,7 @@ two pieces: * Redaction requirement "configuration profiles", that end user use to express their overall redaction requirements. -* Fine grained configuration that can be used to express requirements that are +* Fine grained configuration that can be used to express requirements that are unique to their environment. **Example**: An end user may configure an application with "stricter" requirements @@ -107,7 +107,7 @@ that cover common configurations: data is exposed, only limited PII-related redactions are applied. - `STRICT`: A set that builds on top of `DEFAULT` and also applies commonly required PII-related redactions (anonymize emails, names, IPs, etc) - `STRICTER`: A set that builds on top of `STRICT` and applies even more redactions, + `STRICTER`: A set that builds on top of `STRICT` and applies even more redactions, e.g. drop all query attribute values, drop PII-related data, etc. - `DROP` or `ALWAYS`: All values with (potentially) sensitivity details will be dropped @@ -339,7 +339,7 @@ The following solutions for OpenTelemetry already exist: ### Alternative 0: Do nothing -It’s always good to analyze this option. If we do nothing end users will still +It’s always good to analyze this option. 
If we do nothing end users will still need to satisfy their requirements for treating sensitive data accordingly: - Instrumentation library authors are required to manage redaction before using the OpenTelemetry API From dd32fef8a49e2f324f73971fbba349d38c2f7bf5 Mon Sep 17 00:00:00 2001 From: svrnm Date: Wed, 8 May 2024 14:10:29 +0200 Subject: [PATCH 8/8] write out what IIN stands for Signed-off-by: svrnm --- text/0255-sensitive-data-redaction.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0255-sensitive-data-redaction.md b/text/0255-sensitive-data-redaction.md index bc4a807ce..4d606cdf1 100644 --- a/text/0255-sensitive-data-redaction.md +++ b/text/0255-sensitive-data-redaction.md @@ -156,7 +156,7 @@ Example: | `client.address`| | Rationale: some reasons why dropping octets may be required
Type: `maybe_pii`
`DEFAULT`: `NONE`
`STRICT`: `'s/([0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\10/'`
`STRICTER`: `'s/([0-9]+\.[0-9]+\.)[0-9]+\.[0-9]+/\10.0/'` | | `enduser.creditCardNumber`**[1]** | | Rationale: ...
Type: `always_pii`
DEFAULT: `EXTRACT_IIN`
`STRICT`: `DROP`| -**[1]**: _This is a made-up example for demonstration purpose, it’s not part of the current semantic conventions. It gives a more nuanced example, e.g. that extracting the IIN might be an option over dropping the number completely. It also demonstrated the value of “type”, which can enable Data lineage use cases_ +**[1]**: _This is a made-up example for demonstration purposes; it’s not part of the current semantic conventions. It gives a more nuanced example, e.g. that extracting the Issuer Identification Number (IIN) might be an option over dropping the number completely. It also demonstrates the value of “type”, which can enable data lineage use cases_ The `DROP` keyword means that the value is replaced with `REDACTED` (or set to null, …).
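
The per-profile rules in the `client.address` row can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not part of the proposal: `CLIENT_ADDRESS_RULES` and `apply_rule` are hypothetical names, the sed-like expressions are copied from the table, and falling back to `DROP` for an unknown rule follows the Redactor description. One detail matters when porting the expressions: sed reads `\10` as group 1 followed by a literal `0`, whereas Python's `re` would read it as group 10, so the sketch rewrites backreferences into the unambiguous `\g<1>` form.

```python
import re

# Hypothetical per-profile rules for `client.address`, copied from the table.
CLIENT_ADDRESS_RULES = {
    "DEFAULT": "NONE",
    "STRICT": r"s/([0-9]+\.[0-9]+\.[0-9]+\.)[0-9]+/\10/",
    "STRICTER": r"s/([0-9]+\.[0-9]+\.)[0-9]+\.[0-9]+/\10.0/",
}


def apply_rule(value: str, rule: str) -> str:
    """Apply one redaction rule (constant or sed-like expression) to a value."""
    if rule == "NONE":
        return value
    if rule == "DROP":
        return "REDACTED"
    if rule.startswith("s/") and rule.endswith("/"):
        # Simplification: assumes neither pattern nor replacement contains '/'.
        parts = rule[2:-1].split("/")
        if len(parts) == 2:
            pattern, replacement = parts
            # Turn sed-style \1..\9 into \g<1>..\g<9> so that a following
            # digit is treated as literal text, as sed does.
            replacement = re.sub(r"\\([1-9])", r"\\g<\1>", replacement)
            return re.sub(pattern, replacement, value)
    # Unknown or malformed rule: default to DROP to avoid leaking data.
    return "REDACTED"


if __name__ == "__main__":
    for profile, rule in CLIENT_ADDRESS_RULES.items():
        print(profile, apply_rule("1.2.3.4", rule))
```

With the input `1.2.3.4`, the `STRICT` rule zeroes the last octet and the `STRICTER` rule zeroes the last two, while `DEFAULT` leaves the value untouched.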