feat: add configuration option to normalize URLs on HTTP events (Fetch and XHR plugins) #650

Open
wants to merge 1 commit into
base: main

sergioflores-j

Feature proposal

This PR adds a new configuration option to the HTTP telemetry.

eventURLNormalizer?: (url: string) => string;

PS: The naming is up for debate.

This function is invoked before the HTTP event is recorded in RUM. It is useful in scenarios like the following:

  1. HTTP requests made to a single endpoint that contains path parameters:
/users/1234
/users/5678

Can be normalized to, e.g.:
/users/{userId}

This helps avoid noise in the RUM monitor by aggregating data (request count, sessions, etc.) about the same endpoint into a single "row".

  2. HTTP requests that may contain sensitive information in the URL:
/users/<some-sensitive-info-here>

  3. Normalizing calls to known services, to make it faster to find the relevant code:
/users/1234

Can be normalized to, e.g.:
"My Service Name - GetUserById"
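
To illustrate the scenarios above, here is a minimal sketch of a normalizer a consumer might supply. Only the `(url: string) => string` shape comes from this proposal; the `normalizeEventUrl` name and the regular expression are assumptions made for the example.

```typescript
// Hypothetical normalizer matching the proposed `(url: string) => string` shape.
// The `/users/<id>` route comes from the examples above; the pattern is an assumption.
const normalizeEventUrl = (url: string): string =>
  // Replace the path segment after `/users/` (up to the next `/`, `?` or `#`)
  // with a placeholder so that all user lookups aggregate into one row.
  url.replace(/\/users\/[^\/?#]+/, '/users/{userId}');
```

With this sketch, `/users/1234` and `/users/5678` both become `/users/{userId}`, and URLs that do not match are returned unchanged. Under the proposal, such a function would be passed as `eventURLNormalizer` in the HTTP telemetry configuration.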

Considerations

  • I have been running this in a patched version in my project and have not experienced any regressions.

  • I considered not exposing this configuration as a function.
    Instead, it would receive URL patterns and their replacements (a configuration similar to urlsToInclude), to avoid shifting complexity to consumers.
    But that would mean more complexity to maintain here, and it could not anticipate every scenario. So I followed the idea of the ignore function (from the JSError plugin).

  • I have NOT changed all files (documentation, integration tests, etc.) because I want to collect feedback on the proposal first.
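
For comparison, the pattern-based alternative mentioned in the considerations could be sketched roughly as below. The `UrlReplacement` type and the `applyUrlReplacements` helper are hypothetical names used only for illustration; nothing like them exists in this PR.

```typescript
// Hypothetical declarative configuration: a pattern and what to replace it with.
type UrlReplacement = { match: RegExp; replaceWith: string };

// Applies the first rule that changes the URL; unmatched URLs pass through.
const applyUrlReplacements = (
  url: string,
  replacements: UrlReplacement[]
): string => {
  for (const { match, replaceWith } of replacements) {
    const replaced = url.replace(match, replaceWith);
    if (replaced !== url) {
      return replaced;
    }
  }
  return url;
};
```

The library would own the matching logic in this variant, which is the extra maintenance cost noted above. Anything that a pattern and a replacement string cannot express would still need a function.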


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@williazz
Contributor

This is a very good PR, and I've seen this pattern used a lot. It also addresses #545

If our purpose is to hide PII, could we keep extensibility in mind? It might make sense to consolidate the PII scrubbing/transforming logic.

@sergioflores-j
Author

sergioflores-j commented Apr 22, 2025

@williazz I hadn't considered scrubbing the whole HTTP event, since my use case was mostly URL normalization (and no real PII is sent via the URL in my case), but that sounds like a nice idea!

Do you have any desired interface in mind?
I was quickly playing with some ideas:

1. Exposing a callback function (similar to what this PR is already doing)

// interface
eventScrubber: (httpEvent: HttpEvent) => HttpEvent

// in the code:
const defaultEventScrubber = (httpEvent) => {
  // some things could already be done by default (e.g. https://docs.sentry.io/security-legal-pii/scrubbing/server-side-scrubbing/event-pii-fields/)
  return httpEvent;
};

// ...
constructor(config) {
  this.config.eventScrubber = config.eventScrubber ?? defaultEventScrubber
}

// ...
this.context.record(HTTP_EVENT_TYPE, this.config.eventScrubber(httpEvent));

Pros

  • Simple to maintain, as it is only a function being called before recording the event.
  • ?

Cons

  • We must ensure that the scrubbed event still fulfils the basic requirements in RUM (I'm guessing URL/method?), to avoid surprises at runtime 😄
  • ?
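
The first con above could be mitigated with a small guard around the consumer's callback. This is a minimal sketch only: `HttpEventLike` is a simplified stand-in for the real HTTP event type, the required fields (`request.url` and `request.method`) follow the guess made above, and `scrubSafely` is a hypothetical helper name.

```typescript
// Simplified stand-in for the real HTTP event type (assumed fields only).
type HttpEventLike = {
  request: { method: string; url: string };
  response?: { status: number };
};

type EventScrubber = (event: HttpEventLike) => HttpEventLike;

const isNonEmptyString = (value: unknown): value is string =>
  typeof value === 'string' && value.length > 0;

// Runs the consumer's scrubber and returns the scrubbed event only if it
// still carries a URL and a method. Returns undefined when the scrubber
// throws or produces an invalid event, so the caller can skip recording.
const scrubSafely = (
  event: HttpEventLike,
  scrubber: EventScrubber
): HttpEventLike | undefined => {
  try {
    const scrubbed = scrubber(event);
    const valid =
      isNonEmptyString(scrubbed?.request?.url) &&
      isNonEmptyString(scrubbed?.request?.method);
    return valid ? scrubbed : undefined;
  } catch {
    return undefined;
  }
};
```

The sketch drops invalid events rather than falling back to the original one, because a fallback would record exactly the data the consumer tried to scrub. Whether dropping is acceptable is a design decision for the maintainers.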

2. Exposing a configuration (similar to what Sentry docs propose but more TS style)

type DataScrub = {
  match: RegExp;
  replace: string; // what other types? number, boolean??
  attributePath: keyof HttpEvent; // would need a proper type for nested paths, e.g. request.headers.Authorization
}

// interface
eventScrubber: DataScrub[]

The implementation is then similar, but it lives here rather than on the consumer's side.

Pros

  • Standard way of scrubbing data (it could even be a function reused by the JSError plugin, etc.)
  • ?

Cons

  • More code to maintain here, and it may not anticipate some "user-specific" scenarios
  • ?
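
To make option 2 more concrete, here is a minimal sketch of how a list of such rules might be applied. It assumes `attributePath` is a dot-separated string (for example `request.url`), that only string values are scrubbed, and that the event is JSON-serializable; `applyScrubs` is a hypothetical helper, not code from this PR or the linked branch.

```typescript
type DataScrub = {
  match: RegExp;
  replace: string;
  attributePath: string; // assumed format: dot-separated, e.g. "request.url"
};

type PlainObject = { [key: string]: unknown };

const isPlainObject = (value: unknown): value is PlainObject =>
  typeof value === 'object' && value !== null && !Array.isArray(value);

// Returns a scrubbed deep copy of `event`; the input is left untouched.
const applyScrubs = <T extends object>(event: T, scrubs: DataScrub[]): T => {
  // JSON round-trip as a simple deep copy (assumes JSON-serializable data).
  const copy = JSON.parse(JSON.stringify(event)) as T;
  for (const { match, replace, attributePath } of scrubs) {
    const keys = attributePath.split('.');
    const lastKey = keys.pop();
    if (lastKey === undefined) continue;
    // Walk down to the object that holds the final key.
    let target: unknown = copy;
    for (const key of keys) {
      target = isPlainObject(target) ? target[key] : undefined;
    }
    if (!isPlainObject(target)) continue; // path does not exist: skip rule
    const current = target[lastKey];
    if (typeof current === 'string') {
      target[lastKey] = current.replace(match, replace);
    }
  }
  return copy;
};
```

A rule such as `{ match: /\/users\/\d+/, replace: '/users/{userId}', attributePath: 'request.url' }` would then cover the URL normalization case from the original proposal. Rules whose path is missing from the event are silently ignored in this sketch.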

@sergioflores-j
Author

I created another branch to start proposing something reusable by all plugins (very WIP to exercise the idea of the interface, basic implementation, etc):
Branch
Comparison

PS: Notable changes are in src/plugins/utils/http-utils.ts to introduce a "dataScrubbing" configuration 😄


Though I must say that, while working on it, "Request URL normalization" was the only real use case I could think of for HTTP events.
Since very little data is recorded in them (no request/response headers or bodies, etc.), scrubbing to prevent PII from being sent is often unnecessary.
Maybe other plugins (like the error plugin from the linked issue) could benefit more from scrubbing mechanisms?
