

@tobias-wilfert
Member

@tobias-wilfert commented Dec 5, 2025

Adds the logic to scrub the attachment body as well as the attachment meta.

Closes INGEST-648

@tobias-wilfert self-assigned this Dec 5, 2025
@tobias-wilfert marked this pull request as ready for review December 9, 2025 08:58
@tobias-wilfert requested a review from a team as a code owner December 9, 2025 08:58
Member

@jjbayer left a comment


Nice!


let processor = PiiAttachmentsProcessor::new(config.compiled());
let mut payload = body.to_vec();
if processor.scrub_attachment(filename, &mut payload) {
Member


I think we should emit the timer(RelayTimers::AttachmentScrubbing) metric here, either directly or by calling processing::utils::attachments::scrub_attachment.
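A minimal sketch of what "emit the timer here" amounts to, under simplified assumptions: `TimerSink` and the closure-based `timed` helper are hypothetical stand-ins for the `timer(RelayTimers::AttachmentScrubbing)` metric, and `scrub_body` is a toy redactor, not `PiiAttachmentsProcessor::scrub_attachment`. The point is only that the timer wraps exactly the scrubbing call and passes its `bool` result through unchanged.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory timer sink standing in for a statsd-style metric client.
#[derive(Default)]
pub struct TimerSink {
    pub recorded: Vec<(&'static str, Duration)>,
}

impl TimerSink {
    pub fn record(&mut self, name: &'static str, elapsed: Duration) {
        self.recorded.push((name, elapsed));
    }
}

/// Runs `f`, records its wall-clock duration under `name`, and returns `f`'s result.
pub fn timed<T>(sink: &mut TimerSink, name: &'static str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    sink.record(name, start.elapsed());
    result
}

/// Toy body scrubber (hypothetical): overwrites each non-overlapping occurrence
/// of `secret` with `*` bytes and returns whether anything was changed.
pub fn scrub_body(payload: &mut [u8], secret: &[u8]) -> bool {
    if secret.is_empty() {
        return false;
    }
    let mut changed = false;
    let mut i = 0;
    while i + secret.len() <= payload.len() {
        if &payload[i..i + secret.len()] == secret {
            payload[i..i + secret.len()].fill(b'*');
            changed = true;
            i += secret.len();
        } else {
            i += 1;
        }
    }
    changed
}
```

Usage: `timed(&mut sink, "attachment.scrubbing", || scrub_body(&mut payload, b"hunter2"))` returns the same `bool` as the untimed call, so an `if` around it needs no other change. The metric name here is illustrative.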

Member Author


Ok, will update. I wasn't a fan of calling fn scrub_attachment(item: &mut crate::envelope::Item, config: &relay_pii::PiiConfig) because then we would need to construct a mock item, since we don't have an item here anymore (but maybe doing so is worth it just to keep the code all in one place 🤔).
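One way to avoid the mock item while still keeping the code in one place, sketched with hypothetical, heavily simplified types: `Item`, `PiiConfig`, `scrub_attachment_bytes`, and the `on_timing` callback are illustrative, not Relay's actual definitions. The scrubbing and timing move into a helper that takes a filename and a byte buffer, and the item-based function becomes a thin wrapper around it. The code path that no longer has an item calls the helper directly, and the timer is still emitted in a single place.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, heavily simplified stand-in for an envelope item.
pub struct Item {
    pub filename: Option<String>,
    pub payload: Vec<u8>,
}

/// Hypothetical PII config: a single byte sequence to redact.
pub struct PiiConfig {
    pub secret: Vec<u8>,
}

/// Toy redactor: overwrites non-overlapping occurrences of `secret` with `*`.
fn redact(payload: &mut [u8], secret: &[u8]) -> bool {
    if secret.is_empty() {
        return false;
    }
    let mut changed = false;
    let mut i = 0;
    while i + secret.len() <= payload.len() {
        if &payload[i..i + secret.len()] == secret {
            payload[i..i + secret.len()].fill(b'*');
            changed = true;
            i += secret.len();
        } else {
            i += 1;
        }
    }
    changed
}

/// Shared core used by every caller: scrubs raw bytes and reports the elapsed
/// time through `on_timing` (a stand-in for the real timer metric).
pub fn scrub_attachment_bytes(
    filename: &str,
    payload: &mut [u8],
    config: &PiiConfig,
    on_timing: &mut dyn FnMut(Duration),
) -> bool {
    let start = Instant::now();
    // A real implementation would use `filename` to select matching rules;
    // this toy version applies the single configured secret to every file.
    let _ = filename;
    let changed = redact(payload, &config.secret);
    on_timing(start.elapsed());
    changed
}

/// Item-based entry point, now a thin wrapper around the byte-level helper.
pub fn scrub_attachment(
    item: &mut Item,
    config: &PiiConfig,
    on_timing: &mut dyn FnMut(Duration),
) -> bool {
    // Borrowing `filename` and `payload` separately is fine: they are disjoint fields.
    let filename = item.filename.as_deref().unwrap_or_default();
    scrub_attachment_bytes(filename, &mut item.payload, config, on_timing)
}
```

The trade-off is one extra public function in exchange for not fabricating an `Item` on the path that only has a filename and a buffer.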


linear bot commented Dec 10, 2025

let filename = meta
.value()
.and_then(|m| m.filename.as_str())
.unwrap_or_default();


Bug: Filename extraction after meta scrubbing breaks body rule matching

The filename is extracted from meta after the meta has already been scrubbed (lines 291-296). Since AttachmentV2Meta.filename is marked with pii = "true", filenames containing PII patterns (such as email addresses or IP addresses) are modified during meta scrubbing. When the filename is then used to match body scrubbing rules (e.g., a selector of the form $attachments.'<email>.log'), the rule no longer matches, because the filename has already been scrubbed to something like [email].log. As a result, body scrubbing rules can silently fail to apply, leaving PII unscrubbed in the attachment body.
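A minimal sketch of the reported ordering problem and one possible fix, using hypothetical types and functions (`AttachmentMeta`, `scrub_meta`, `body_rule_matches`, `process_buggy`, `process_fixed`) rather than Relay's real `AttachmentV2Meta` and PII processors. The crude `@` check stands in for real email detection, and rule matching is reduced to plain string equality. The fix shown reads the filename before the meta is scrubbed, so body rules are matched against the original name; it is one possible approach, not necessarily the one that was merged.

```rust
/// Hypothetical, simplified attachment metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct AttachmentMeta {
    pub filename: String,
}

/// Toy meta scrubber: if the part before the last `.` contains `@`, treat it
/// as an email address and replace it with `[email]`, keeping the extension.
pub fn scrub_meta(meta: &mut AttachmentMeta) {
    let replacement = match meta.filename.rsplit_once('.') {
        Some((stem, ext)) if stem.contains('@') => Some(format!("[email].{ext}")),
        _ => None,
    };
    if let Some(name) = replacement {
        meta.filename = name;
    }
}

/// Toy rule matching: a body rule applies when its filename selector equals
/// the attachment's filename.
pub fn body_rule_matches(selector: &str, filename: &str) -> bool {
    selector == filename
}

/// Buggy order: the filename is read after the meta was scrubbed, so a rule
/// written against the original filename no longer matches.
pub fn process_buggy(meta: &mut AttachmentMeta, selector: &str) -> bool {
    scrub_meta(meta);
    body_rule_matches(selector, &meta.filename)
}

/// Fixed order: capture the original filename first, then scrub the meta.
pub fn process_fixed(meta: &mut AttachmentMeta, selector: &str) -> bool {
    let original_filename = meta.filename.clone();
    scrub_meta(meta);
    body_rule_matches(selector, &original_filename)
}
```

In both versions the stored meta still ends up as `[email].log`; only the filename used for body rule matching differs.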


@tobias-wilfert added this pull request to the merge queue Dec 10, 2025
Merged via the queue into master with commit 6636c14 Dec 10, 2025
28 checks passed
@tobias-wilfert deleted the tobias-wilfert/feat/attachmentv2-scrubbing branch December 10, 2025 11:45


4 participants