fix(eval): fix embedding averaging in privacy metrics by nina-xu · Pull Request #81 · NVIDIA-NeMo/Safe-Synthesizer

nina-xu · 2026-02-20T18:51:30Z

Summary

Per this GitLab issue, we are averaging column embeddings incorrectly in the two privacy metrics, AIP and MIP: later columns get a higher weight in the average. Following Aaron's suggestion in the ticket, this PR weights every column equally.
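To make the weighting problem concrete, here is a minimal sketch. It assumes the old code folded column embeddings in one at a time with a running pairwise average, as the Copilot overview describes, and that the fix takes an equal-weight mean via np.stack/np.mean. Both function names are hypothetical, not the ones in privacy_metric_utils.py.

```python
import numpy as np


def sequential_pairwise_average(embeddings: list[np.ndarray]) -> np.ndarray:
    """Hypothetical reconstruction of the old behavior: a running pairwise mean."""
    result = embeddings[0]
    for emb in embeddings[1:]:
        # Each new column gets weight 1/2, halving the weight of every earlier column.
        result = (result + emb) / 2
    return result


def equal_weight_average(embeddings: list[np.ndarray]) -> np.ndarray:
    """Equal-weight mean: stack to shape (k, ...) and average over the column axis."""
    return np.mean(np.stack(embeddings, axis=0), axis=0)


# One-hot "embeddings" make each column's weight directly visible in the output.
cols = [np.eye(3)[i] for i in range(3)]
old = sequential_pairwise_average(cols)  # weights 0.25, 0.25, 0.5: the last column dominates
new = equal_weight_average(cols)         # weights 1/3, 1/3, 1/3
```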

In addition, I DRY'ed up the code by moving the common methods into privacy_metric_utils.py, since both metrics average text-column embeddings the same way.

I also added some tests to illustrate that the new way of averaging gives each column equal weight.
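A test in this spirit could check two properties of the equal-weight mean: every column contributes exactly 1/k, and reordering the text columns does not change the result (the old running average fails the second check). This is a hypothetical pytest-style sketch in which random arrays stand in for per-column SentenceTransformer embeddings; it is not the actual content of test_privacy_metric_utils.py.

```python
import itertools

import numpy as np


def equal_weight_average(embeddings):
    # Stack per-column (n_rows, dim) arrays into (k, n_rows, dim), then mean over k.
    return np.mean(np.stack(embeddings, axis=0), axis=0)


def sequential_pairwise_average(embeddings):
    # Old-style running average, kept only as a contrast case.
    result = embeddings[0]
    for emb in embeddings[1:]:
        result = (result + emb) / 2
    return result


def test_each_column_has_equal_weight():
    rng = np.random.default_rng(0)
    cols = [rng.normal(size=(4, 8)) for _ in range(3)]
    expected = (cols[0] + cols[1] + cols[2]) / 3
    assert np.allclose(equal_weight_average(cols), expected)


def test_column_order_does_not_matter():
    rng = np.random.default_rng(1)
    cols = [rng.normal(size=(4, 8)) for _ in range(3)]
    reference = equal_weight_average(cols)
    for perm in itertools.permutations(cols):
        assert np.allclose(equal_weight_average(list(perm)), reference)
    # The running average changes depending on which column comes last.
    assert not np.allclose(
        sequential_pairwise_average(cols),
        sequential_pairwise_average(cols[::-1]),
    )
```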

Pre-Review Checklist

Ensure that the following pass:

make format && make lint (or prek validation)
make test passes locally
make test-e2e passes locally
make test-ci-container passes locally (recommended)

Pre-Merge Checklist

New or updated tests for any fix or new behavior
Updated documentation for new features and behaviors, including docstrings for API docs.

Testing Plan

Add unit tests
~~Take one dataset, shuffle around columns and calculate privacy scores to show the effect before & after~~
~~Run slurm experiments to compare the scores before & after, so that we are aware of the new baseline~~
We have very few datasets with 3 or more text columns, so it is hard to demonstrate the change in action. I ran the "short" dataset group plus dolly (3 text columns) to confirm that nothing breaks (wandb). Specifically, dolly scores 10 for both MIA and AIA with its 3 text columns, so this dataset cannot show the difference; patient_events also has 3 text columns, but it too has an AIA of 10.
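Why datasets with fewer than three text columns cannot show the change: with one or two columns, an in-order running pairwise average and the equal-weight mean give identical weights. Assuming the old code averaged columns pairwise in order, column 0 of k is halved k-1 times and column i (i >= 1) is halved k-i times. The sketch below computes both weightings exactly with fractions; the function names are hypothetical.

```python
from fractions import Fraction


def sequential_weights(k: int) -> list[Fraction]:
    """Weight of each column under an in-order running pairwise average (assumed old behavior)."""
    # Column 0 is halved k-1 times; column i >= 1 is halved k-i times.
    return [Fraction(1, 2 ** (k - 1))] + [Fraction(1, 2 ** (k - i)) for i in range(1, k)]


def equal_weights(k: int) -> list[Fraction]:
    """Weight of each column under the equal-weight mean."""
    return [Fraction(1, k)] * k


for k in range(1, 5):
    seq, eq = sequential_weights(k), equal_weights(k)
    # Prints True for k = 1 and 2; from k = 3 on the two schemes diverge.
    print(k, [str(w) for w in seq], [str(w) for w in eq], seq == eq)
```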

Note

Closes #203

Copilot

Pull request overview

Fixes incorrect column-embedding averaging in the AIP (attribute inference) and MIP (membership inference) privacy metrics by switching from sequential pairwise averaging (which overweighted later columns) to an equal-weight mean across all text columns, and factors shared logic into a new utility module.

Changes:

Introduce privacy_metric_utils.py with shared helpers for identifying text fields, splitting tabular/text columns, and embedding+averaging text columns via np.stack/np.mean.
Update AIP/MIP components to use the shared helpers and to reuse a single SentenceTransformer instance per metric invocation.
Add unit tests covering the corrected equal-weight embedding averaging behavior.
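A rough sketch of the kind of shared helper the overview describes: split the frame into tabular and text columns, embed each text column with one encoder created once per metric run, and take the equal-weight mean with np.stack/np.mean. The helper names, and the assumption that the caller passes in the list of text columns, are hypothetical; the real text-field detection in privacy_metric_utils.py is not reproduced here. Any object with an encode(list[str]) method returning an (n_rows, dim) array fits, for example a sentence_transformers.SentenceTransformer, whose encode returns a NumPy array by default. A small fake encoder stands in for it below.

```python
import numpy as np
import pandas as pd


def split_tabular_and_text(df: pd.DataFrame, text_columns: list[str]):
    """Hypothetical helper: return (tabular_df, text_df)."""
    return df.drop(columns=text_columns), df[text_columns]


def embed_text_columns(text_df: pd.DataFrame, model) -> np.ndarray:
    """Hypothetical helper: equal-weight mean of per-column embeddings.

    `model` is created once by the caller and reused for every text column.
    """
    per_column = [
        np.asarray(model.encode(text_df[col].astype(str).tolist()))
        for col in text_df.columns
    ]
    # (k, n_rows, dim) -> (n_rows, dim); each column contributes 1/k.
    return np.mean(np.stack(per_column, axis=0), axis=0)


class FakeEncoder:
    """Stand-in for SentenceTransformer: embeds a string as [length, vowel count]."""

    def encode(self, sentences):
        return np.array(
            [[len(s), sum(c in "aeiou" for c in s)] for s in sentences], dtype=float
        )


df = pd.DataFrame(
    {"age": [30, 41], "note": ["hi", "hello"], "bio": ["abc", "a"], "title": ["x", "yy"]}
)
tabular, text = split_tabular_and_text(df, ["note", "bio", "title"])
embeddings = embed_text_columns(text, FakeEncoder())  # one row per record
```

Passing the encoder in, rather than constructing it inside the helper, is what lets AIP and MIP each load the model once per invocation.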

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
tests/evaluation/components/test_privacy_metric_utils.py	Adds regression/unit tests validating equal-weight embedding averaging and dataframe splitting.
src/nemo_safe_synthesizer/evaluation/components/privacy_metric_utils.py	New shared utilities for text-field detection and embedding aggregation.
src/nemo_safe_synthesizer/evaluation/components/membership_inference_protection.py	Replaces local helpers with shared utilities; fixes embedding averaging logic in MIP path.
src/nemo_safe_synthesizer/evaluation/components/attribute_inference_protection.py	Replaces local helpers with shared utilities; fixes embedding averaging logic in AIP path.


src/nemo_safe_synthesizer/evaluation/components/privacy_metric_utils.py

tests/evaluation/components/test_privacy_metric_utils.py


Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


src/nemo_safe_synthesizer/evaluation/components/privacy_metric_utils.py


kendrickb-nvidia mentioned this pull request Mar 11, 2026

NSS privacy scores do not compute average correctly when 3 or more text columns #203

Open

nina-xu force-pushed the ninaxu/fix-privacy-metrics branch from 5a28c52 to b8c5c29 March 31, 2026 14:56

nina-xu marked this pull request as ready for review March 31, 2026 20:58

nina-xu requested a review from a team as a code owner March 31, 2026 20:58

Copilot AI review requested due to automatic review settings March 31, 2026 20:58

Copilot started reviewing on behalf of nina-xu March 31, 2026 20:59

Copilot AI reviewed Mar 31, 2026


src/nemo_safe_synthesizer/evaluation/components/privacy_metric_utils.py (outdated, resolved)

tests/evaluation/components/test_privacy_metric_utils.py (resolved)

Nina Xu and others added 5 commits April 1, 2026 06:04

fix embedding average with unequal weighting, DRY & test

d8b72f1

Signed-off-by: Nina Xu <ninaxu@cs-oci-ord-vscode-01.cm.cluster>

make format && make lint

7a061d4

Signed-off-by: Nina Xu <ninaxu@cs-oci-ord-vscode-01.cm.cluster>

nit type

32ceb11

Signed-off-by: Nina Xu <19981858+nina-xu@users.noreply.github.com>

skip tests when sentence transformers is not available

1f42f5e

Signed-off-by: Nina Xu <19981858+nina-xu@users.noreply.github.com>

guide import of sentence transformers; formating

d6ed8be

Signed-off-by: Nina Xu <19981858+nina-xu@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 1, 2026 13:07

nina-xu force-pushed the ninaxu/fix-privacy-metrics branch from c40b5f2 to d6ed8be April 1, 2026 13:07

Copilot started reviewing on behalf of nina-xu April 1, 2026 13:07

Copilot AI reviewed Apr 1, 2026


nina-xu requested a review from asteier2026 April 1, 2026 13:20

kendrickb-nvidia requested changes Apr 3, 2026


src/nemo_safe_synthesizer/evaluation/components/privacy_metric_utils.py (resolved)

reformat docstrings

494cae8

Signed-off-by: nina-xu <19981858+nina-xu@users.noreply.github.com>

nina-xu requested a review from kendrickb-nvidia April 3, 2026 19:09

nina-xu mentioned this pull request Apr 7, 2026

chore: update ty and fix all type issues #141

Merged

7 tasks

nina-xu wants to merge 6 commits into main from ninaxu/fix-privacy-metrics

3 participants