fix(eval): fix embedding averaging in privacy metrics#81
Force-pushed `5a28c52` to `b8c5c29`.
Pull request overview
Fixes incorrect column-embedding averaging in the AIP (attribute inference) and MIP (membership inference) privacy metrics by switching from sequential pairwise averaging (which overweighted later columns) to an equal-weight mean across all text columns, and factors shared logic into a new utility module.
Changes:
- Introduce `privacy_metric_utils.py` with shared helpers for identifying text fields, splitting tabular/text columns, and embedding and averaging text columns via `np.stack`/`np.mean`.
- Update the AIP/MIP components to use the shared helpers and to reuse a single `SentenceTransformer` instance per metric invocation.
- Add unit tests covering the corrected equal-weight embedding averaging behavior.
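A rough sketch of what such a shared helper might look like (the helper name and signature here are hypothetical, not the actual module API). Passing the encoder in as a callable is what allows a single `SentenceTransformer` instance to be reused across columns:

```python
import numpy as np
import pandas as pd

def average_text_column_embeddings(df, text_columns, encode):
    """Embed each text column separately, then average with equal weight.

    `encode` is any callable mapping a list of strings to a 2-D array of
    shape (n_rows, dim) -- e.g. the .encode method of one reused model.
    """
    # Stack to shape (n_columns, n_rows, dim) so every column contributes
    # exactly 1/n_columns to the mean, regardless of column order.
    per_column = np.stack([encode(df[col].tolist()) for col in text_columns])
    return np.mean(per_column, axis=0)
```

Because `np.mean` over the stacked axis is symmetric in the columns, the result is invariant to column order, which is the property the new unit tests check.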
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `tests/evaluation/components/test_privacy_metric_utils.py` | Adds regression/unit tests validating equal-weight embedding averaging and dataframe splitting. |
| `src/nemo_safe_synthesizer/evaluation/components/privacy_metric_utils.py` | New shared utilities for text-field detection and embedding aggregation. |
| `src/nemo_safe_synthesizer/evaluation/components/membership_inference_protection.py` | Replaces local helpers with shared utilities; fixes embedding averaging logic in the MIP path. |
| `src/nemo_safe_synthesizer/evaluation/components/attribute_inference_protection.py` | Replaces local helpers with shared utilities; fixes embedding averaging logic in the AIP path. |
Signed-off-by: Nina Xu <ninaxu@cs-oci-ord-vscode-01.cm.cluster>
Signed-off-by: Nina Xu <19981858+nina-xu@users.noreply.github.com>
Force-pushed `c40b5f2` to `d6ed8be`.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Signed-off-by: nina-xu <19981858+nina-xu@users.noreply.github.com>
Summary
Per this GitLab issue, we are averaging column embeddings incorrectly in the two privacy metrics, AIP and MIP: the sequential pairwise averaging gives later columns a higher weight. Following Aaron's suggestion in the ticket, this PR switches to an equal weighting.
In addition, I DRY'ed up the code by moving the common methods into `privacy_metric_utils.py`, because both metrics average text column embeddings in the same way.
I also added some tests to illustrate that the new way of averaging gives each column equal weight.
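To make the weighting concrete, here is a toy NumPy sketch (illustrative only, not the actual component code) contrasting the old sequential pairwise averaging with the new equal-weight mean for three columns:

```python
import numpy as np

# Three toy "column embeddings".
e1, e2, e3 = np.array([1.0]), np.array([2.0]), np.array([3.0])

# Old behavior: sequential pairwise averaging.
# ((e1 + e2) / 2 + e3) / 2 weights the columns 1/4, 1/4, 1/2,
# so the last column counts twice as much as each of the first two.
old = (((e1 + e2) / 2) + e3) / 2  # -> 2.25

# New behavior: equal-weight mean over the stacked embeddings (1/3 each).
new = np.mean(np.stack([e1, e2, e3]), axis=0)  # -> 2.0
```

With more columns the skew worsens: under sequential pairwise averaging the first column's weight shrinks geometrically (1/2^(n-1) for n columns), while the equal-weight mean keeps every column at 1/n.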
Pre-Review Checklist
Ensure that the following pass:
- `make format && make lint` (or via prek validation)
- `make test` passes locally
- `make test-e2e` passes locally
- `make test-ci-container` passes locally (recommended)

Pre-Merge Checklist
Testing Plan
- Take one dataset, shuffle the columns around, and calculate privacy scores to show the effect before & after.
- Run slurm experiments to compare the scores before & after, so that we are aware of the new baseline. `dolly` has 10 for MIA and 10 for AIA for the 3 text columns, so we wouldn't be able to observe the difference from this dataset; `patient_events` also has 3 text columns, but it also has an AIA of 10.

Note
Closes #203