Hybrid distance weighting in AIA

### Priority Level

Medium

### Describe the bug

From a Copilot review on #185: the distance weighting between the text and tabular distances in `attribute_inference_protection.py` is most likely not what was intended. See the code at "Now get the hybrid distance": https://github.com/NVIDIA-NeMo/Safe-Synthesizer/blob/5ecc0add5e098a40088a24659d67e8048ff4b447/src/nemo_safe_synthesizer/evaluation/components/attribute_inference_protection.py#L362

> The mixed text+tabular branch’s hybrid weighting below uses len(df_train_use.columns) as the denominator, but df_train_use has already been reduced to tabular-only columns earlier in this function. That makes tab_weight effectively 1 and text_weight > 0 (weights don’t sum to 1), skewing the hybrid distance and potentially changing AIA results. Consider computing weights from the original total column count (e.g., len(text_columns) + len(tabular_columns)) or another normalized scheme that remains valid after splitting.

### Steps/Code to reproduce bug

Run AIA on a dataset that has both tabular columns and one or more text columns, so that the mixed text+tabular branch is taken.

### Expected behavior

Assume we have `m_tab` tabular columns and `m_text` text columns, with tabular distance `d_tab` and text distance `d_text`. Without any comments to the contrary, the way the code is written suggests the weighting should be `(m_tab/(m_tab+m_text)) * d_tab + (m_text/(m_tab+m_text)) * d_text`, i.e. a basic weighted average.

But the actual behavior is `d_tab + (m_text/m_tab) * d_text`. This is not a weighted average of any sort: the weights are 1 and `m_text/m_tab`, which sum to more than 1. For non-negative distances, the hybrid distance is therefore never smaller than `d_tab` and can exceed both input distances when text columns exist. It is unclear how much of an issue this is downstream, but it is almost certainly not the intended behavior.
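
To make the difference concrete, here is a minimal sketch comparing the two formulas above. The function names (`intended_hybrid`, `actual_hybrid`) and the sample numbers are made up for illustration and are not taken from `attribute_inference_protection.py`; the "actual" version only assumes, per the review comment, that the denominator ends up being the tabular-only column count.

```python
def intended_hybrid(d_tab: float, d_text: float, m_tab: int, m_text: int) -> float:
    """Weighted average over the original total column count (hypothetical helper)."""
    total = m_tab + m_text
    return (m_tab / total) * d_tab + (m_text / total) * d_text


def actual_hybrid(d_tab: float, d_text: float, m_tab: int, m_text: int) -> float:
    """Mimics the reported behavior: the denominator is the tabular-only
    column count, so tab_weight is always 1 (hypothetical helper)."""
    denom = m_tab  # stands in for len(df_train_use.columns) after the split
    tab_weight = m_tab / denom
    text_weight = m_text / denom
    return tab_weight * d_tab + text_weight * d_text


# Illustrative values only.
d_tab, d_text = 0.4, 0.6
m_tab, m_text = 3, 2

intended = intended_hybrid(d_tab, d_text, m_tab, m_text)
actual = actual_hybrid(d_tab, d_text, m_tab, m_text)

print(f"intended: {intended:.2f}")  # about 0.48, between d_tab and d_text
print(f"actual:   {actual:.2f}")    # about 0.80, larger than both inputs
```

With these inputs the intended weighting stays between the two distances, while the current formula lands above both.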

### Additional context

_No response_

Hybrid distance weighting in AIA #305
