This Python-based tool lets a user calculate a range of data privacy metrics on tabular data, and apply basic de-identification methods, from a graphical user interface.
- k-anonymity [1]
- ℓ-diversity [2]
- Sample Unique Detection Algorithm (SUDA) [3]
- Personal Information Factor (PIF) [4]
- Noise addition
- Field generalisation
- Rounded Approximation
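To make the first two metrics and the effect of field generalisation concrete, here is a minimal standard-library sketch. It is not the tool's implementation: the toy records, the column names and the helper functions (`k_anonymity`, `l_diversity`, `generalise_age`) are all made up for illustration.

```python
from collections import defaultdict

# Toy records; column names and values are hypothetical.
records = [
    {"age": 23, "sex": "F", "diagnosis": "flu"},
    {"age": 27, "sex": "F", "diagnosis": "asthma"},
    {"age": 31, "sex": "M", "diagnosis": "flu"},
    {"age": 38, "sex": "M", "diagnosis": "flu"},
    {"age": 35, "sex": "M", "diagnosis": "diabetes"},
]


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """l is the smallest count of distinct sensitive values in any class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


def generalise_age(rows, width=10):
    """Replace each exact age with a band such as '20-29'."""
    generalised = []
    for row in rows:
        low = (row["age"] // width) * width
        generalised.append({**row, "age": f"{low}-{low + width - 1}"})
    return generalised


quasi_identifiers = ["age", "sex"]
k_raw = k_anonymity(records, quasi_identifiers)
banded = generalise_age(records)
k_banded = k_anonymity(banded, quasi_identifiers)
l_banded = l_diversity(banded, quasi_identifiers, "diagnosis")
print(k_raw, k_banded, l_banded)
```

With exact ages every record is unique on (age, sex), so k is 1. Banding ages into decades merges records into larger groups and raises k, at the cost of less precise data.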
Input can be in either CSV or TSV format. Metadata can optionally be loaded from a JSON file.
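As a rough illustration of these input formats, the sketch below reads a CSV or TSV file and an optional JSON metadata file using only the standard library. It is not the tool's loader: the function names, the file names and the layout of the JSON content are assumptions made for the example.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_table(path):
    """Read a CSV or TSV file into a list of dicts, picking the delimiter by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        delimiter = ","
    elif suffix == ".tsv":
        delimiter = "\t"
    else:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def load_metadata(path):
    """Read a JSON metadata file into a Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


# Demo with temporary files; names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    tsv_path = Path(tmp) / "participants.tsv"
    tsv_path.write_text("participant_id\tage\nsub-01\t34\nsub-02\t29\n", encoding="utf-8")
    json_path = Path(tmp) / "participants.json"
    json_path.write_text(json.dumps({"age": {"Units": "years"}}), encoding="utf-8")

    rows = load_table(tsv_path)
    metadata = load_metadata(json_path)

print(rows[0]["age"], metadata["age"]["Units"])
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would need converting before any metric is computed on them.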
The metaprivBIDS software runs on any platform (e.g. Linux, macOS, Windows) that has a Python 3 installation. It is recommended (but not required) to first create a virtual environment.
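A quick way to confirm that an interpreter is recent enough before installing is sketched below. The 3.7 minimum is an assumption taken from the "python>=3.7" pin in the conda command in this README, and `meets_minimum` is a hypothetical helper, not part of metaprivBIDS.

```python
import sys

# Assumed minimum, taken from the "python>=3.7" pin in the conda command.
MINIMUM = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python version is sufficient:", sys.version.split()[0])
else:
    print("Python 3.7 or later appears to be required; found", sys.version.split()[0])
```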
If you run into permission issues with system-dependent files, you may want to set the pkgs_dirs option in Conda's configuration to a directory that you can write to:
conda config --add pkgs_dirs ~/conda_pkgs
Create the environment:
conda create --name venv python=3.x.x
Activate the environment:
conda activate venv
Graphviz and rpy2 both require system-level dependencies. If these are not available, the environment may need to be created with:
conda create --name venv -c conda-forge "python>=3.7" graphviz pygraphviz r-base r-sdcMicro rpy2
You can then install metaprivBIDS by first cloning the git repository:
git clone https://github.com/CPernet/metaprivBIDS.git
Change into the metaprivBIDS folder (the clone is named after the repository):
cd metaprivBIDS
and then run:
pip install -e .
Alternatively, create and activate a virtual environment with Python's built-in venv module:
python -m venv venv
source venv/bin/activate
You can then install metaprivBIDS by cloning the git repository:
git clone https://github.com/CPernet/metaprivBIDS.git
Before running the program, make sure all dependencies listed in pyproject.toml are available in a Python 3.7 or later environment, as described in the installation steps.
This can be done by first changing into the metaprivBIDS directory and then running:
pip install -e .
To start the program, run the following from the command line:
metaprivBIDS
After following the installation guide, the metrics in the metaprivBIDS tool can also be called directly from Python through an import statement, without using the GUI.
For example:
from metaprivBIDS.metaprivBIDS.corelogic.metapriv_corelogic import metaprivBIDS_core_logic
metapriv = metaprivBIDS_core_logic()
# Load the data
data_info = metapriv.load_data('metaprivBIDS/Use_Case_Data/adult_mini.csv')
# Inspect {column, unique value count, column type}
data = data_info["data"]
print("Column Types:",'\n')
print(data_info["column_types"],'\n')
# Select Quasi-Identifiers
selected_columns = ["age", "education", "marital-status", "occupation", "relationship","sex","salary-class"]
results_k_global = metapriv.find_lowest_unique_columns(data, selected_columns)
print('Find Influential Columns:','\n')
print(results_k_global)
# Compute Personal Information Factor
pif_value, cig_df = metapriv.compute_cig(data, selected_columns)
print("PIF Value:", pif_value)
print("CIG DataFrame:")
print(cig_df)
# Run SUDA2 computation
results_suda = metapriv.compute_suda2(data, selected_columns, sample_fraction=0.3, missing_value=-999)
# Access results
data_with_scores = results_suda["data_with_scores"]
attribute_contributions = results_suda["attribute_contributions"]
attribute_level_contributions = results_suda["attribute_level_contributions"]
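For intuition about what a SUDA-style analysis looks for, the sketch below enumerates minimal sample uniques (MSUs) by brute force: the smallest sets of columns on which a record is unique in the sample. This is a simplified illustration with made-up data and a hypothetical function name, not the algorithm behind `compute_suda2`, which also turns such sets into per-record scores.

```python
from collections import Counter
from itertools import combinations


def minimal_sample_uniques(rows, columns, max_size=None):
    """Return, for each row index, the minimal column sets on which that row is unique."""
    max_size = max_size or len(columns)
    msus = {index: [] for index in range(len(rows))}
    # Visit column subsets from smallest to largest so minimality is easy to check.
    for size in range(1, max_size + 1):
        for subset in combinations(columns, size):
            counts = Counter(tuple(row[col] for col in subset) for row in rows)
            for index, row in enumerate(rows):
                if counts[tuple(row[col] for col in subset)] != 1:
                    continue  # not unique on this subset
                # Skip if a smaller unique set is already contained in this subset.
                if any(set(found) < set(subset) for found in msus[index]):
                    continue
                msus[index].append(subset)
    return msus


# Toy sample; values are hypothetical.
sample = [
    {"age": "30-39", "sex": "F", "occupation": "nurse"},
    {"age": "30-39", "sex": "M", "occupation": "nurse"},
    {"age": "40-49", "sex": "F", "occupation": "teacher"},
    {"age": "40-49", "sex": "F", "occupation": "nurse"},
]

msus = minimal_sample_uniques(sample, ["age", "sex", "occupation"])
for index, found in msus.items():
    print(index, found)
```

Records that are unique on a single column (here the only male and the only teacher) are generally the riskiest, because very little knowledge is needed to single them out. The brute-force search grows exponentially with the number of columns, so it is only practical for small column sets.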
Footnotes
1. Sweeney, L. (2002). k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 557-570.
2. Machanavajjhala, A., Kifer, D., Gehrke, J., & Venkitasubramaniam, M. (2007). ℓ-Diversity: Privacy Beyond k-Anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 3-es.
3. Elliott, M. J., & Skinner, C. J. (2000). Identifying population uniques using limited information. Proceedings of the Annual Meeting of the American Statistical Association.
4. Information Governance ANZ. (2019). Privacy Impact Assessment eReport.