A script to replace user information in the CKAN database with fake data. The script will:
- Connect to the CKAN database in a local PostgreSQL instance.
- Select all entries from the `user` table.
- Replace each entry's `fullname` with a name generated by the faker tool.
- Replace each entry's `name` and `email` with values generated from the new `fullname`.
- Dump the modified and anonymised db to a file using `pg_dump`.
- Roll back the db to its initial state.
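
The derivation of `name` and `email` from the new `fullname` might look roughly like the following. This is a minimal sketch, not the script's actual code: `derive_identity` and `simple_slug` are hypothetical helpers, the `example.com` domain is an assumption, and the slug logic is a stdlib-only stand-in for what `python-slugify` provides. In the script itself the full name comes from faker rather than being passed in by hand.

```python
import re
import unicodedata


def simple_slug(text: str) -> str:
    """Stdlib-only approximation of a slug: ASCII, lowercase, hyphen-separated."""
    # Decompose accented characters, then drop everything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse each run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def derive_identity(fullname: str, domain: str = "example.com") -> dict:
    """Derive `name` and `email` from a fake full name (hypothetical helper)."""
    slug = simple_slug(fullname)
    return {
        "fullname": fullname,
        "name": slug,
        "email": f"{slug}@{domain}",
    }


identity = derive_identity("Zoë Müller-Schmidt")
print(identity)
```

Deriving both fields from the same slug keeps the fake record internally consistent, so a user's login name and email still visibly belong to the same fake person.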

% python anonymise_db.py --help
usage: anonymise_db.py [-h] [--db DB] [--user USER] [--host HOST] [--port PORT] --pw PW

Connect to a CKAN DB in Postgres and replace `fullname`, `name` and `email` in the `user` table with random names.
Also delete `about` and `image_url` (set to '').

options:
  -h, --help   show this help message and exit
  --db DB      Name of the CKAN database in Postgres. Default: ckan
  --user USER  Username to access the database. Default: ckan
  --host HOST  Postgres database host. Default: localhost.
  --port PORT  Postgres database port. Default: 5432.

required named arguments:
  --pw PW      Password to access the database

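
For readers who want to see how the options in the help text fit together, here is a sketch of an `argparse` parser that accepts the same options. It is reconstructed from the help output alone and is not the script's own source; `build_parser` is a hypothetical name, and parsing `--port` as an integer is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="anonymise_db.py",
        description=(
            "Connect to a CKAN DB in Postgres and replace `fullname`, `name` "
            "and `email` in the `user` table with random names."
        ),
    )
    parser.add_argument("--db", default="ckan", help="Name of the CKAN database in Postgres.")
    parser.add_argument("--user", default="ckan", help="Username to access the database.")
    parser.add_argument("--host", default="localhost", help="Postgres database host.")
    # Assumption: the port is converted to an int; the real script may keep a string.
    parser.add_argument("--port", default=5432, type=int, help="Postgres database port.")

    # A separate group makes the required option show up under its own heading.
    required = parser.add_argument_group("required named arguments")
    required.add_argument("--pw", required=True, help="Password to access the database")
    return parser


# Only the password is mandatory; everything else falls back to its default.
args = build_parser().parse_args(["--pw", "secret"])
print(args.db, args.user, args.host, args.port)
```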
- The script has been tested with Python 3.12.
- It uses `faker`, `python-slugify` and `psycopg2` (see requirements.txt).
- Clone this repository:

      git clone https://github.com/berlinonline/anonymise_ckan_db

- Optionally create and activate a virtual environment:

      cd anonymise_ckan_db
      python -m venv venv
      . venv/bin/activate

- Install dependencies:

      pip install -r requirements.txt

- Before running, you'll need a file `exclude.json` with a JSON list of usernames to exclude from the anonymisation (function accounts etc.).
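
To illustrate the exclusion list, the sketch below writes an example `exclude.json` and uses it to filter usernames. It assumes the file is a flat JSON array of strings, as described above; the usernames are hypothetical, and the file is written to a temporary directory here because where the script expects to find it is not specified.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical contents of exclude.json: a flat JSON list of usernames.
example_exclude = ["default", "harvest", "sysadmin"]

# Write the example file to a temporary directory for demonstration purposes.
workdir = Path(tempfile.mkdtemp())
exclude_path = workdir / "exclude.json"
exclude_path.write_text(json.dumps(example_exclude, indent=2), encoding="utf-8")

# Load the list into a set for fast membership checks.
excluded = set(json.loads(exclude_path.read_text(encoding="utf-8")))

# Hypothetical usernames as they might come out of the `user` table.
usernames = ["default", "jane-doe", "harvest", "john-smith"]

# Only users not on the exclusion list would be anonymised.
to_anonymise = [name for name in usernames if name not in excluded]
print(to_anonymise)
```
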
All code in this repository is published under the MIT License.