Advanced database commands

Here we describe some additional useful commands to handle IYP dumps.

Update existing database

If you have already set up the database, you can load a new dump without recreating the Docker containers. Place the new dump at dumps/neo4j.dump, delete the existing database, and run only the loader again:

# If the database is running, stop it.
# docker stop iyp
# Delete the existing database
rm -r data/*
# Run the loader
docker start -i iyp_loader
# Start the database.
docker start iyp
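
Because the existing database is deleted before the loader runs, it can be worth confirming that the new dump is actually in place first. The following is a minimal sketch of such a guard; the function name check_dump is hypothetical and not part of IYP, and the default path is the dumps/neo4j.dump location mentioned above.

```shell
# Hypothetical helper (not part of IYP): succeed only if the dump file
# exists and is non-empty. Defaults to the path used in this guide.
check_dump() {
    dump_path="${1:-dumps/neo4j.dump}"
    if [ ! -s "$dump_path" ]; then
        echo "Dump not found or empty: $dump_path" >&2
        return 1
    fi
}
```

One way to use it is to chain the steps so that nothing is deleted when the check fails: check_dump && rm -r data/* && docker start -i iyp_loader && docker start iyp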

Save modified database

If you made changes to the database and want to dump its contents to a file, you can use the loader for this. For example, to dump the database into a folder called backups:

# Directory has to exist or it will be created as root by Docker.
mkdir -p backups
UID="$(id -u)" GID="$(id -g)" docker compose run --rm -i -v "$PWD/backups:/backups" iyp_loader neo4j-admin database dump neo4j --to-path=/backups --verbose --overwrite-destination

This creates a file called neo4j.dump in the backups folder. Note that, because of --overwrite-destination, an existing neo4j.dump in that folder is overwritten!
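
If you would rather keep the previous dump, one option is to rename it before dumping again. This is only a sketch: the function name preserve_dump and the timestamp naming scheme are assumptions for illustration, not something IYP provides.

```shell
# Hypothetical helper (not part of IYP): move an existing dump aside by
# adding a timestamp to its name, so the next dump does not replace it.
preserve_dump() {
    dump_file="${1:-backups/neo4j.dump}"
    if [ -f "$dump_file" ]; then
        stamp="$(date +%Y%m%d-%H%M%S)"
        # backups/neo4j.dump becomes backups/neo4j-<timestamp>.dump
        mv "$dump_file" "${dump_file%.dump}-$stamp.dump"
    fi
}
```

Calling preserve_dump before the dump command leaves the backups folder without a neo4j.dump, so nothing is overwritten. If no dump exists yet, the function does nothing.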

View Neo4j logs

To view the logs of the Neo4j container, use the following command:

docker logs -f iyp
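
For a long-running container the full log can be large. docker logs also accepts --tail, --since, and --timestamps to limit and annotate the output. The wrapper below is a small sketch; the name iyp_logs and its defaults (100 lines, last hour) are arbitrary choices for illustration.

```shell
# Hypothetical wrapper (not part of IYP) around `docker logs` for the
# iyp container.
# $1: number of trailing lines to show (default: 100)
# $2: only show entries newer than this, e.g. 10m or 2h (default: 1h)
iyp_logs() {
    docker logs --timestamps --tail "${1:-100}" --since "${2:-1h}" iyp
}
```

For example, iyp_logs 50 10m prints at most the last 50 log lines from the past ten minutes, each prefixed with a timestamp.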

Create a new dump from scratch

Note that creating a dump with all crawlers enabled downloads a lot of data and takes multiple days.

Clone this repository:

git clone https://github.com/InternetHealthReport/internet-yellow-pages.git
cd internet-yellow-pages

Create a Python environment and install the required Python libraries:

python3 -m venv --upgrade-deps .venv
source .venv/bin/activate
pip install -r requirements.txt

Create a configuration file from the example file and add API keys. Note that some crawlers do not work without credentials.

cp config.json.example config.json
# Edit as needed
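
After editing, a quick syntax check can catch a stray comma or missing quote before a long crawl starts. The sketch below assumes config.json is plain JSON, which is inferred only from the file extension, and uses Python's standard json.tool module. The function name check_config is hypothetical.

```shell
# Hypothetical helper (not part of IYP): succeed only if the file parses
# as JSON. Assumes the configuration file is strict JSON.
check_config() {
    python3 -m json.tool "${1:-config.json}" > /dev/null
}
```

Running check_config && python3 create_db.py would then start the database creation only if the file parses. This checks syntax only, not whether the keys or credentials are valid.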

Create and populate a new database:

python3 create_db.py