Here we describe some additional useful commands to handle IYP dumps.
If you have already set up the database, you can load a new dump without
recreating the Docker containers. Place the new dump at dumps/neo4j.dump,
delete the existing database, and run only the loader again:
# If the database is running, stop it.
# docker stop iyp
# Delete the existing database
rm -r data/*
# Run the loader
docker start -i iyp_loader
# Start the database.
docker start iyp
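The reload steps above can be wrapped in a small helper that refuses to delete the existing database unless the new dump is actually in place. This is a minimal sketch, not part of the repository: the function name reload_iyp_dump is hypothetical, and it assumes it is run from the repository root with the dumps/ and data/ directories and the iyp and iyp_loader containers described above.

```shell
# Hypothetical helper (not part of IYP): reload a dump only if it exists.
# Assumes the current directory is the repository root.
reload_iyp_dump() {
    dump_file="dumps/neo4j.dump"
    if [ ! -f "$dump_file" ]; then
        echo "No dump found at $dump_file, leaving data/ untouched." >&2
        return 1
    fi
    # Stop the database if it is running; ignore the error if it is not.
    docker stop iyp >/dev/null 2>&1 || true
    # Delete the existing database.
    rm -rf data/*
    # Run the loader, then start the database only if loading succeeded.
    docker start -i iyp_loader && docker start iyp
}
```

Source the file that defines the function, then call reload_iyp_dump after copying the new dump into place.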
If you made changes to the database and want to dump its contents into a file, you can
use the loader for this. For example, to dump the database into a folder called
backups:
# Directory has to exist or it will be created as root by Docker.
mkdir -p backups
UID="$(id -u)" GID="$(id -g)" docker compose run --rm -i -v "$PWD/backups:/backups" iyp_loader neo4j-admin database dump neo4j --to-path=/backups --verbose --overwrite-destination
This will create a file called neo4j.dump in the backups folder. Note that this
will also overwrite the file if it already exists!
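Because the dump command overwrites backups/neo4j.dump, it can help to move an earlier dump aside before running it. The following is a minimal sketch assuming the backups folder from the example above; the function name keep_previous_dump and the timestamped naming scheme are illustrative choices, not part of IYP.

```shell
# Hypothetical helper (not part of IYP): rename an existing dump so the
# next "database dump" run does not overwrite it.
keep_previous_dump() {
    dump_file="backups/neo4j.dump"
    # Nothing to preserve if no earlier dump exists.
    [ -f "$dump_file" ] || return 0
    # Rename to backups/neo4j-<UTC timestamp>.dump,
    # e.g. backups/neo4j-20240131T120000.dump.
    stamp="$(date -u +%Y%m%dT%H%M%S)"
    mv "$dump_file" "backups/neo4j-$stamp.dump"
}
```

Calling keep_previous_dump right before the dump command keeps one renamed copy per run; old copies are not cleaned up automatically.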
To view the logs of the Neo4j container, use the following command:
docker logs -f iyp
Enabling all crawlers will download a lot of data and take multiple days to create a dump.
Clone this repository:
git clone https://github.com/InternetHealthReport/internet-yellow-pages.git
cd internet-yellow-pages
Create a Python environment and install the required Python libraries:
python3 -m venv --upgrade-deps .venv
source .venv/bin/activate
pip install -r requirements.txt
Create a configuration file from the example file and add API keys. Note that some crawlers do not work without credentials.
cp config.json.example config.json
# Edit as needed
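After editing, a quick syntax check can catch a stray comma or missing quote before a long crawl starts. This sketch only verifies that the file is well-formed JSON, using the json.tool module from Python's standard library; it does not check that the keys or credentials are what IYP expects. The function name check_config is hypothetical.

```shell
# Hypothetical helper (not part of IYP): check that a file parses as JSON.
# Defaults to config.json in the current directory.
check_config() {
    config_file="${1:-config.json}"
    if python3 -m json.tool "$config_file" >/dev/null 2>&1; then
        echo "$config_file is valid JSON"
    else
        echo "$config_file is missing or not valid JSON" >&2
        return 1
    fi
}
```

Run check_config with no arguments from the repository root, or pass another path as the first argument.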
Create and populate a new database:
python3 create_db.py