Project Gutenberg rdf files parser.
- Node.js 12.16.3
- PostgreSQL 9.5.22 listening on port `5432`
- Create a database called `gutenberg_db` that will be used by the app.
- Create a database called `gutenberg_db_test` that will be used by the tests.
- Add a file called `secrets.js` in the project root with the following structure:
module.exports = {
  DEV_DB_USER: "user name for gutenberg_db",
  DEV_DB_PASSWORD: "user password for gutenberg_db",
  TEST_DB_USER: "user name for gutenberg_db_test",
  TEST_DB_PASSWORD: "user password for gutenberg_db_test"
};
- Get the rdf files:
  - Manually, by downloading and extracting the rdf files from http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.zip and placing them in a folder called `rdf-files` in the root of the project
  - OR run `npm run download` to download them. A folder called `rdf-files` containing the files will be created in the root of the project.
- Install dependencies with `npm i`
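
As an illustration of how the keys in `secrets.js` might be consumed, here is a minimal sketch that maps them to connection settings for the two databases. The helper name `buildDbConfig`, the `"test"` environment label, and the `localhost` host are assumptions made for this example, not taken from the project's code.

```javascript
// Hypothetical helper: picks the dev or test credentials from a secrets object.
// The function name, the "test" label, and the host value are assumptions.
function buildDbConfig(secrets, env) {
  const isTest = env === "test";
  return {
    host: "localhost", // assumption: PostgreSQL runs on the same machine
    port: 5432, // port stated in the requirements
    database: isTest ? "gutenberg_db_test" : "gutenberg_db",
    user: isTest ? secrets.TEST_DB_USER : secrets.DEV_DB_USER,
    password: isTest ? secrets.TEST_DB_PASSWORD : secrets.DEV_DB_PASSWORD,
  };
}

// Placeholder values standing in for require("./secrets").
const exampleSecrets = {
  DEV_DB_USER: "dev_user",
  DEV_DB_PASSWORD: "dev_password",
  TEST_DB_USER: "test_user",
  TEST_DB_PASSWORD: "test_password",
};

const devConfig = buildDbConfig(exampleSecrets, "development");
console.log(devConfig.database, devConfig.user);
```

Keeping the credentials in one untracked file and selecting them by environment means the app and the tests never share a database by accident.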
- Run `npm start` to parse the files and save their contents in the `Books` table. This script automatically runs the migrations that create the table and add the indexes: two B-tree indexes, one on the `title` column and one on the `publicationDate` column, and one GIN index on the `authors` column, since it is an array of strings.
- Once the process has finished, a message like `Finished parsing 62418 files` will be displayed.
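
The index choices described above can be sketched as plain PostgreSQL statements held in JavaScript strings. This is only an illustration: the index names are invented, the `varchar[]` type of `authors` is an assumption, and the project's real migrations may be written differently. The query helper returns the `{ text, values }` shape that node-postgres accepts, and uses the array containment operator `@>`, which a GIN index on an array column can serve.

```javascript
// Illustrative statements only; index names are hypothetical.
const createIndexStatements = [
  // B-tree suits equality, range, and ordering lookups on scalar columns.
  'CREATE INDEX books_title_idx ON "Books" USING btree ("title");',
  'CREATE INDEX books_publication_date_idx ON "Books" USING btree ("publicationDate");',
  // GIN indexes the individual elements of the array column.
  'CREATE INDEX books_authors_idx ON "Books" USING gin ("authors");',
];

// Hypothetical helper: builds a parameterized "array contains" query.
// Assumes "authors" is a varchar[] column.
function authorsContainQuery(author) {
  return {
    text: 'SELECT * FROM "Books" WHERE "authors" @> ARRAY[$1]::varchar[]',
    values: [author],
  };
}

const query = authorsContainQuery("Austen, Jane");
console.log(query.text);
```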
- Run `npm test` to run the tests. Some tests read example rdf files located in the `test/rdf-files` folder and save them in the `gutenberg_db_test` DB. Once the tests have finished running, the `Books` table is truncated.
- The module `nyc` was added to provide code coverage; the results are displayed once the tests finish running. Also, a folder called `test-results` is created in the project root with an HTML page for checking code coverage.
- Add tests for the download service.
- Plug download and processing.