romerol/romerol-example-code

romerol-example-code

A parser for Project Gutenberg rdf files.

Requirements

  • Node.js 12.16.3
  • PostgreSQL 9.5.22 listening on port 5432

Setup

  • Create a database called gutenberg_db that will be used by the app.
  • Create a database called gutenberg_db_test that will be used by the tests.
  • Add a file called secrets.js in the project root with the following structure:
module.exports = {
  DEV_DB_USER: "user name for gutenberg_db",
  DEV_DB_PASSWORD: "user password for gutenberg_db",
  TEST_DB_USER: "user name for gutenberg_db_test",
  TEST_DB_PASSWORD: "user password for gutenberg_db_test"
}
  • Get the rdf files in one of two ways:
      • Manually, by downloading and extracting the rdf files from http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.zip and placing them in a folder called rdf-files in the root of the project.
      • Automatically, by running npm run download. This creates a folder called rdf-files in the root of the project with the files in it.
  • Install dependencies with npm i
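
As an illustration of how the values in secrets.js could be turned into connection settings, here is a minimal sketch. The helper name buildDbConfig, the "localhost" host and the shape of the returned object are assumptions made for this example and are not taken from the project; only the database names, the port and the secret keys come from the setup steps above.

```javascript
// Hypothetical helper, not part of the repository: picks the credentials
// for the app database or the test database from a secrets object.
function buildDbConfig(secrets, env) {
  const isTest = env === "test";
  return {
    host: "localhost", // assumption: PostgreSQL runs on the same machine
    port: 5432, // port listed under Requirements
    database: isTest ? "gutenberg_db_test" : "gutenberg_db",
    user: isTest ? secrets.TEST_DB_USER : secrets.DEV_DB_USER,
    password: isTest ? secrets.TEST_DB_PASSWORD : secrets.DEV_DB_PASSWORD,
  };
}

// In the project the object would come from require("./secrets");
// a literal with placeholder values is used here so the sketch runs on its own.
const exampleSecrets = {
  DEV_DB_USER: "dev_user",
  DEV_DB_PASSWORD: "dev_password",
  TEST_DB_USER: "test_user",
  TEST_DB_PASSWORD: "test_password",
};

const devConfig = buildDbConfig(exampleSecrets, "development");
const testConfig = buildDbConfig(exampleSecrets, "test");
console.log(devConfig.database, testConfig.database);
```

Keeping the environment as an explicit argument makes it easy to check that the tests never touch gutenberg_db.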

Running the app

  • Run npm start to parse the files and save their contents in the Books table. This script automatically runs the migrations that create the table and add the indexes: two B-Tree indexes, one on the title column and one on the publicationDate column, and one GIN index on the authors column, because it is an array of strings.
  • Once the process has finished, a message like Finished parsing 62418 files is displayed.
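
The three indexes described above correspond roughly to the PostgreSQL statements generated by the sketch below. The function name buildIndexStatements and the index names are hypothetical, and the actual migrations in the project may be written differently (for example through a migration tool); only the table name, the column names and the index types come from the description above.

```javascript
// Hypothetical helper, not part of the repository: returns the SQL that
// would create the indexes described in "Running the app".
function buildIndexStatements(table) {
  return [
    // B-Tree suits equality and range lookups on scalar columns.
    `CREATE INDEX IF NOT EXISTS books_title_idx ON "${table}" USING btree ("title");`,
    `CREATE INDEX IF NOT EXISTS books_publication_date_idx ON "${table}" USING btree ("publicationDate");`,
    // GIN indexes the individual elements of an array column, so queries
    // that look for a single author inside the array can use it.
    `CREATE INDEX IF NOT EXISTS books_authors_idx ON "${table}" USING gin ("authors");`,
  ];
}

const statements = buildIndexStatements("Books");
statements.forEach((sql) => console.log(sql));
```

With an index like the last one, an array containment query such as WHERE "authors" @> ARRAY['Some Author'] can be served by the GIN index instead of a full table scan.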

Testing

  • Run npm test to run the tests. Some tests read example rdf files located in the test/rdf-files folder and save them in the gutenberg_db_test database. Once the tests have finished running, the Books table is truncated.
  • The nyc module provides code coverage, and the results are displayed once the tests finish running. A folder called test-results is also created in the project root with an HTML page for checking code coverage.

To do

  • Add tests for the download service.
  • Connect the download step to the processing step.
