ContentMine/norma

Norma

A tool to convert a variety of inputs into normalized, tagged XHTML (with embedded or linked SVG and PNG where appropriate). The initial emphasis is on scholarly publications, but much of the technology is general.

Installation

For a simple introduction and a description of how to install binaries of the software please see: here

Input

Norma converts legacy files into scholarly HTML. It operates on files arranged in a CProject structure, which lets it process multiple papers in a single run without overwriting files. Each paper's data is kept together in its own CTree: metadata about the paper, images that may have been extracted from it, and supplementary files such as tables.

To convert a CProject full of NLM XML files, such as those you might have downloaded from EuropePMC with getpapers, you can run:

norma --project <CProject folder> --input fulltext.xml --output scholarly.html --transform nlm2html
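
Before running the command above on a large CProject, it can help to see which CTrees still need converting. The following is a minimal sketch, not part of Norma itself. It assumes that each CTree is a direct subdirectory of the CProject folder and that the input and output files sit at the top level of each CTree. The function name `pending_ctrees` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Norma): print the names of CTrees that
# contain the input file but do not yet contain the output file.
# Assumption: every CTree is a direct subdirectory of the CProject folder.
pending_ctrees() {
    project=$1
    input=${2:-fulltext.xml}      # same default as --input in the command above
    output=${3:-scholarly.html}   # same default as --output in the command above
    for ctree in "$project"/*/; do
        # Skip the unexpanded glob when the project has no subdirectories.
        [ -d "$ctree" ] || continue
        if [ -f "${ctree}${input}" ] && [ ! -f "${ctree}${output}" ]; then
            basename "$ctree"
        fi
    done
}
```

After sourcing the function, calling `pending_ctrees <CProject folder>` prints one CTree name per line. An empty result suggests that every CTree with a `fulltext.xml` already has a `scholarly.html`.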

Building from source

Norma can be built with Maven 3 and requires Java 1.7 or later.

Contributing to development

If you're interested in contributing please take a look at: CONTRIBUTING.md