ocr-gt-tools

A web interface for creating ground truth for evaluating and training OCR.

Summary

ocr-gt-tools allows editing hOCR files, such as those produced by the tesseract or ocropy OCR frameworks.
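
As a rough illustration of the data involved: hOCR stores the coordinates of each line in the element's title attribute as a "bbox x0 y0 x1 y1" property. The sketch below parses that property. The function name parseBbox is hypothetical and is not taken from the ocr-gt-tools source.

```javascript
// Minimal sketch: read the bounding box from an hOCR title attribute,
// e.g. "bbox 10 20 300 60; baseline 0 -5".
// parseBbox is an illustrative name, not part of ocr-gt-tools.
function parseBbox(title) {
  const match = /bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/.exec(title);
  if (!match) {
    return null; // no bbox property present
  }
  const [x0, y0, x1, y1] = match.slice(1).map(Number);
  return { x0, y0, x1, y1, width: x1 - x0, height: y1 - y0 };
}
```

A caller could use the returned width and height to crop the line image that belongs to a transcription.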

Features

Editing transcriptions of lines
Commenting at line and page level
Marking common problems with standardized comment tags
Cheatsheet
Zooming in and out
Filtering visible elements
Selecting multiple lines and applying tags

Installation

See INSTALL.md.

About the code

The server-side code is written in Perl.

The frontend is written in HTML and JavaScript.

Usage

Open 'ocr-gt-tools/index.html' in a browser.
Open the Kitodo 'Page Previews' in a second window.
Find the book from which you created the hOCR file.
Drag and drop an image from the Kitodo 'Page Previews' window onto the window showing 'ocr-gt-tools/index.html'.
The Perl script ocr-gt-tools.cgi creates all required files in the background, which takes a few seconds.
The script returns a JSON object to index.html via Ajax.
index.html then loads the generated 'correction.html' and 'anmerkungen.txt' inline via Ajax.
The 'Speichern' (save) button becomes active once you have edited a comment or a text line.
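
The drag-and-drop step and the request to the CGI script can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names firstDroppedUrl and buildCreateUrl, and the query parameter names "action" and "imageUrl", are hypothetical, and the real interface of ocr-gt-tools.cgi may differ.

```javascript
// Hypothetical helper: pick the first URL from a drop event's DataTransfer.
// "text/uri-list" is the standard drag-and-drop type for URLs; lines that
// start with "#" are comments in that format.
function firstDroppedUrl(dataTransfer) {
  const lines = dataTransfer.getData("text/uri-list").split(/\r?\n/);
  const url = lines.find((line) => line.trim() !== "" && !line.startsWith("#"));
  return url ? url.trim() : null;
}

// Hypothetical helper: build the request URL for the CGI script.
// The parameter names "action" and "imageUrl" are assumptions made for
// this sketch, not a documented interface.
function buildCreateUrl(cgiUrl, imageUrl) {
  const params = new URLSearchParams({ action: "create", imageUrl: imageUrl });
  return cgiUrl + "?" + params.toString();
}
```

In the browser, a drop handler could pass event.dataTransfer to firstDroppedUrl and then request the resulting URL with an Ajax call, for example with the bundled jQuery, before loading the generated files.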

Contributing

Expand the wiki

We are using the wiki to collect transcription hints for unusual glyphs and frequent errors.

Pull Requests

Bug fixes, new functions, suggestions for new features and other user feedback are appreciated.

The source code is available from https://github.com/UB-Mannheim/ocr-gt-tools. Please also prepare your code contributions on GitHub.

Bug reports

Please feel free to open issues for any bug you encounter and features you'd like to have.

Acknowledgments

This is free software. You may use it under the terms of the GNU Affero General Public License (AGPL) version 3 or newer. See LICENSE for details.

This project bundles other free software:

EB Garamond Font (SIL Open Font License)
Font Awesome by Dave Gandy (SIL OFL 1.1, MIT)
bootstrap (MIT)
clipboard.js (MIT)
handlebars.js (MIT)
hocr-extract-images (Apache)
jQuery (MIT)
ocropus-gtedit (Apache)
reset-css (Public Domain)
