Pipeline for OCR-ing scanned handwritten research cards: a 20th-century scholar's index cards with excerpts from early 17th-century Ukrainian (Ruthenian) sources. It uses Claude Code with a custom /ocr skill for transcription, preserving the archaic Church Slavonic orthography.
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python pdf_to_jpeg.py scanned_pdfs/Auto-Color0002.pdf output/Auto-Color0002
Single card:
claude -p "/ocr @output/Auto-Color0002/001.jpeg"
Batch:
python batch_ocr.py output/Auto-Color0002
python batch_ocr.py output/Auto-Color0002 --force # re-transcribe existing
python batch_ocr.py output/Auto-Color0002 --limit 10
python build_demo.py output/Auto-Color0002 -s tertiary
python build_demo.py output/Auto-Color0002 -s filename -o demo.html
python build_demo.py output/Auto-Color0002 --include-blank
Sort options: primary, secondary, tertiary, filename.
Blank/error cards are skipped by default; use --include-blank to keep them.
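The sort and skip behaviour described above could look roughly like the sketch below. It is not taken from build_demo.py: the `Card` record, its fields, and the `select_cards` helper are hypothetical, and it assumes that each sort option names a text field on the transcribed card and that blank and error cards share a single flag.

```python
from dataclasses import dataclass

# Hypothetical card record; the real build_demo.py may store cards differently.
@dataclass
class Card:
    filename: str
    primary: str = ""
    secondary: str = ""
    tertiary: str = ""
    blank: bool = False  # assumed to cover both blank and error cards

SORT_KEYS = ("primary", "secondary", "tertiary", "filename")

def select_cards(cards, sort_by="filename", include_blank=False):
    """Drop blank cards unless asked to keep them, then sort by one key."""
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {sort_by!r}")
    kept = [c for c in cards if include_blank or not c.blank]
    # Filename is used as a tie-breaker so the order is deterministic.
    return sorted(kept, key=lambda c: (getattr(c, sort_by), c.filename))
```

Using the filename as a tie-breaker keeps cards with identical sort fields in scan order, which is one reasonable choice among several.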
Output is a self-contained HTML file (images referenced via relative paths).
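For the batch step, the effect of --force and --limit can be pictured with a minimal sketch. This is not the actual batch_ocr.py logic: the `pending_cards` helper is hypothetical, and it assumes that each transcription is saved next to its image with a `.txt` suffix and that the limit is applied after already-transcribed cards are skipped.

```python
from pathlib import Path

def pending_cards(card_dir, force=False, limit=None):
    """Return the card images that a batch run would transcribe, in name order."""
    images = sorted(Path(card_dir).glob("*.jpeg"))
    if not force:
        # Assumption: a sibling .txt file marks a card as already transcribed.
        images = [p for p in images if not p.with_suffix(".txt").exists()]
    # Assumption: the limit counts only cards that still need work.
    return images if limit is None else images[:limit]
```

Under these assumptions, rerunning the batch after an interruption picks up where it left off, while --force starts over from the first card.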