Inconsistent OCR #350

@equationcrunchor

Example:

  • Searching the document text for "meme" returns http://127.0.0.1:8000/archives/doc/3_19_pmm_memo_re_709_1960_04_29_1_19 as the first result.
  • In the online PDF preview, the text contains only "memo", never "meme". Highlighting the sentence "Status of programming memo and revision of machine shut-down date to late July." and copy-pasting it elsewhere gives the correct text.
  • The OCR text in the data/processed_pdfs folder says "Status of programming meme", probably due to an OCR error.

It looks like the PDF preview and the search are working from different OCR output?
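
To find other places where the two text sources disagree, a word-level diff might help. The following is only a rough sketch, not how the project currently compares anything: `word_mismatches` is a hypothetical helper, and both inputs are passed in as plain strings (reading the PDF text layer and the file under data/processed_pdfs is left out). Only the words "Status of programming meme" are confirmed for the OCR file; the rest of that sentence is assumed here to match the PDF.

```python
import difflib
import re


def word_mismatches(pdf_layer_text: str, ocr_text: str) -> list[tuple[str, str]]:
    """Return (pdf_layer_words, ocr_words) pairs where the two texts disagree.

    Hypothetical helper: compares lowercased word sequences, ignoring punctuation.
    """
    pdf_words = re.findall(r"\w+", pdf_layer_text.lower())
    ocr_words = re.findall(r"\w+", ocr_text.lower())
    matcher = difflib.SequenceMatcher(a=pdf_words, b=ocr_words, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the differing run of words from each side.
            mismatches.append((" ".join(pdf_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return mismatches


# Text copied from the PDF preview (from the report above).
pdf_layer = "Status of programming memo and revision of machine shut-down date to late July."
# Assumed content of the processed OCR file; only "Status of programming meme" is confirmed.
ocr_file = "Status of programming meme and revision of machine shut-down date to late July."

print(word_mismatches(pdf_layer, ocr_file))
```

Running something like this over every document would show whether the memo/meme mismatch is a one-off or whether search is systematically indexing a different OCR pass than the one embedded in the PDFs.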

Labels: bug
