Replies: 4 comments
-
I don't understand what you did wrong. Here is my script and the associated output:

    from pathlib import Path
    import pymupdf4llm

    filename = "scansmp.pdf-searchable.pdf"
    md = pymupdf4llm.to_markdown(filename)
    # Save the Markdown next to the input file, with a .md suffix.
    Path(filename).with_suffix(".md").write_bytes(md.encode())

THE SLEREXE COMPANY LIMITED
SAPORS LANE - BOOLE - DORSET - BH 25 8ER
TELEPHONE BOOLE (945 13) 51617 - TELEX 123456
Our Ref. 350/PJC/EAC 18th January, 1972.
Dr. P.N. Cundall,
Mining Surveys Ltd.,
Holroyd Road,
Reading,
Berks.
Dear Pete,
Permit me to introduce you to the facility of facsimile
transmission.
In facsimile a photocell is caused to perform a raster scan over
the subject copy. The variations of print density on the document
cause the photocell to generate an analogous electrical video signal.
This signal is used to modulate a carrier, which is transmitted to a
remote destination over a radio or cable communications link.
At the remote terminal, demodulation reconstructs the video
signal, which is used to modulate the density of print produced by a
printing device. This device is scanning in a raster scan synchronised
with that at the transmitting terminal. As a result, a facsimile
copy of the subject document is produced.
Probably you have uses for this facility in your organisation.
Yours sincerely,
ThA.
P.J. CROSS
Group Leader - Facsimile Research
-----
-
I had this parameter set,
Why does this affect the output?
-
Converted this to a discussion item. For OCRed pages, it probably does not make much sense to do it this way, since such a page always contains exactly one image plus the text.
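To illustrate the "one image plus the text" shape of an OCRed page, here is a minimal sketch of a heuristic check. It assumes a page object that exposes `get_images()` and `get_text()`, as PyMuPDF pages do. The helper name `looks_like_ocr_page`, the `min_chars` threshold, and the `FakePage` stand-in are hypothetical and only there so the sketch runs without a PDF.

```python
def looks_like_ocr_page(page, min_chars=1):
    """Heuristic: the page holds exactly one image and a non-empty text layer."""
    images = page.get_images()      # list of image entries found on the page
    text = page.get_text().strip()  # extracted text, including any OCR text layer
    return len(images) == 1 and len(text) >= min_chars


class FakePage:
    """Hypothetical stand-in that mimics the two page methods used above."""

    def __init__(self, images, text):
        self._images = images
        self._text = text

    def get_images(self):
        return self._images

    def get_text(self):
        return self._text


# A scanned page after OCR: one full-page image plus recognized text.
scanned = FakePage(images=[("image-1",)], text="THE SLEREXE COMPANY LIMITED")
# A born-digital page: text only, no images.
born_digital = FakePage(images=[], text="plain text page")

print(looks_like_ocr_page(scanned))
print(looks_like_ocr_page(born_digital))
```

With a real document, the same function could be called on each page of an opened PDF. Whether a single image really means "scanned" depends on the file, so treat this as a rough filter only.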
-
Got it. The version I was using wasn't the latest. That's why I wasn't getting any output.
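Since the cause here was an outdated install, a quick way to see which version is present is sketched below. It uses only the standard library. The helper name `installed_version` is hypothetical, and the package name `pymupdf4llm` is assumed to match the name it was installed under.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pymupdf4llm"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version())
```

If the printed version is older than the current release, `pip install -U pymupdf4llm` should bring it up to date.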
-
I am trying to parse the scanned document below.
I tried converting the scanned document to a searchable PDF using Tesseract, but still got no result.
What is the recommended way to parse such documents?
I am using the latest pymupdf4llm.
scansmpl.pdf
scansmpl.pdf-searchable.pdf