Hi everyone 👋
I’m evaluating Chandra OCR for document OCR on scanned PDFs and images, and I have a question about the structure of the output it can produce.
**What I’m trying to achieve**
I’m looking for an output format similar to a layout-aware document DOM, for example:
- Page → blocks → children hierarchy
- Explicit `block_type` (Page, SectionHeader, Text, Table, TableCell, etc.)
- Bounding boxes / polygons for each block
- HTML serialization per block (paragraphs, tables, headers)
- Stable IDs like `/page/0/Table/4`
- Section hierarchy tracking
Example:

```json
{
  "id": "/page/0/Table/4",
  "block_type": "Table",
  "html": "<table>...</table>",
  "bbox": [x1, y1, x2, y2],
  "polygon": [[...]],
  "children": [...]
}
```
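For context, here is a minimal sketch of how I would want to consume such a block tree. To be clear, this schema is my own invention for illustration, not something Chandra documents; the field names (`block_type`, `children`, `bbox`, `html`) are all assumptions:

```python
# Hypothetical consumer of the block-tree schema sketched above.
# All field names here are assumed, not taken from Chandra's actual output.

def iter_blocks(block):
    """Depth-first walk over a block and all of its descendants."""
    yield block
    for child in block.get("children", []):
        yield from iter_blocks(child)

def tables_with_geometry(page):
    """Collect every Table block that carries both HTML and a bounding box."""
    return [
        b for b in iter_blocks(page)
        if b.get("block_type") == "Table" and "html" in b and "bbox" in b
    ]
```

The point is that with explicit geometry plus per-block HTML, downstream code can pair each extracted table with its location on the page without a separate layout-detection pass.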
Does Chandra expose (or plan to expose) this kind of layout-aware structured output? Or is it intended to provide semantic OCR only (Markdown / HTML / raw text) without explicit geometry, so that a separate layout-detection step would be required?
I just want to confirm the intended scope and the recommended practice here.
Thanks!