Get page information from extracted text #226
Comments
Hey @atahanuz! I'm here to help you with bug fixes, answer questions, and become a contributor. Let's squash those bugs together! 🤖

To determine which part of the text extracted from a PDF belongs to a specific page, such as page 7, you can use the `page_range` attribute of each block. Here's a code snippet to illustrate how you can filter the blocks for page 7:

# Assuming `mp_document` is the MPDocument object returned by the convert method
page_number = 7
page_7_blocks = [
    block for block in mp_document.content
    if block.page_range and block.page_range[0] == page_number
]
# Now `page_7_blocks` contains all blocks that start on page 7

This code iterates over the blocks in `mp_document.content` and keeps those whose `page_range` is set and starts at page 7.
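If you want every page at once rather than just page 7, the same idea can be sketched roughly as below. Note that `Block` here is a hypothetical stand-in dataclass, not MegaParse's real class; the only assumption is that each block carries some text plus an optional `page_range` tuple of `(first_page, last_page)`. The `text` field and the `group_blocks_by_page` helper are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a parsed block; not MegaParse's real class."""
    text: str
    page_range: Optional[Tuple[int, int]] = None


def group_blocks_by_page(blocks: List[Block]) -> Dict[int, List[Block]]:
    """Map each page number to the blocks whose page_range covers it."""
    pages: Dict[int, List[Block]] = defaultdict(list)
    for block in blocks:
        if not block.page_range:
            continue  # no page information for this block
        first, last = block.page_range
        # A block spanning several pages is listed under each of them.
        for page in range(first, last + 1):
            pages[page].append(block)
    return dict(pages)


blocks = [
    Block("Intro", (6, 6)),
    Block("Spans two pages", (6, 7)),
    Block("Only on seven", (7, 7)),
    Block("Unknown page"),
]
by_page = group_blocks_by_page(blocks)
page_7_text = "\n".join(block.text for block in by_page.get(7, []))
```

Unlike the `page_range[0] == page_number` check, this version also picks up a block that starts on page 6 and continues onto page 7. Whether that is what you want depends on how you plan to use the text.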
@dosu
In MegaParse, the `page_range` of a block is set from its metadata:

page_range=(metadata.page_number, metadata.page_number)
    if metadata.page_number
    else None

If `metadata.page_number` is not set, `page_range` is `None`.
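Since `page_range` can be `None`, it may help to check how many blocks actually carry page information before filtering by page. A minimal sketch, again using a hypothetical `Block` stand-in (not MegaParse's real class) and a made-up helper name, `split_by_page_info`:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a parsed block; not MegaParse's real class."""
    text: str
    page_range: Optional[Tuple[int, int]] = None


def split_by_page_info(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Return (blocks with a page_range, blocks whose page_range is None)."""
    paged = [b for b in blocks if b.page_range is not None]
    unpaged = [b for b in blocks if b.page_range is None]
    return paged, unpaged


sample = [Block("Heading", (1, 1)), Block("Orphan text"), Block("Body", (2, 2))]
paged, unpaged = split_by_page_info(sample)
print(f"{len(paged)} blocks have a page, {len(unpaged)} do not")
```

If most blocks end up in `unpaged`, filtering on `page_range` will not recover the page mapping for that document.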
Dosubot wasn't able to help. Can someone else? 🥲
Hello, I'm using megaparse vision to extract text from a PDF. After the extraction, how can I find which part of the extracted text belongs to which page (e.g. page 7 of the PDF)?