Processor.input_files: handle list of pageId #624

Closed

@kba commented Oct 9, 2020

No description provided.

@kba requested a review from bertsky October 9, 2020 11:51
@bertsky left a comment

Sorry, I am only seeing this now: we actually have to rewrite the whole function. Up above, we filter by MIME type alone. So whenever (for some list of pageIds) even a single page has a PAGE annotation, only that page survives, and the pages that only have images are dropped.

What we need instead:

  • find_all_files filtering MIME for PAGE or image/*
  • aggregate results by page ID, and iterate:
      • if the page has a PAGE hit, keep only that
      • else if it has a single hit, keep it
      • else raise
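
The selection logic above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual `Processor.input_files` code: `FileEntry` and `select_input_files` are hypothetical names, the flat list of entries stands in for whatever the file lookup returns, and taking the first PAGE hit when a page has several is my own assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: MIME type string used for PAGE-XML files.
MIMETYPE_PAGE = "application/vnd.prima.page+xml"


@dataclass
class FileEntry:
    """Hypothetical stand-in for a file entry of the input file group."""
    page_id: str
    mimetype: str
    url: str


def select_input_files(files):
    """Pick one input file per page, preferring PAGE over images."""
    # Step 1: keep only PAGE or image/* files, grouped by page ID.
    by_page = defaultdict(list)
    for f in files:
        if f.mimetype == MIMETYPE_PAGE or f.mimetype.startswith("image/"):
            by_page[f.page_id].append(f)

    # Step 2: decide per page which single file survives.
    selected = []
    for page_id, hits in by_page.items():
        page_hits = [f for f in hits if f.mimetype == MIMETYPE_PAGE]
        if page_hits:
            # A PAGE hit wins over any image (assumption: take the first one).
            selected.append(page_hits[0])
        elif len(hits) == 1:
            # No PAGE, but exactly one image: unambiguous, keep it.
            selected.append(hits[0])
        else:
            # Several images and no PAGE: no way to choose.
            raise ValueError(
                "Ambiguous input for page %s: %d image files and no PAGE file"
                % (page_id, len(hits))
            )
    return selected
```

With this shape, a page that only has an image is no longer dropped just because some other page in the list has a PAGE annotation.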

bertsky commented Nov 5, 2020

Superseded by #635

@bertsky closed this Nov 5, 2020
@kba deleted the fix-622 branch November 16, 2020 13:19