Update PDFStreamEngine.java #27

Open
royguo wants to merge 5,617 commits into base: trunk

royguo commented Oct 3, 2016

There is no need to allocate a new ArrayList here. Reusing the existing one reduced text extraction time from 16 seconds to 14 seconds on a 4.2M PDF.

THausherr and others added 30 commits June 21, 2016 16:35
…er from twelvemonkeys; add check for orientation

git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1749936 13f79535-47bb-0310-9956-ffa450edef68
THausherr and others added 22 commits September 11, 2016 12:23
…d, as suggested by Lorenz Pahl

git-svn-id: https://svn.apache.org/repos/asf/pdfbox/trunk@1760963 13f79535-47bb-0310-9956-ffa450edef68
There's no need to allocate a new ArrayList in `processStreamOperators`. In my test case with a `4.2M` PDF, text extraction time dropped from 16 seconds to 14 seconds.
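
For readers without the diff at hand, the following is a minimal sketch of the two strategies being compared. It is not the PDFBox source: the class `OperandLoopSketch`, the `Handler` interface, the string tokens, and the rule for telling operators from operands are all assumptions made for illustration. Variant B corresponds to the idea described in this change (clear and reuse one list instead of allocating a new one per operator).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of an operand/operator loop; not PDFBox code.
class OperandLoopSketch {

    /** Hypothetical callback standing in for an operator handler. */
    interface Handler {
        void process(String operator, List<String> operands);
    }

    // Assumption for this sketch: a token starting with a letter is an operator,
    // anything else (numbers, "/Name") is an operand.
    static boolean isOperator(String token) {
        return !token.isEmpty() && Character.isLetter(token.charAt(0));
    }

    // Variant A: hand each operator its list, then allocate a fresh one.
    static void runAllocating(List<String> tokens, Handler handler) {
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            if (isOperator(token)) {
                handler.process(token, operands);
                operands = new ArrayList<>(); // one allocation per operator
            } else {
                operands.add(token);
            }
        }
    }

    // Variant B: reuse a single list and empty it after each operator.
    static void runReusing(List<String> tokens, Handler handler) {
        List<String> operands = new ArrayList<>();
        for (String token : tokens) {
            if (isOperator(token)) {
                handler.process(token, operands);
                operands.clear(); // the same list object is reused
            } else {
                operands.add(token);
            }
        }
    }
}
```

Both variants deliver the same operands to each handler call. They differ only in what happens to a list that a handler holds on to after it returns.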
@THausherr
Contributor

THausherr commented Oct 3, 2016

This is a read-only mirror. Please close this and open an issue in JIRA.
https://issues.apache.org/jira/browse/PDFBOX

@THausherr
Contributor

Of course every speed increase is welcome, but this change is one to be discussed with "the rest of the gang": what if one of the processOperator methods keeps the argument list? If not now, then maybe at a later time? Your change would pull the list out from under its feet.

@royguo
Author

royguo commented Oct 4, 2016

@THausherr What do you mean by keeping the argument list? I assume you mean that someone wants to keep the elements of arguments inside processOperator. In that case, the clear method only removes the elements from arguments; it does not destroy them, so if someone keeps references to the elements, it will still work.
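
The distinction at stake here can be shown with plain `java.util` collections. The sketch below is an illustration only: `ClearSemanticsSketch` and `RetainingHandler` are hypothetical names, and strings stand in for the real operand objects. A handler that keeps a reference to an element still sees that element after `clear()`, while a handler that keeps a reference to the list itself sees it emptied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of List.clear() semantics; not PDFBox code.
class ClearSemanticsSketch {

    /** Hypothetical handler that retains both the list and one of its elements. */
    static class RetainingHandler {
        List<String> keptList;   // reference to the caller's list object
        String keptElement;      // reference to a single element

        void process(List<String> operands) {
            keptList = operands;
            keptElement = operands.get(0);
        }
    }

    static String demo() {
        List<String> operands = new ArrayList<>();
        operands.add("/F1");
        operands.add("12");

        RetainingHandler handler = new RetainingHandler();
        handler.process(operands);

        // What a list-reusing caller would do once the handler returns.
        operands.clear();

        // The element is still reachable; the retained list is now empty.
        return handler.keptElement + " " + handler.keptList.size();
    }
}
```

Here `demo()` returns `"/F1 0"`: the retained element is intact, but the retained list has lost its contents.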

@skjolber

Any progress on this? The users of the passed array must make a copy of the arguments array.

@THausherr
Contributor

No progress; this is a read-only mirror. I asked for an issue to be created in JIRA. I won't create it myself because I'm not persuaded by this. If "The users of the passed array must make a copy of the arguments array." then where would the speed gain be?

@skjolber

skjolber commented Jul 1, 2017

I should have written: The users of the passed array that have to keep a list of the arguments must make a copy of the arguments array. However, I agree that this kind of optimization must be investigated further, so that there are no unexpected side effects.

I've created #38, which investigates whether the ArrayList is in use after the call to the processor. My first impression is that this is not the case, and that the optimization is possible.
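
The defensive copy described above might look like the following sketch. It is an assumption-laden illustration, not PDFBox code: `DefensiveCopySketch` and `CopyingHandler` are hypothetical, and strings stand in for the real operand objects. Only a handler that needs the operands after it returns pays for the copy; handlers that just read the list during the call do not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a defensive copy; not PDFBox code.
class DefensiveCopySketch {

    /** Hypothetical handler that needs the operands after process() returns. */
    static class CopyingHandler {
        private List<String> kept = new ArrayList<>();

        void process(List<String> operands) {
            // Shallow copy: a new list holding the same element references.
            kept = new ArrayList<>(operands);
        }

        List<String> kept() {
            return kept;
        }
    }

    static List<String> demo() {
        List<String> operands = new ArrayList<>();
        operands.add("/F1");
        operands.add("12");

        CopyingHandler handler = new CopyingHandler();
        handler.process(operands);

        // The caller reuses its list; the handler's copy is unaffected.
        operands.clear();
        return handler.kept();
    }
}
```

In this sketch `demo()` still returns the two operands after the caller has cleared its own list.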
