-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8358533: Improve performance of java.io.Reader.readAllLines #25863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -27,6 +27,7 @@ | |||||||||
|
||||||||||
import java.nio.CharBuffer; | ||||||||||
import java.nio.ReadOnlyBufferException; | ||||||||||
import java.util.ArrayList; | ||||||||||
import java.util.List; | ||||||||||
import java.util.Objects; | ||||||||||
|
||||||||||
|
@@ -397,16 +398,6 @@ public int read(char[] cbuf) throws IOException { | |||||||||
*/ | ||||||||||
public abstract int read(char[] cbuf, int off, int len) throws IOException; | ||||||||||
|
||||||||||
private String readAllCharsAsString() throws IOException { | ||||||||||
StringBuilder result = new StringBuilder(); | ||||||||||
char[] cbuf = new char[TRANSFER_BUFFER_SIZE]; | ||||||||||
int nread; | ||||||||||
while ((nread = read(cbuf, 0, cbuf.length)) != -1) { | ||||||||||
result.append(cbuf, 0, nread); | ||||||||||
} | ||||||||||
return result.toString(); | ||||||||||
} | ||||||||||
|
||||||||||
/** | ||||||||||
* Reads all remaining characters as lines of text. This method blocks until | ||||||||||
* all remaining characters have been read and end of stream is detected, | ||||||||||
|
@@ -457,7 +448,64 @@ private String readAllCharsAsString() throws IOException { | |||||||||
* @since 25 | ||||||||||
*/ | ||||||||||
public List<String> readAllLines() throws IOException { | ||||||||||
return readAllCharsAsString().lines().toList(); | ||||||||||
char[] cb = new char[TRANSFER_BUFFER_SIZE]; | ||||||||||
int pos = 0; | ||||||||||
List<String> lines = new ArrayList<String>(); | ||||||||||
|
||||||||||
StringBuilder sb = new StringBuilder(82); | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason for this pre-allocation? If the whole content is smaller than 8192 in size, this allocation would be redundant because we are going through the string constructor path. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What would you suggest? Start with a smaller allocation and increase it if needed? There is no possibility of knowing the length of the stream. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As this PR explicitly targets performance and as the aim of this method is to keep all content in-memory anyways, I wonder if it would be acceptable and even faster to pre-allocate There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My suggestion is to call There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The change is motivated by performance, but there will be many inputs that are less than the transfer buffer size and those will not use the StringBuilder, so creating it before it is needed could be avoided. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. resizing/newCapacity is always expensive and tricky, string builder included. so maybe we should decide if 'long lines' (> transfer buffer size) is the goal of this pr. if not, it might be reasonable/make sense (???) to simply go with "string" + the built-in string concatenation -> we don't care the scenario that most of the 'lines' > buffer.size. i do agree we probably want to avoid paying the cost of copying in & out of the sb, but tweaking the transfer buffer resizing might also be tricky and potentially out of the scope as well. yes, it's always a trade off. |
||||||||||
int n = read(cb, 0, cb.length); | ||||||||||
boolean eos = (n == -1); | ||||||||||
while (!eos) { | ||||||||||
boolean eol = false; | ||||||||||
boolean stringAdded = false; | ||||||||||
while (!eol) { | ||||||||||
// Find the next line terminator. If none is found, | ||||||||||
// "term" will equal "n". | ||||||||||
int term = pos; | ||||||||||
while (term < n) { | ||||||||||
char c = cb[term]; | ||||||||||
if (c == '\r' || c == '\n') | ||||||||||
break; | ||||||||||
term++; | ||||||||||
Comment on lines
+467
to
+469
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be worth adding a test of unconventional sequences or \r and \n, including \r\r and \n\n, \r. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should treat "\r\n" as a single line terminator? for example "hello\r\nworld".lines().forEach(line -> out.format("[%s]\n", line)); instead of (the current impl) [hello] or I misread the impl? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
You are correct: that needs to be fixed:
Thanks for the catch! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Would it make sense to resemble this? Would it make sense to simply use |
||||||||||
} | ||||||||||
|
||||||||||
// Terminator found so at EOL. | ||||||||||
if (term < n) | ||||||||||
eol = true; | ||||||||||
|
||||||||||
if (term == pos) { | ||||||||||
// Current position is terminator so skip it. | ||||||||||
pos++; | ||||||||||
} else { // term > pos | ||||||||||
if (eol && sb.length() == 0) { | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason for |
||||||||||
// Create and add a string to avoid the StringBuilder. | ||||||||||
lines.add(new String(cb, pos, term - pos)); | ||||||||||
stringAdded = true; | ||||||||||
} else | ||||||||||
sb.append(cb, pos, term - pos); | ||||||||||
pos = term + 1; | ||||||||||
} | ||||||||||
|
||||||||||
if (pos >= n) { | ||||||||||
// Buffer content consumed so reload it. | ||||||||||
if ((n = read(cb, 0, cb.length)) < 0) { | ||||||||||
eos = eol = true; | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
The local variable eol assignment here is not used and can be removed. |
||||||||||
break; | ||||||||||
} | ||||||||||
pos = 0; | ||||||||||
} | ||||||||||
} | ||||||||||
|
||||||||||
if (!stringAdded) { | ||||||||||
bplb marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
// Derive a string and add it to the list. | ||||||||||
lines.add(sb.toString()); | ||||||||||
sb.setLength(0); | ||||||||||
} | ||||||||||
|
||||||||||
eol = false; | ||||||||||
Comment on lines
+503
to
+505
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Same as above, the local variable eol is not used after being assigned and can be removed. |
||||||||||
} | ||||||||||
|
||||||||||
return lines; | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we really want to return a mutable |
||||||||||
} | ||||||||||
|
||||||||||
/** | ||||||||||
|
@@ -499,7 +547,13 @@ public List<String> readAllLines() throws IOException { | |||||||||
* @since 25 | ||||||||||
*/ | ||||||||||
public String readAllAsString() throws IOException { | ||||||||||
return readAllCharsAsString(); | ||||||||||
StringBuilder result = new StringBuilder(); | ||||||||||
char[] cbuf = new char[TRANSFER_BUFFER_SIZE]; | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As this PR explicitly targets performance and as the aim of this method is to keep all content in-memory anyways, I wonder if it would be acceptable and even faster to pre-allocate |
||||||||||
int nread; | ||||||||||
while ((nread = read(cbuf, 0, cbuf.length)) != -1) { | ||||||||||
result.append(cbuf, 0, nread); | ||||||||||
} | ||||||||||
return result.toString(); | ||||||||||
} | ||||||||||
|
||||||||||
/** Maximum skip-buffer size */ | ||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
/* | ||
* Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. | ||
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. | ||
* | ||
* This code is free software; you can redistribute it and/or modify it | ||
* under the terms of the GNU General Public License version 2 only, as | ||
* published by the Free Software Foundation. | ||
* | ||
* This code is distributed in the hope that it will be useful, but WITHOUT | ||
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | ||
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License | ||
* version 2 for more details (a copy is included in the LICENSE file that | ||
* accompanied this code). | ||
* | ||
* You should have received a copy of the GNU General Public License version | ||
* 2 along with this work; if not, write to the Free Software Foundation, | ||
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. | ||
* | ||
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA | ||
* or visit www.oracle.com if you need additional information or have any | ||
* questions. | ||
*/ | ||
package org.openjdk.bench.java.io; | ||
|
||
import java.io.CharArrayReader; | ||
import java.io.IOException; | ||
import java.io.Reader; | ||
import java.util.List; | ||
import java.util.Random; | ||
|
||
import org.openjdk.jmh.annotations.Benchmark; | ||
import org.openjdk.jmh.annotations.Scope; | ||
import org.openjdk.jmh.annotations.Setup; | ||
import org.openjdk.jmh.annotations.State; | ||
|
||
@State(Scope.Benchmark) | ||
public class ReaderReadAllLines { | ||
|
||
private char[] chars = null; | ||
|
||
@Setup | ||
public void setup() throws IOException { | ||
final int len = 128_000; | ||
chars = new char[len]; | ||
Random rnd = new Random(System.nanoTime()); | ||
int off = 0; | ||
while (off < len) { | ||
int lineLen = 40 + rnd.nextInt(40); | ||
if (lineLen > len - off) { | ||
off = len; | ||
} else { | ||
chars[off + lineLen] = '\n'; | ||
off += lineLen; | ||
} | ||
} | ||
} | ||
|
||
@Benchmark | ||
public List<String> readAllLines() throws IOException { | ||
List<String> lines; | ||
try (Reader reader = new CharArrayReader(chars);) { | ||
lines = reader.readAllLines(); | ||
} | ||
return lines; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.