8358533: Improve performance of java.io.Reader.readAllLines #25863

bplb · 2025-06-18T00:04:37Z

Replaces the implementation readAllCharsAsString().lines().toList() with reading into a temporary char array which is then processed to detect line terminators and copy non-terminating characters into strings which are added to the list.
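The approach described above can be sketched roughly as follows. This is not the code from the PR: the class name `ChunkedLineSplitter`, the `skipLF` flag, and the loop structure are assumptions made for illustration, and the 8192 buffer size is taken from the follow-up comment. The sketch treats `\n`, `\r`, and `\r\n` as terminators, matching `String.lines()`.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and structure are hypothetical, not the PR's code.
final class ChunkedLineSplitter {
    // Assumed buffer size, based on the 8192 figure mentioned in the discussion.
    private static final int BUFFER_SIZE = 8192;

    static List<String> readAllLines(Reader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder sb = new StringBuilder(); // carries a line that spans buffer refills
        char[] cb = new char[BUFFER_SIZE];
        boolean skipLF = false; // true right after '\r', so an immediately following '\n' is ignored
        int n;
        while ((n = reader.read(cb, 0, cb.length)) >= 0) {
            int start = 0; // index of the first char of the current line within cb
            for (int i = 0; i < n; i++) {
                char c = cb[i];
                if (skipLF) {
                    skipLF = false;
                    if (c == '\n') { // second half of "\r\n", possibly split across refills
                        start = i + 1;
                        continue;
                    }
                }
                if (c == '\r' || c == '\n') {
                    if (sb.length() == 0) {
                        // Whole line lies inside the buffer: build the String directly.
                        lines.add(new String(cb, start, i - start));
                    } else {
                        lines.add(sb.append(cb, start, i - start).toString());
                        sb.setLength(0);
                    }
                    start = i + 1;
                    skipLF = (c == '\r');
                }
            }
            // Keep the unterminated tail until the next refill or end of stream.
            sb.append(cb, start, n - start);
        }
        if (sb.length() > 0) {
            lines.add(sb.toString()); // final line without a terminator
        }
        return lines;
    }
}
```

The `StringBuilder` is only touched when a line straddles a refill, so its length is bounded by the longest line rather than by the whole input.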

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8358533: Improve performance of java.io.Reader.readAllLines (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25863/head:pull/25863
$ git checkout pull/25863

Update a local copy of the PR:
$ git checkout pull/25863
$ git pull https://git.openjdk.org/jdk.git pull/25863/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25863

View PR using the GUI difftool:
$ git pr show -t 25863

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25863.diff

Using Webrev

Link to Webrev Comment

bplb · 2025-06-18T00:05:17Z

The throughput of the implementation as measured by the included benchmark appears to hover around 13% greater than that of the existing method. The updated method should also have a smaller memory footprint for streams of non-trivial length as it does not first create a single intermediate String containing all lines in the stream. Instead it uses a char array of size 8192 and a StringBuilder whose maximum length will be the length of the longest line in the input.

bridgekeeper · 2025-06-18T00:05:28Z

👋 Welcome back bpb! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-06-18T00:05:58Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-06-18T00:06:37Z

@bplb The following label will be automatically applied to this pull request:

core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-06-18T00:10:07Z

Webrevs

00: Full (c2420432)

RogerRiggs · 2025-06-18T00:53:51Z

src/java.base/share/classes/java/io/Reader.java

+                    if (c == '\r' || c == '\n')
+                        break;
+                    term++;


It might be worth adding a test of unconventional sequences of \r and \n, including \r\r and \n\n, \r.
The current ReadAll test covers the conventional sequences on Linux and Windows.

I agree. I was intending to follow up on @jaikiran's comment, probably in an update to this request.

I think we should treat "\r\n" as a single line terminator? for example

"hello\r\nworld".lines().forEach(line -> out.format("[%s]\n", line));
=>
[hello]
[world]

instead of (the current impl)

[hello]
[]
[world]

or I misread the impl?

I think we should treat "\r\n" as a single line terminator?

You are correct: that needs to be fixed:

jshell> Reader r = new StringReader("hello\r\nworld")
r ==> java.io.StringReader@480bdb19

jshell> r.readAllLines()
$3 ==> [hello, , world]

Thanks for the catch!
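For reference, the behavior the previous implementation inherited from `String.lines()` can be checked with a small program like the one below. It could also serve as a starting point for the unconventional-sequence test cases suggested earlier in this thread. The class and helper names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; it only exercises String.lines().
final class LineTerminatorCases {
    static List<String> lines(String s) {
        return s.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lines("hello\r\nworld")); // [hello, world]
        System.out.println(lines("a\r\rb"));         // [a, , b]
        System.out.println(lines("a\n\nb"));         // [a, , b]
        System.out.println(lines("a\n\rb"));         // [a, , b]
        System.out.println(lines("a\r\n"));          // [a]
    }
}
```

`String.lines()` recognizes `\n`, `\r`, and `\r\n` as terminators, so `\r\n` yields one line break while `\n\r` yields two.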

wenshao · 2025-06-18T02:12:50Z

src/java.base/share/classes/java/io/Reader.java

+                if (pos >= n) {
+                    // Buffer content consumed so reload it.
+                    if ((n = read(cb, 0, cb.length)) < 0) {
+                        eos = eol = true;


Suggested change

-                        eos = eol = true;
+                        eos = true;

The local variable eol assignment here is not used and can be removed.

wenshao · 2025-06-18T02:27:38Z

src/java.base/share/classes/java/io/Reader.java

+            }
+
+            eol = false;


Suggested change

-            }
-
-            eol = false;
+            }

Same as above, the local variable eol is not used after being assigned and can be removed.

liach · 2025-06-18T02:25:03Z

src/java.base/share/classes/java/io/Reader.java

-        return readAllCharsAsString().lines().toList();
+        char[] cb = new char[TRANSFER_BUFFER_SIZE];
+        int pos = 0;
+        List<String> lines = new ArrayList<String>();


Suggested change

-        List<String> lines = new ArrayList<String>();
+        List<String> lines = new ArrayList<>();

liach · 2025-06-18T02:26:48Z

src/java.base/share/classes/java/io/Reader.java

+        int pos = 0;
+        List<String> lines = new ArrayList<String>();
+
+        StringBuilder sb = new StringBuilder(82);


Is there a reason for this pre-allocation? If the whole content is smaller than 8192 in size, this allocation would be redundant because we are going through the string constructor path.

Is there a reason for this pre-allocation?

What would you suggest? Start with a smaller allocation and increase it if needed? There is no possibility of knowing the length of the stream.

My suggestion is to call new StringBuilder(0) as it is possible this is completely unused because we always hit the eol && sb.length() == 0 path below.

The change is motivated by performance, but there will be many inputs that are less than the transfer buffer size and those will not use the StringBuilder, so creating it before it is needed could be avoided.
When a partial line is left in the transfer buffer, copy it to the beginning of the buffer and read more characters for the remaining size of the buffer. It would save some copying into and out of the SB.
You might still need a fallback for really long lines (> transfer buffer size), but that might be more easily handled by reallocating the transfer buffer to make it larger.

Resizing (newCapacity) is always expensive and tricky, StringBuilder included. So maybe we should decide whether 'long lines' (longer than the transfer buffer size) are a goal of this PR. If not, it might be reasonable to simply go with a String plus the built-in string concatenation, since we would not care about the scenario where most of the lines exceed the buffer size. I do agree we probably want to avoid paying the cost of copying in and out of the StringBuilder, but tweaking the transfer buffer resizing might also be tricky and potentially out of scope as well. Yes, it's always a trade-off.
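The alternative raised in this thread (no StringBuilder, move a partial line to the front of the transfer buffer, and grow the buffer only for lines longer than it) might look roughly like the sketch below. Everything here is an assumption for illustration: the class name, the doubling growth policy, and the `initialSize` parameter are not from the PR.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "compact and refill" idea; not the PR's implementation.
final class CompactingLineSplitter {
    static List<String> readAllLines(Reader reader, int initialSize) throws IOException {
        if (initialSize < 1) {
            throw new IllegalArgumentException("initialSize must be positive");
        }
        List<String> lines = new ArrayList<>();
        char[] cb = new char[initialSize];
        int start = 0; // start of the current, still unterminated line
        int scan = 0;  // next index to examine
        int end = 0;   // number of valid chars in cb
        boolean skipLF = false; // true right after '\r'
        while (true) {
            while (scan < end) {
                char c = cb[scan];
                if (skipLF) {
                    skipLF = false;
                    if (c == '\n') { // second half of "\r\n"
                        scan++;
                        start = scan;
                        continue;
                    }
                }
                if (c == '\r' || c == '\n') {
                    lines.add(new String(cb, start, scan - start));
                    skipLF = (c == '\r');
                    start = scan + 1;
                }
                scan++;
            }
            int partial = end - start; // length of the unterminated tail
            if (start > 0) {
                // Move the partial line to the front to make room for more input.
                System.arraycopy(cb, start, cb, 0, partial);
            } else if (partial == cb.length) {
                // The line fills the whole buffer: fall back to growing it (assumed policy: double).
                cb = Arrays.copyOf(cb, cb.length * 2);
            }
            start = 0;
            scan = partial;
            end = partial;
            int n = reader.read(cb, end, cb.length - end);
            if (n < 0) {
                break;
            }
            end += n;
        }
        if (end > 0) {
            lines.add(new String(cb, 0, end)); // final line without a terminator
        }
        return lines;
    }
}
```

Each line is copied once into its `String`, plus at most one `arraycopy` per refill for the tail; the cost moves to buffer growth when a single line exceeds the buffer, which is the trade-off discussed above.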

wenshao · 2025-06-18T09:20:12Z

If we want better performance, we should go a step further and overload the readAllLines method in the Reader implementation class.

For example, in the most commonly used InputStreamReader, overload readAllLines through StreamDecoder and make special optimizations for UTF8/ISO_8859_1 encoding.

In StringReader, special overload methods can also be used for optimization.

bplb · 2025-06-18T14:34:10Z

If we want better performance, we should go a step further and overload the readAllLines method in the Reader implementation class.

Perhaps, but not in this request. A separate issue should be filed and addressed subsequently.

mkarg · 2025-06-19T10:53:14Z

src/java.base/share/classes/java/io/Reader.java

+        int pos = 0;
+        List<String> lines = new ArrayList<String>();
+
+        StringBuilder sb = new StringBuilder(82);


As this PR explicitly targets performance and as the aim of this method is to keep all content in-memory anyways, I wonder if it would be acceptable and even faster to pre-allocate new StringBuilder(TRANSFER_BUFFER_SIZE)? In the end, this allocation is just temporary.

mkarg · 2025-06-19T10:57:52Z

src/java.base/share/classes/java/io/Reader.java

+                    if (c == '\r' || c == '\n')
+                        break;
+                    term++;


Scanner seems to scan for even more characters:

jdk/src/java.base/share/classes/java/util/Scanner.java

Line 490 in c4fb00a

private static final String LINE_SEPARATOR_PATTERN =

Would it make sense to resemble this? Would it make sense to simply use Scanner directly? 🤔
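To make the difference concrete, the following sketch compares `Scanner.nextLine()` with `String.lines()` on input that contains Unicode separators. It relies on Scanner's line separator pattern also matching `\u2028`, `\u2029`, and `\u0085`, which is an assumption here since only the declaration line is quoted above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical comparison of the two line-splitting behaviors.
final class ScannerLineDemo {
    static List<String> scannerLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }

    static List<String> stringLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "a\u2028b\u0085c";
        System.out.println(scannerLines(text).size()); // 3
        System.out.println(stringLines(text).size());  // 1
    }
}
```

Adopting Scanner's terminator set would therefore change the results relative to the previous `lines()`-based implementation for such input.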

mkarg · 2025-06-19T10:59:14Z

src/java.base/share/classes/java/io/Reader.java

+                    // Current position is terminator so skip it.
+                    pos++;
+                } else { // term > pos
+                    if (eol && sb.length() == 0) {


Is there a reason for sb.length() == 0 instead of sb.isEmpty()?

mkarg · 2025-06-19T11:08:32Z

src/java.base/share/classes/java/io/Reader.java

+            eol = false;
+        }
+
+        return lines;


Do we really want to return a mutable ArrayList here? In earlier discussions about this very API I was told that it deliberately returns String instead of CharSequence due to intended immutability, even if that potentially implied slower performance. Following this logic, it would be just straightforward to return Collections.unmodifiableList(lines); here. 🤔
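The mutability concern can be illustrated with a short check. The previous implementation ended in `Stream.toList()`, which is specified to return an unmodifiable list; a bare `ArrayList` is not. The helper below is hypothetical and only probes whether `add` is rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that probes whether a list rejects modification.
final class ListMutabilityDemo {
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false; // the list accepted the element, so it is mutable
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(List.of("hello", "world"));
        System.out.println(rejectsAdd(Collections.unmodifiableList(lines))); // true
        System.out.println(rejectsAdd(List.copyOf(lines)));                  // true
        System.out.println(rejectsAdd(lines));                               // false
    }
}
```

`Collections.unmodifiableList` wraps the existing list without copying, whereas `List.copyOf` makes a copy and rejects null elements; which of the two fits better is a design choice for the PR author.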

mkarg · 2025-06-19T11:13:37Z

src/java.base/share/classes/java/io/Reader.java

@@ -499,7 +547,13 @@ public List<String> readAllLines() throws IOException {
     * @since 25
     */
    public String readAllAsString() throws IOException {
-        return readAllCharsAsString();
+        StringBuilder result = new StringBuilder();
+        char[] cbuf = new char[TRANSFER_BUFFER_SIZE];


As this PR explicitly targets performance and as the aim of this method is to keep all content in-memory anyways, I wonder if it would be acceptable and even faster to pre-allocate new StringBuilder(TRANSFER_BUFFER_SIZE)? In the end, this allocation is just temporary.
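For context, a read loop of the kind shown in the hunk above, with the initial StringBuilder capacity made a parameter so both variants can be compared, could be sketched as follows. The class name, the `initialCapacity` parameter, and the 8192 constant are assumptions; the rest of the PR's method body is not shown in this thread.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical sketch; only the StringBuilder-plus-char-buffer shape comes from the diff.
final class ReadAllSketch {
    // Assumed value, based on the 8192 figure mentioned in the discussion.
    static final int TRANSFER_BUFFER_SIZE = 8192;

    static String readAll(Reader reader, int initialCapacity) throws IOException {
        StringBuilder result = new StringBuilder(initialCapacity);
        char[] cbuf = new char[TRANSFER_BUFFER_SIZE];
        int n;
        // Append each chunk until read() signals end of stream with a negative value.
        while ((n = reader.read(cbuf, 0, cbuf.length)) >= 0) {
            result.append(cbuf, 0, n);
        }
        return result.toString();
    }
}
```

Passing `TRANSFER_BUFFER_SIZE` as the capacity avoids the early resizes for inputs up to one buffer, at the cost of a larger temporary allocation for very short inputs; whether that is a net win would need to be measured.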

8358533: Improve performance of java.io.Reader.readAllLines

c242043

openjdk bot added the rfr label (pull request is ready for review) Jun 18, 2025

openjdk bot added the core-libs label Jun 18, 2025

RogerRiggs reviewed Jun 18, 2025

wenshao reviewed Jun 18, 2025

liach reviewed Jun 18, 2025

mkarg reviewed Jun 19, 2025
