Commit 5dab7a2

Improved file read time
1 parent 3f9f33d commit 5dab7a2

2 files changed: +18 -9 lines changed

CHANGES

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,9 @@ jsoup changelog

   * Improved the performance of Element.html() by 1.7x

+  * Improved file read time by 2x, giving around a 10% speed improvement to file parses.
+    <https://github.com/jhy/jsoup/issues/248>
+
   * Tightened the scope of what characters are escaped in attributes and textnodes, to align with the spec. Also, when
     using the extended escape entities map, only escape a character if the current output charset does not support it.
     This produces smaller, more legible HTML, with greated control over the output (by setting charset and escape mode).

src/main/java/org/jsoup/helper/DataUtil.java

Lines changed: 15 additions & 9 deletions
@@ -32,15 +32,8 @@ private DataUtil() {}
      * @throws IOException on IO error
      */
     public static Document load(File in, String charsetName, String baseUri) throws IOException {
-        FileInputStream inStream = null;
-        try {
-            inStream = new FileInputStream(in);
-            ByteBuffer byteData = readToByteBuffer(inStream);
-            return parseByteData(byteData, charsetName, baseUri, Parser.htmlParser());
-        } finally {
-            if (inStream != null)
-                inStream.close();
-        }
+        ByteBuffer byteData = readFileToByteBuffer(in);
+        return parseByteData(byteData, charsetName, baseUri, Parser.htmlParser());
     }

     /**
@@ -160,6 +153,19 @@ static ByteBuffer readToByteBuffer(InputStream inStream) throws IOException {
         return readToByteBuffer(inStream, 0);
     }

+    static ByteBuffer readFileToByteBuffer(File file) throws IOException {
+        RandomAccessFile randomAccessFile = null;
+        try {
+            randomAccessFile = new RandomAccessFile(file, "r");
+            byte[] bytes = new byte[(int) randomAccessFile.length()];
+            randomAccessFile.readFully(bytes);
+            return ByteBuffer.wrap(bytes);
+        } finally {
+            if (randomAccessFile != null)
+                randomAccessFile.close();
+        }
+    }
+
     /**
      * Parse out a charset from a content type header. If the charset is not supported, returns null (so the default
      * will kick in.)
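
The new readFileToByteBuffer helper reads the whole file in a single readFully call into an array sized from the file length, instead of going through the stream-based readToByteBuffer. A minimal standalone sketch of that approach follows. The class name FileReadSketch, the try-with-resources form (which assumes Java 7 or later) and the explicit size check are illustrative assumptions, not part of the commit; the commit casts the length to int without a check.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical standalone class; in the commit the method lives in DataUtil.
public class FileReadSketch {

    // Reads the whole file into one array with a single readFully call,
    // mirroring the commit's readFileToByteBuffer.
    static ByteBuffer readFileToByteBuffer(File file) throws IOException {
        // try-with-resources closes the file even if reading fails
        // (assumption: Java 7+; the commit uses try/finally instead).
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            // Added guard (not in the commit): the (int) cast below would
            // silently truncate lengths above Integer.MAX_VALUE.
            if (length > Integer.MAX_VALUE) {
                throw new IOException("File too large to buffer: " + length + " bytes");
            }
            byte[] bytes = new byte[(int) length];
            raf.readFully(bytes); // blocks until the array is full, or throws EOFException
            return ByteBuffer.wrap(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file, then read it back through the helper.
        File tmp = File.createTempFile("file-read-sketch", ".html");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<p>Hello</p>".getBytes(StandardCharsets.UTF_8));

        ByteBuffer data = readFileToByteBuffer(tmp);
        System.out.println(StandardCharsets.UTF_8.decode(data)); // prints <p>Hello</p>
    }
}
```

Sizing the array up front avoids the repeated buffer growth and copying that a stream of unknown length requires, which is a plausible source of the read-time gain the changelog reports. The trade-off is that the entire file must fit in a single byte array.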
