jhorstmann/extsortcollect
External merge sort for Java 8 streams

Sorting large streams of data without having to keep all elements in memory.

Example usage

ExternalSortCollectors.Serializer<T> serializer = ...
Comparator<T> comparator = ...

ExternalSortCollectors.Configuration<T> configuration = ExternalSortCollectors.configuration(serializer)
        .withComparator(comparator)
        .withInternalSortMaxItems(100_000)
        .withMaxRecordSize(1024)
        .withWriteBufferSize(64 * 4096)
        .build();

Stream<T> stream = ...

stream.collect(ExternalSortCollectors.externalSort(configuration))
        .skip(200_000)
        .limit(100)
        .forEach(record -> {
            ...
        });
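The first phase of an external merge sort can be sketched as follows. This is an illustrative sketch only, not this library's code: the class `SortedRunsSketch` and the method `sortedRuns` are hypothetical names, and the assumption that a limit such as `withInternalSortMaxItems` bounds how many elements are sorted in memory at once is inferred from the option name. A real implementation would write each sorted run to temporary storage instead of keeping it in a list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: not part of the extsortcollect API.
public final class SortedRunsSketch {

    /** Splits the input into runs of at most maxItems elements and sorts each run on its own. */
    public static <T> List<List<T>> sortedRuns(Iterator<T> input, int maxItems,
                                               Comparator<? super T> comparator) {
        if (maxItems < 1) {
            throw new IllegalArgumentException("maxItems must be at least 1");
        }
        List<List<T>> runs = new ArrayList<>();
        List<T> current = new ArrayList<>();
        while (input.hasNext()) {
            current.add(input.next());
            if (current.size() == maxItems) {
                // List.sort is stable, so equal elements keep their input order.
                current.sort(comparator);
                // A real external sort would spill this run to disk here.
                runs.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            // Sort and keep the final, possibly shorter, run.
            current.sort(comparator);
            runs.add(current);
        }
        return runs;
    }
}
```

Only one run of at most `maxItems` elements needs to be held in memory while it is being sorted, which is what lets the input as a whole exceed available memory once the runs are spilled.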

Comparison with exmeso

  • Based on NIO buffers instead of InputStream/OutputStream; this imposes a maximum record size, which can be configured
  • Between the same speed and 1.5 times as fast, depending on CPU, I/O, and data size
  • Support for parallel sorting gives a noticeable speed-up on multicore machines
  • Sort is stable (see issue #3 in exmeso)
  • Temporary sorted chunks are stored in one large file instead of one file per chunk
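To illustrate what a stable merge of sorted chunks involves, here is a minimal sketch. It is not taken from this library: `StableMergeSketch`, `Head`, and `merge` are hypothetical names, and the chunks are plain in-memory lists rather than regions of a temporary file. The idea shown is a k-way merge with a priority queue, where ties between equal elements are broken by chunk index so that elements from earlier chunks come out first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: not part of the extsortcollect API.
public final class StableMergeSketch {

    /** One pending element plus the index of the sorted chunk it came from. */
    private static final class Head<T> {
        final T value;
        final int chunk;
        final Iterator<T> rest;

        Head(T value, int chunk, Iterator<T> rest) {
            this.value = value;
            this.chunk = chunk;
            this.rest = rest;
        }
    }

    /** Merges already-sorted chunks; ties go to the earlier chunk, which keeps the merge stable. */
    public static <T> List<T> merge(List<List<T>> sortedChunks, Comparator<? super T> comparator) {
        Comparator<Head<T>> byValueThenChunk = (a, b) -> {
            int c = comparator.compare(a.value, b.value);
            return c != 0 ? c : Integer.compare(a.chunk, b.chunk);
        };
        // PriorityQueue requires an initial capacity of at least 1.
        PriorityQueue<Head<T>> heads =
                new PriorityQueue<>(Math.max(1, sortedChunks.size()), byValueThenChunk);
        for (int i = 0; i < sortedChunks.size(); i++) {
            Iterator<T> it = sortedChunks.get(i).iterator();
            if (it.hasNext()) {
                heads.add(new Head<>(it.next(), i, it));
            }
        }
        List<T> result = new ArrayList<>();
        while (!heads.isEmpty()) {
            // Take the smallest pending element, then refill from the same chunk.
            Head<T> head = heads.poll();
            result.add(head.value);
            if (head.rest.hasNext()) {
                heads.add(new Head<>(head.rest.next(), head.chunk, head.rest));
            }
        }
        return result;
    }
}
```

Because the queue holds at most one element per chunk at any time, the order of equal elements within a chunk is preserved, and the chunk index decides the order across chunks.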