Ports, bugfixes, and keystroke-savers

@johnkerl johnkerl released this 17 Mar 03:24

Ports

Features

Bugfixes

  • A bug regarding optional regex-pattern groups was fixed in #277.
  • As of #294, you can specify --implicit-csv-header for the join file in mlr join.
  • A bug with spaces in XTAB-file values was fixed in #296.
  • A bug with a missing final newline in XTAB-formatted files read in mmap mode was fixed in #301.

Documentation

Note

Support for mmap mode has been entirely discontinued. This is an invisible change and should not affect you at all. For anyone interested in lower-level details, though, the summary is as follows:

  • For an incremental performance gain (perhaps 10-20% of run time at most, but see below), the C source code can use the mmap system call to access input files via pointer arithmetic, rather than malloc-and-memcpy via stdio.
  • However, mmap is not available when reading from standard input, which cannot be memory-mapped.
  • This means all file-format readers are implemented twice within the Miller source code: once for mmap and once for stdio.
  • While I try to regression-test Miller thoroughly, running all canned tests through both mmap and stdio modes, I've nonetheless found my mmap implementations prone to corner cases that I miss but users find: for example #29, #102, and #296.
  • As tracked on #160, various operating systems do not release mmapped pages after use as one might expect. For large files and/or large numbers of files, I've therefore long needed to have Miller opt out of mmap usage in precisely those cases which most need the performance gain: see #160, #181, and #256.
  • Additionally, mmap is not used at all for Windows/MSYS2 so there is nothing to lose there.

For these reasons, keeping mmap mode isn't worth the development overhead.
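
To make the duplication described above concrete, here is a minimal sketch of reading the same input two ways. It is not Miller's actual reader code: the function names count_lines_stdio, count_lines_mmap, and count_lines are hypothetical, and counting newlines stands in for real record parsing. It assumes a POSIX system.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* stdio path: works for regular files, pipes, and standard input. */
long count_lines_stdio(FILE *fp) {
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') n++;
    }
    return n;
}

/* mmap path: walks the mapped bytes by pointer arithmetic.
 * Returns -1 when the input cannot be mapped (for example a pipe),
 * so the caller has to fall back to stdio. */
long count_lines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    char *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED) return -1;

    long n = 0;
    const char *end = base + st.st_size;
    for (const char *p = base; p < end; p++) {
        if (*p == '\n') n++;
    }
    munmap(base, (size_t)st.st_size);
    return n;
}

/* Dispatcher: a NULL path means standard input, which is stdio-only. */
long count_lines(const char *path) {
    if (path == NULL) return count_lines_stdio(stdin);

    long n = count_lines_mmap(path);
    if (n >= 0) return n;

    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;
    n = count_lines_stdio(fp);
    fclose(fp);
    return n;
}
```

Even in this toy version, the same logic exists twice and each copy has its own edge cases (the zero-length file, the end-of-buffer bound), which is the maintenance cost the list above describes.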

As of release 5.7.0, the mlr executable will still accept the --mmap and --no-mmap command-line flags as no-ops, for backward compatibility.

The caveat for you is that for everyday small files, the default was previously mmap mode and is now stdio (except for mlr ... < filename and ... | mlr ..., which have always used stdio). There is an off chance that this will reveal an old, latent bug or two somewhere.

I've re-run the regressions under valgrind to aggressively catch any errors, but please let me know ASAP via a GitHub issue about any unexpected behavior in 5.7.0.