05 Dec 18:28

johnkerl

e71ee1d

Minor feature enhancements, and portability

Portability fixes (affecting the CSV-RFC reader) for the Debian packaging request: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800074. Debian packaging greatly increases the number of platforms on which Miller has been validated.
mlr decimate: http://johnkerl.org/miller/doc/reference.html#decimate
Integer preservation for mlr top and for mlr stats1 percentiles: if the inputs are integers, the corresponding outputs are integers as well (unless -F is given, which forces all-float output).
mlr histogram now has a --auto option for autocomputing lower and upper limits: http://johnkerl.org/miller/doc/reference.html#histogram
mlr uniq and mlr count-distinct now have a -n flag to show only the counts of distinct values, rather than listing all distinct values: http://johnkerl.org/miller/doc/reference.html#uniq http://johnkerl.org/miller/doc/reference.html#count-distinct
The strlen function correctly handles UTF-8 string data.
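One way to see why percentiles can preserve integers: if a percentile is chosen as an element of the sorted input, rather than interpolated between two inputs, its type is simply the input's type. The Python sketch below assumes a non-interpolating index rule of int(p/100 * n), clamped to the last element. Miller's exact index convention may differ, and `percentile` and `force_float` (standing in for -F) are hypothetical names.

```python
def percentile(values, p, force_float=False):
    """Return the p-th percentile (0..100) as an element of the sorted input.

    Assumption: non-interpolating index rule int(p/100 * n), clamped to the
    last index. Because the result is one of the inputs, integer inputs give
    an integer output unless force_float (analogous to -F) is set.
    """
    if not values:
        raise ValueError("percentile of empty input")
    ordered = sorted(values)
    index = min(int(p / 100.0 * len(ordered)), len(ordered) - 1)
    result = ordered[index]
    return float(result) if force_float else result

data = [5, 1, 4, 2, 3]
print(percentile(data, 50))                    # 3 (an int, like the inputs)
print(percentile(data, 50, force_float=True))  # 3.0
print(percentile([1.5, 2.5], 100))             # 2.5
```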


27 Nov 02:46

johnkerl

v3.0.1

3620ae7

Allow scientific notation in DSL literals; mlr bar --auto

Miller has always supported scientific notation in field values, e.g. x=1e6. However, it had never supported scientific notation in DSL literals, e.g. mlr put '$y = $x + 1e6'. This release fixes that.
Additionally, mlr bar now has an --auto flag which holds all records in memory and computes the limits from the data, so you don't have to compute them separately and pass them in via --lo and --hi.
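The idea behind --auto can be sketched as: hold all values in memory, take the minimum and maximum as the lower and upper limits, then bin. The same idea underlies mlr histogram --auto. This minimal Python sketch assumes equal-width bins with the maximum value clamped into the last bin; `auto_histogram` is a hypothetical name, and Miller's handling of edge cases may differ.

```python
def auto_histogram(values, nbins):
    """Equal-width histogram whose lower/upper limits are taken from the data.

    All values must be held in memory, since the limits are known only after
    the last value has been seen. Returns (lo, hi, counts).
    """
    if not values or nbins < 1:
        raise ValueError("need at least one value and at least one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if width == 0:
            i = 0  # all values are equal: put them in the first bin
        else:
            # v == hi would index one past the end, so clamp into the last bin.
            i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    return lo, hi, counts

print(auto_histogram([1, 2, 2, 3, 7, 9], 4))  # (1, 9, [3, 1, 0, 2])
```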


24 Nov 03:46

johnkerl

v3.0.0

3741d29

Integer and float arithmetic, improved documentation, minor feature enhancements

Integer/float arithmetic

The key feature of the 3.0.0 release, and the reason for the major version increment, is the arithmetic model. Previously, all numbers were scanned into mlr put and mlr filter functions as floating-point, and were recast to integer only as needed for integer operations. Since IEEE doubles have 53 bits of precision (52 mantissa bits plus an implicit leading one) while 64-bit integers have 64, full 64-bit integer significance could not be passed through Miller functions.
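The precision limit is easy to demonstrate in Python, whose floats are IEEE doubles and whose ints are arbitrary-precision. This sketch only illustrates the 53-bit limit described above; it is not Miller code.

```python
# Python floats are IEEE doubles (53-bit significand); Python ints are exact.
big = 2**53 + 1                    # 9007199254740993
print(float(big) == float(2**53))  # True: the trailing +1 is rounded away
print(int(float(big)) == big)      # False: the round trip loses information
int64_max = 2**63 - 1
print(int(float(int64_max)))       # 9223372036854775808, i.e. 2**63, not 2**63 - 1
```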

As of the 3.0.0 release, numbers in Miller are int (64 bits) or float (double-precision). Numbers scannable as integers are treated as integers. The sum, difference, and product of two integers is another integer, except when overflow would occur, in which case a floating-point result is produced. Division is pythonic: 7/2 is 3.5, and 7//2 is 3. Mixed integer/float operations produce float. Bitwise operators are now supported.
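A minimal Python sketch of the addition and multiplication rules described above, assuming results are checked against the signed 64-bit range. Python ints never overflow, so the 64-bit limit is enforced explicitly here. `add`, `mul`, and `_int_or_float` are hypothetical helpers, not Miller functions, and Miller's C implementation need not detect overflow this way.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def _int_or_float(exact):
    """Keep an exact integer result if it fits in 64 bits; otherwise fall back to float."""
    return exact if INT64_MIN <= exact <= INT64_MAX else float(exact)

def add(a, b):
    """int + int stays int unless it would overflow; anything mixed is float."""
    if isinstance(a, int) and isinstance(b, int):
        return _int_or_float(a + b)
    return float(a) + float(b)

def mul(a, b):
    """Same rule as add, for products."""
    if isinstance(a, int) and isinstance(b, int):
        return _int_or_float(a * b)
    return float(a) * float(b)

print(add(3, 4))          # 7
print(add(3, 4.0))        # 7.0
print(add(INT64_MAX, 1))  # 9.223372036854776e+18 (overflow, so float)
print(7 / 2, 7 // 2)      # 3.5 3, matching the division examples in the text
```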

You now have more control over arithmetic, not less. The only real compatibility change is that some numbers will now print as 123 rather than 123.0000.

For full details please see http://johnkerl.org/miller/doc/reference.html#Arithmetic.

New functions for filter and put

Since integers are now fully supported in mlr put and mlr filter, the bitwise operators | ^ & << >> are now available. These operate on 64-bit integers and produce 64-bit-integer results.
Modular arithmetic is implemented by madd, msub, mmul, and mexp.
urandint and urand32 join the existing urand.
sgn complements abs.
strftime and strptime are generalizations of sec2gmt and gmt2sec. These are pass-throughs to the system strftime and strptime; see your local manpages for available time-formatting options.
Please see http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put for more information.
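As a sketch of what the modular-arithmetic functions compute, here are Python equivalents. They assume each function takes the modulus as its last argument, e.g. madd(a, b, m) = (a + b) mod m; that argument order is an assumption here. mexp uses square-and-multiply so that large exponents stay cheap. Python's % returns a result in [0, m) for positive m, and its unbounded ints sidestep the intermediate-overflow care that 64-bit arithmetic needs.

```python
def madd(a, b, m):
    """(a + b) mod m, in [0, m) for m > 0."""
    return (a + b) % m

def msub(a, b, m):
    """(a - b) mod m; Python's % keeps the result non-negative for m > 0."""
    return (a - b) % m

def mmul(a, b, m):
    """(a * b) mod m."""
    return (a * b) % m

def mexp(a, e, m):
    """a**e mod m by square-and-multiply; e is assumed non-negative."""
    result, base = 1 % m, a % m
    while e > 0:
        if e & 1:                    # low bit set: fold this power into the result
            result = (result * base) % m
        base = (base * base) % m     # square for the next bit
        e >>= 1
    return result

print(madd(5, 3, 7), msub(5, 3, 7), mmul(5, 3, 7), mexp(5, 3, 7))  # 1 2 1 6
```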

Verbs

mlr grep: http://johnkerl.org/miller/doc/reference.html#grep
mlr cat -n option: http://johnkerl.org/miller/doc/reference.html#cat
mlr stats1 skewness and mlr stats1 kurtosis: http://johnkerl.org/miller/doc/reference.html#stats1
mlr bar allows for some simple terminal-level visualization: http://johnkerl.org/miller/doc/reference.html#bar
mlr join now has full support for heterogeneous data: records lacking one or more of the join keys are treated the same as any other left-unpaired or right-unpaired records. This was tracked on issue #82.
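The join behavior described above can be sketched as an in-memory hash join in Python. A record missing any join key can never match, so it is routed to the unpaired output along with records whose keys found no partner. `join` and its return shape are hypothetical; Miller's streaming and sorted-input modes, and its output ordering, are not modeled.

```python
def join(left, right, keys):
    """Hash join on `keys`; returns (paired, left_unpaired, right_unpaired).

    Records missing any join key cannot be paired, so they go to the
    unpaired output like any other record without a match.
    """
    def key_of(rec):
        return tuple(rec[k] for k in keys) if all(k in rec for k in keys) else None

    buckets, left_unpaired = {}, []
    for rec in left:
        k = key_of(rec)
        if k is None:
            left_unpaired.append(rec)
        else:
            buckets.setdefault(k, []).append(rec)

    paired, right_unpaired, matched = [], [], set()
    for rec in right:
        k = key_of(rec)
        if k is not None and k in buckets:
            matched.add(k)
            for lrec in buckets[k]:
                paired.append({**lrec, **rec})
        else:
            right_unpaired.append(rec)

    # Left records whose key never appeared on the right are also unpaired.
    for k, recs in buckets.items():
        if k not in matched:
            left_unpaired.extend(recs)
    return paired, left_unpaired, right_unpaired

left = [{"id": 1, "a": "x"}, {"a": "no-id"}]
right = [{"id": 1, "b": "y"}, {"id": 2, "b": "z"}, {"b": "no-id"}]
print(join(left, right, ["id"]))
```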

I/O options

mlr --xvright for XTAB output
mlr --headerless-csv-output for CSV/CSV-lite output

Documentation

The mlr.1 manpage is now autogenerated.
There is now documentation on operator precedence and function semantics.
HTML pages at http://johnkerl.org/miller/doc/ are now PDF-renderable.
Per-release documents are available at http://johnkerl.org/miller/doc/release-docs.html. (The documents at http://johnkerl.org/miller/doc/ have always tracked head, and they continue to do so.)


27 Oct 01:55

johnkerl

v2.3.2

5921459

Iterative stats, exclude-filter, implicit-CSV-header, and other features

mlr stats1 and stats2 now support a -s feature in which means, linear regressions, etc. evolve record-by-record as new records appear over time. This is particularly useful in tail -f contexts. See also http://johnkerl.org/miller/doc/reference.html#stats1 and http://johnkerl.org/miller/doc/reference.html#stats2.
mlr filter now supports a -x flag to negate the sense of the filter: instead of rewriting a logical expression, e.g. from mlr filter '$x < 10 || $x > 20' to mlr filter '$x >= 10 && $x <= 20', you can simply use mlr filter -x '$x < 10 || $x > 20'. See also http://johnkerl.org/miller/doc/reference.html#filter.
If a CSV file lacks a header line, you can use mlr --implicit-csv-header to supply positional field names 1,2,3,.... You can then rename those to the desired names using mlr label. See also http://johnkerl.org/miller/doc/reference.html#label.
Heterogeneity support is improved for sort, stats1, stats2, step, head, tail, top, sample, uniq, and count-distinct. See also #79.
mlr stats2 now has a logistic-regression feature, but I recommend treating it as experimental until some numerical-stability issues involving my naïve Newton-Raphson solver are worked out -- namely, it doesn't converge in all cases.
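The --implicit-csv-header plus mlr label workflow amounts to keying fields by position and then renaming them in order. A minimal Python sketch using the standard csv module; `read_implicit_header` and `label` are hypothetical names, not Miller internals.

```python
import csv
import io

def read_implicit_header(text):
    """Read headerless CSV, keying fields positionally as "1", "2", "3", ..."""
    for row in csv.reader(io.StringIO(text)):
        yield {str(i): value for i, value in enumerate(row, start=1)}

def label(record, new_names):
    """Rename the first len(new_names) fields in order, keeping the rest."""
    items = list(record.items())
    renamed = [(new, v) for new, (_, v) in zip(new_names, items)]
    return dict(renamed + items[len(new_names):])

rows = list(read_implicit_header("a,1,true\nb,2,false\n"))
print(rows[0])                            # {'1': 'a', '2': '1', '3': 'true'}
print(label(rows[0], ["name", "count"]))  # {'name': 'a', 'count': '1', '3': 'true'}
```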

http://johnkerl.org/miller/releases/miller-2.3.2/doc/


19 Oct 03:49

johnkerl

v2.3.1

1600bc6

Bug fix for mlr top -a

Memory management was incorrect in mlr top -a.


17 Oct 23:05

johnkerl

v2.3.0

4cdfbf6

Regex support, gsub, reservoir sampling, iterative stats, and other features

Regex support

gsub function

Complements the existing sub function: gsub replaces every match (replace-all), whereas sub replaces only the first (replace-once). Includes regex support.
http://johnkerl.org/miller/doc/reference.html#Functions_for_filter_and_put
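The difference between sub and gsub comes down to a replacement count. In this Python sketch the standard re module stands in for Miller's regex engine, whose syntax details may differ, and the argument order (string, regex, replacement) is assumed to match Miller's sub.

```python
import re

def sub(s, regex, replacement):
    """Replace only the first match (replace-once)."""
    return re.sub(regex, replacement, s, count=1)

def gsub(s, regex, replacement):
    """Replace every match (replace-all)."""
    return re.sub(regex, replacement, s)

print(sub("a.b.c", r"\.", ":"))   # a:b.c
print(gsub("a.b.c", r"\.", ":"))  # a:b:c
```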

Reservoir sampling

http://johnkerl.org/miller/doc/reference.html#sample
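Reservoir sampling keeps a uniform random sample of k records from a stream whose length is not known in advance, using O(k) memory. Below is a Python sketch of the classic Algorithm R; it is not necessarily the variant mlr sample implements, and `reservoir_sample` is a hypothetical name.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Keep the n-th item with probability k/n, evicting a random slot.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(1_000_000), 5, random.Random(1)))
```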

Iterative stats1/stats2

Use mlr stats1 -s ... or mlr stats2 -s ... to print averages, min/max, correlation, etc. on every record. Useful in tail -f contexts when you want to see statistics evolving as the data evolve in time.

http://johnkerl.org/miller/doc/reference.html#stats1
http://johnkerl.org/miller/doc/reference.html#stats2
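Per-record statistics only need a constant-size state that is updated as each value arrives. The Python sketch below uses Welford's method for the running mean and sample variance; this is one standard approach, not necessarily how Miller accumulates its statistics, and `RunningStats` is a hypothetical name.

```python
class RunningStats:
    """Mean and sample variance updated one value at a time (Welford's method).

    Reading the current values after each update mirrors the -s idea of
    printing statistics on every record rather than only at end of stream.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self):
        """Sample variance; None until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else None

stats = RunningStats()
for x in [2.0, 4.0, 6.0]:
    stats.update(x)
    print(stats.count, stats.mean, stats.var)
# 1 2.0 None
# 2 3.0 2.0
# 3 4.0 4.0
```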

Minor

Initial delta for mlr step -a delta is now 0, matching initial 1 for mlr step -a ratio
Usage messages consistently go to stdout when asked for via -h, and stderr in case of command-line syntax errors
Online help is confined to 80-character column width, except for mlr -f which is all single-line greppable
Header/data length mismatch error messages for CSV/CSV-lite now include file/line context
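A sketch of the step conventions in the first item above, in Python: delta starts at 0 and ratio starts at 1, so the first record gets the identity value of each operation. `step_delta` and `step_ratio` are hypothetical names.

```python
def step_delta(values):
    """Differences between successive values; the first output is 0."""
    out, prev = [], None
    for v in values:
        out.append(0 if prev is None else v - prev)
        prev = v
    return out

def step_ratio(values):
    """Ratios between successive values; the first output is 1.

    A zero previous value would raise ZeroDivisionError in this sketch.
    """
    out, prev = [], None
    for v in values:
        out.append(1 if prev is None else v / prev)
        prev = v
    return out

print(step_delta([10, 13, 19]))  # [0, 3, 6]
print(step_ratio([10, 20, 30]))  # [1, 2.0, 1.5]
```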


24 Sep 02:24

johnkerl

v2.2.1

3035d6e

Autoconfig support

Documentation at http://johnkerl.org/miller/doc/build.html

Resolves #9

Most of the work here is due to @0-wiz-0.

http://johnkerl.org/miller/releases/miller-2.2.1/doc/


21 Sep 01:34

johnkerl

v2.2.0

6299c14

Multi-character RS,FS,PS

You can process CRLF-terminated DKVP files with mlr --dkvp --rs crlf.
You can process LF-terminated CSV files with mlr --csv --rs lf.
You can process TSV using mlr --fs tab; you can convert TSV to CSV using mlr --ifs tab --ofs comma.
Many more combinations are possible; please see mlr -h for more information.
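Multi-character separators are straightforward once records, fields, and pairs are split on strings rather than single characters. This Python sketch parses DKVP text with configurable RS, FS, and PS. The alias table reuses the names crlf, lf, tab, and comma from the examples above, and keying a field that lacks a PS by its position is an assumption of this sketch. `parse_dkvp` and `ALIASES` are hypothetical.

```python
# Hypothetical alias table; the names mirror those in the release notes.
ALIASES = {"crlf": "\r\n", "lf": "\n", "tab": "\t", "comma": ","}

def parse_dkvp(text, rs="lf", fs="comma", ps="="):
    """Split text into records (RS), fields (FS), and key/value pairs (PS).

    Each separator may be an alias from ALIASES or any literal string,
    including multi-character strings.
    """
    rs, fs, ps = (ALIASES.get(s, s) for s in (rs, fs, ps))
    records = []
    for line in text.split(rs):
        if line == "":
            continue  # skip the empty string after a trailing RS
        record = {}
        for i, field in enumerate(line.split(fs), start=1):
            key, sep, value = field.partition(ps)
            # Assumption: a field without PS is keyed by its 1-up position.
            record[key if sep else str(i)] = value if sep else field
        records.append(record)
    return records

print(parse_dkvp("a=1,b=2\r\nc=3\r\n", rs="crlf"))
# [{'a': '1', 'b': '2'}, {'c': '3'}]
print(parse_dkvp("x:1;;y:2", fs=";;", ps=":"))
# [{'x': '1', 'y': '2'}]
```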

There is one minor backward-incompatible change, which I felt did not warrant calling this 3.0.0: the default field separator for NIDX format is now space, not comma.


08 Sep 02:58

johnkerl

v2.1.4

bc4d076

Improved read performance for RFC4180 CSV

Resolves #51

RFC-compliant CSV input is now about 60% faster than at initial feature release (https://github.com/johnkerl/miller/releases/tag/v2.0.0). It remains about 50% slower than CSV-lite.


06 Sep 02:16

johnkerl

v2.1.3

e643d64

Reduce tar-file size

Addresses #61

Releases: johnkerl/miller