grepRange

Usage

greprange [options] [files]

Extract ranges of lines from one or more files, where the start and end of each range are identified by (Perl) regular expressions.

For example:

greprange --start '<abstract>' --end '</abstract>'

would pull out all the lines that belong to XML 'abstract' elements.

The regexes do not have to match the entire line; if you want that, anchor them as ^...$. There are options to control whether the boundary lines are included entirely, excluded entirely, or included only up to the matched part.
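The range logic described above can be sketched in Python. This is only an illustration, not the script's actual Perl code: the function name `extract_ranges` and its `include_start` / `include_end` parameters are hypothetical stand-ins for --includeStart and --includeEnd. It assumes that the end expression is only tested on lines after the start line, and that a range still open at end of input is dropped; the real script may behave differently in both cases.

```python
import re

def extract_ranges(lines, start, end, include_start=True, include_end=True):
    """Yield one list of lines per start..end range (hypothetical helper)."""
    start_re = re.compile(start)
    end_re = re.compile(end)
    current = None  # None means we are outside any range
    for line in lines:
        if current is None:
            # Unanchored search: the pattern may match anywhere in the line.
            if start_re.search(line):
                current = [line] if include_start else []
        elif end_re.search(line):
            if include_end:
                current.append(line)
            yield current
            current = None
        else:
            current.append(line)
    # Assumption: an unterminated range at end of input is silently dropped.

sample = ["<title>T</title>", "<abstract>", "Some text.", "</abstract>", "<body>"]
for found in extract_ranges(sample, r"<abstract>", r"</abstract>"):
    print("\n".join(found))
```

Passing `include_start=False, include_end=False` would keep only the lines strictly between the two boundary lines.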

Options

(most work like the corresponding grep options):

  • --countOnly

    Just count hits, don't display them.

  • --end expr

    Expression to end at.

  • --filename

    Prefix filename(s) to matched lines.

  • -i

    Ignore case.

  • --includeEnd

    Include the matching end line itself.

  • --invert-match (or -v)

    Print lines *outside* of matched ranges.

  • --includeStart

    Include the matching start line itself.

  • --label l

    Display l as the 'filename' for hits on STDIN.

  • --line-number (or -n)

    Prefix displayed lines with line numbers.

  • --maxLinesPerRange n

    Don't find ranges where the start and end are separated by more than this many lines. Warning: The behavior is not yet defined when additional starts are found within a pending range. Probably it should find the shortest range, or the longest one under the max length, but I haven't addressed that case yet.

  • --maxRanges n

    Stop after n matching ranges have been found.

  • --only-matching or -o

    Include only the inward part of the start/end lines.

  • -q or --quiet

    Suppress most messages.

  • --sep s

    Print this after each matched range. Default: none ("").

  • --start expr

    Expression to start at.

  • -s

    Suppress messages about missing files.

  • --verbose

    Add more detailed messages.

  • --version

    Display version info and exit.
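The --maxLinesPerRange warning above leaves open what to do when a second start appears inside a pending range. As a sketch of one of the two policies mentioned there (the shortest range), the following Python restarts the pending range at the most recent start. The function name `ranges_with_limit` is hypothetical, and it assumes "separated by" means the difference between the end and start line indexes; neither is taken from the script itself.

```python
import re

def ranges_with_limit(lines, start, end, max_lines):
    """Return (start_index, end_index) pairs, 0-based (hypothetical helper).

    Assumed policy: a new start inside a pending range replaces the
    pending start, so the shortest candidate range wins.
    """
    start_re = re.compile(start)
    end_re = re.compile(end)
    found = []
    pending = None  # index of the most recent unmatched start, if any
    for i, line in enumerate(lines):
        if start_re.search(line):
            pending = i  # restart from the latest start
        elif pending is not None and end_re.search(line):
            # Keep the range only if start and end are close enough.
            if i - pending <= max_lines:
                found.append((pending, i))
            pending = None
    return found

sample = ["S", "x", "x", "x", "S", "y", "E", "S", "E"]
print(ranges_with_limit(sample, r"^S$", r"^E$", 2))
```

The other policy mentioned (the longest range under the maximum length) would instead need to remember every pending start and pick the earliest one still within the limit when an end is found.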

Related commands

You can do this with a bit of awk if you remember how. See https://stackoverflow.com/questions/38972736. For example:

awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' myFile
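For readers less familiar with awk, the flag idiom in that one-liner can be spelled out in Python. This is a line-by-line transcription of the awk logic only (the function name `awk_style_range` is hypothetical), so it always includes both boundary lines and has none of greprange's options.

```python
import re

def awk_style_range(lines, pat1, pat2):
    """Mimic: awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' (hypothetical helper)."""
    flag = False
    out = []
    for line in lines:
        if re.search(pat1, line):
            flag = True       # /PAT1/{flag=1}
        if flag:
            out.append(line)  # bare "flag" pattern: print while flag is set
        if re.search(pat2, line):
            flag = False      # /PAT2/{flag=0}
    return out

print(awk_style_range(["x", "PAT1 a", "b", "PAT2 c", "y"], "PAT1", "PAT2"))
```

Because the flag is cleared only after the print step, the line matching PAT2 is still printed, and a single line that matches both patterns is printed once.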

My grepData and body --iterate scripts do related extractions.

Known bugs and limitations

-h and -o and --color are not finished.

There are no grep-like options for: context: -A -B -C; regex types: -E -e -F -G -w -x; unusual file options: -a -d -D -I -U; other: -f -l -L -r and -z.

History

2008-02-01: Written by Steven J. DeRose.
2008-02-11 sjd: Add -c, -i, --includeEnd, --includeStart, -n, -v(invert-match).
Start multi-file support, -h -H -o.
2015-08-20: Clean up.
2020-08-31: New layout.
2021-04-11: Fix file handling. Implement --only-matching. Show awk alternative.
2021-09-10: Rename --istart and --iend to --includeStart and --includeEnd.
Add --fpat and --lpat as synonyms for --start and --end (to match `body` script).
Fix several major logic bugs, clean up code.  Add --sep.
2021-09-22: Clean up doc. Add --filename and unify prefix-handling.

To do

Integrate into `body` or `grepData`.
-b (byte-offset), -u (unix-byte-offsets) (strips CR).
Port to Python and integrate PowerWalk.
grep-like options? Port and switch to PowerWalk for most. files-with[out], --null,
    --only-matching, -x/--line-regexp, -w/--word-regexp
Finish --color (incl. responding to $GREP_COLOR)
Integrate with grepData, esp. for PowerWalk and context args.
Grab options from $GREP_OPTIONS or maybe $GREPRANGE_OPTIONS.
Generate return code: 0: something found; 1: none found; >1: error.

Rights

Copyright 2008-02-01 by Steven J. DeRose. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. For further information on this license, see https://creativecommons.org/licenses/by-sa/3.0.

For the most recent version, see http://www.derose.net/steve/utilities or https://github.com/sderose.