greprange [options] [files]
Extract ranges of lines from one or more files, where the start and end of each range are identified by (Perl) regular expressions.
For example:
greprange --start '<abstract>' --end '</abstract>'
would pull out all the lines that fall within XML 'abstract' elements.
The regexes do not have to match the entire line; if you want that, anchor them as ^...$. There are options to control whether the boundary lines are included entirely, excluded entirely, or included only up to the matched part.
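As a rough illustration of the boundary handling described above, here is a minimal Python sketch (not the script's actual Perl code). The function name `extract_ranges` and its parameters are hypothetical. It uses Python's `re` module rather than Perl regexes, treats a line that matches the start expression as a start only (so a start and end on the same line are not handled), and omits the "only up to the matched part" mode. Excluding the boundary lines by default is also an assumption.

```python
import re

def extract_ranges(lines, start, end, include_start=False, include_end=False):
    """Collect the lines lying between a start-regex line and an end-regex line.

    Hypothetical helper for illustration; defaults are assumptions.
    """
    start_re = re.compile(start)
    end_re = re.compile(end)
    out = []
    in_range = False
    for line in lines:
        if not in_range:
            # search() matches anywhere in the line, so the regex need not
            # cover the whole line unless it is anchored with ^...$
            if start_re.search(line):
                in_range = True
                if include_start:
                    out.append(line)
        elif end_re.search(line):
            in_range = False
            if include_end:
                out.append(line)
        else:
            out.append(line)
    # Note: lines after a start with no matching end are still emitted here.
    return out

sample = ["intro", "<abstract>", "first", "second", "</abstract>", "outro"]
print(extract_ranges(sample, "<abstract>", "</abstract>"))
# prints ['first', 'second']
```

Passing `include_start=True` and `include_end=True` would keep the two boundary lines as well.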
Options (most are like those of grep):
--countOnly
Just count hits, don't display them.
--end expr
Expression to end at.
--filename
Prefix filename(s) to matched lines.
-i
Ignore case.
--includeEnd
Include the matching end line itself.
--invert-match (or -v)
Print lines *outside* of matched ranges.
--includeStart
Include the matching start line itself.
--label l
Display l as the 'filename' for hits on STDIN.
--line-number (or -n)
Prefix displayed lines with line numbers.
--maxLinesPerRange n
Ignore ranges whose start and end are more than n lines apart. Warning: the behavior is not yet defined when additional start matches occur inside a pending range. It should probably find the shortest range, or the longest one under the maximum length, but this case has not been addressed yet.
--maxRanges n
Stop after n matching ranges have been found.
--only-matching (or -o)
Include only the inward part of the start/end lines.
--quiet (or -q)
Suppress most messages.
--sep s
Print this after each matched range. Default: none ("").
--start expr
Expression to start at.
-s
Suppress messages about missing files.
--verbose
Add more detailed messages.
--version
Display version info and exit.
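The interaction of --maxLinesPerRange, --maxRanges, and --sep can be sketched in Python as follows. This is only one possible reading, not the script's implementation: the names `find_ranges` and `render` are hypothetical, "separation" is assumed to mean the difference in line numbers between the start and end lines, and the open question noted under --maxLinesPerRange is resolved here by arbitrarily choosing the "shortest range" policy (a new start match restarts the pending range).

```python
import re

def find_ranges(lines, start, end, max_lines=None, max_ranges=None):
    """Return a list of ranges, each a list of lines including both boundaries."""
    start_re, end_re = re.compile(start), re.compile(end)
    found = []
    pending = None  # None while outside a range, else the lines collected so far
    for line in lines:
        if start_re.search(line):
            # Assumed policy: a new start always restarts the pending range,
            # so the shortest candidate range wins.
            pending = [line]
            continue
        if pending is None:
            continue
        pending.append(line)
        if end_re.search(line):
            # Assumed meaning of the limit: end line number minus start line number.
            if max_lines is None or len(pending) - 1 <= max_lines:
                found.append(pending)
            pending = None
            if max_ranges is not None and len(found) >= max_ranges:
                break
    return found

def render(ranges, sep=""):
    """Flatten ranges into output lines, adding sep after each range if non-empty."""
    out = []
    for rng in ranges:
        out.extend(rng)
        if sep:
            out.append(sep)
    return out
```

Choosing the "longest range under the maximum" policy instead would mean keeping earlier start candidates rather than discarding them when a new start appears.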
You can also extract ranges with a bit of awk, if you remember how. See https://stackoverflow.com/questions/38972736. For example:
awk '/PAT1/{flag=1} flag; /PAT2/{flag=0}' myFile
My grepData and body --iterate scripts do related extractions.
-h and -o and --color are not finished.
There are no grep-like options for: context: -A -B -C; regex types: -E -e -F -G -w -x; unusual file options: -a -d -D -I -U; other: -f -l -L -r and -z.
2008-02-01: Written by Steven J. DeRose.
2008-02-11 sjd: Add -c, -i, --includeEnd, --includeStart, -n, -v (--invert-match).
Start multi-file support, -h -H -o.
2015-08-20: Clean up.
2020-08-31: New layout.
2021-04-11: Fix file handling. Implement --only-matching. Show awk alternative.
2021-09-10: Rename --istart and --iend to --includeStart and --includeEnd.
Add --fpat and --lpat as synonyms for --start and --end (to match `body` script).
Fix several major logic bugs, clean up code. Add --sep.
2021-09-22: Clean up doc. Add --filename and unify prefix-handling.
Integrate into `body` or `grepData`.
-b (byte-offset), -u (unix-byte-offsets) (strips CR).
Port to Python and integrate PowerWalk.
grep-like options? Port and switch to PowerWalk for most. files-with[out], --null,
--only-matching, -x/--line-regexp, -w/--word-regexp
Finish --color (incl. responding to $GREP_COLOR)
Integrate with grepData, esp. for PowerWalk and context args.
Grab options from $GREP_OPTIONS or maybe $GREPRANGE_OPTIONS.
Generate return code: 0: something found; 1: none found; >1: error.
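The planned return-code convention in the last item could look roughly like this in Python. The function name `exit_status` is hypothetical. Using exactly 2 for errors (any value above 1 would satisfy the stated plan) and letting an error take precedence over a successful match are both assumptions.

```python
def exit_status(ranges_found, had_error=False):
    """Map the outcome of a run to a grep-style exit status."""
    if had_error:
        return 2  # assumed value; the plan only says "greater than 1"
    return 0 if ranges_found > 0 else 1
```

A main routine would then finish with something like `sys.exit(exit_status(count, had_error))`.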
Copyright 2008-02-01 by Steven J. DeRose. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. For further information on this license, see https://creativecommons.org/licenses/by-sa/3.0.
For the most recent version, see http://www.derose.net/steve/utilities or https://github.com/sderose.