Feature: Iterator over matching and non-matching parts of a haystack #1296

@LunarLambda

An API that combines find_iter (matching parts) and split (non-matching parts) into a single iterator:

// I don't care about the exact naming or structure. Please do not bikeshed these.

// and equiv. in `bytes`
pub enum Piece<'a> {
    Matching(&'a str /* or `Match`? */),
    NotMatching(&'a str),
}

impl<'a> Piece<'a> {
    pub fn as_str(&self) -> &'a str;
}

impl Regex {
    pub fn pieces<'h>(&self, haystack: &'h str) -> impl Iterator<Item = Piece<'h>>;
}

such that

let r = Regex::new("%.").unwrap();

let text = String::from("Hello, world: %s %d");

let mut p = r.pieces(&text);

assert_eq!(p.next(), Some(Piece::NotMatching("Hello, world: ")));
assert_eq!(p.next(), Some(Piece::Matching("%s")));
assert_eq!(p.next(), Some(Piece::NotMatching(" ")));
assert_eq!(p.next(), Some(Piece::Matching("%d")));
assert_eq!(p.next(), None);

// roundtrip property:
assert_eq!(text, r.pieces(&text).map(|p| p.as_str()).collect::<String>());

I think this could be built as a wrapper around find_iter, but the code would be quite clunky. It would be beneficial to implement this in Regex proper.
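For reference, here is a rough sketch of what such a wrapper might look like. It is not a proposed implementation. To keep it std-only, it takes the match spans as `(start, end)` byte offsets rather than depending on the regex crate. The names `Pieces` and `pieces` are hypothetical, and it assumes the spans arrive sorted, non-overlapping, and on char boundaries, as find_iter would yield them.

```rust
use std::iter::Fuse;

/// One piece of a haystack: a matched span or the text between matches.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Piece<'h> {
    Matching(&'h str),
    NotMatching(&'h str),
}

impl<'h> Piece<'h> {
    pub fn as_str(&self) -> &'h str {
        match *self {
            Piece::Matching(s) | Piece::NotMatching(s) => s,
        }
    }
}

/// Hypothetical adapter that interleaves gaps with matches.
pub struct Pieces<'h, I> {
    haystack: &'h str,
    spans: Fuse<I>,
    /// Byte offset just past the last piece emitted.
    pos: usize,
    /// A match held back while the gap before it is emitted.
    pending: Option<(usize, usize)>,
}

/// Assumption: `spans` yields sorted, non-overlapping (start, end) byte
/// offsets that fall on char boundaries of `haystack`.
pub fn pieces<'h, I>(haystack: &'h str, spans: I) -> Pieces<'h, I::IntoIter>
where
    I: IntoIterator<Item = (usize, usize)>,
{
    Pieces { haystack, spans: spans.into_iter().fuse(), pos: 0, pending: None }
}

impl<'h, I: Iterator<Item = (usize, usize)>> Iterator for Pieces<'h, I> {
    type Item = Piece<'h>;

    fn next(&mut self) -> Option<Piece<'h>> {
        // A match whose preceding gap was just emitted goes out first.
        if let Some((s, e)) = self.pending.take() {
            self.pos = e;
            return Some(Piece::Matching(&self.haystack[s..e]));
        }
        match self.spans.next() {
            // Non-empty gap before the match: emit the gap, hold the match.
            Some((s, e)) if s > self.pos => {
                let gap = &self.haystack[self.pos..s];
                self.pending = Some((s, e));
                self.pos = s;
                Some(Piece::NotMatching(gap))
            }
            // Match starts exactly where the last piece ended.
            Some((s, e)) => {
                self.pos = e;
                Some(Piece::Matching(&self.haystack[s..e]))
            }
            // No more matches: emit the non-empty tail once, then stop.
            None if self.pos < self.haystack.len() => {
                let tail = &self.haystack[self.pos..];
                self.pos = self.haystack.len();
                Some(Piece::NotMatching(tail))
            }
            None => None,
        }
    }
}
```

With the regex crate, the spans could presumably be supplied as `r.find_iter(&text).map(|m| (m.start(), m.end()))`. Unlike split, this sketch skips empty gaps, which matches the example above where no empty NotMatching piece follows `%d`. Empty matches would still come out as `Piece::Matching("")`. Whether that is desirable is an open design question.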

Use cases that need both the matching and non-matching pieces of text are pretty common: format strings, and handling control sequences, UTF-8, or alternations of text and non-text. These currently require awkward constructions such as manually tracking slice sub-ranges, memchr loops, or writing custom parsers with libraries like nom.
