An API that is the union of `find_iter` (matching parts) and `split` (non-matching parts):
// I don't care about the exact naming or structure. Please do not bikeshed these.
// and equiv. in `bytes`
#[derive(Debug, PartialEq, Eq)] // needed for the `assert_eq!` calls below
pub enum Piece<'a> {
    Matching(&'a str /* or `Match`? */),
    NotMatching(&'a str),
}
impl<'a> Piece<'a> {
    pub fn as_str(&self) -> &'a str;
}
impl Regex {
    pub fn pieces<'h>(&self, haystack: &'h str) -> impl Iterator<Item = Piece<'h>>;
}
such that
let r = Regex::new("%.").unwrap();
let text = String::from("Hello, world: %s %d");
let mut p = r.pieces(&text);
assert_eq!(p.next(), Some(Piece::NotMatching("Hello, world: ")));
assert_eq!(p.next(), Some(Piece::Matching("%s")));
assert_eq!(p.next(), Some(Piece::NotMatching(" ")));
assert_eq!(p.next(), Some(Piece::Matching("%d")));
assert_eq!(p.next(), None);
// roundtrip property:
assert_eq!(text, r.pieces(&text).map(|p| p.as_str()).collect::<String>())
I think it may be possible to build this as a wrapper around `find_iter`, but the code would be quite clunky. It would be beneficial to implement this in `Regex` proper.
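For reference, a minimal sketch of what such a wrapper might look like. The `pieces` free function here is hypothetical and not part of the regex crate. It takes the match spans as plain byte ranges so the snippet stays self-contained. With the regex crate, the spans would presumably come from something like `re.find_iter(haystack).map(|m| m.range())`. It also collects into a `Vec` rather than returning a lazy iterator, which is a simplification over the proposed signature.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Piece<'a> {
    Matching(&'a str),
    NotMatching(&'a str),
}

impl<'a> Piece<'a> {
    pub fn as_str(&self) -> &'a str {
        match *self {
            Piece::Matching(s) | Piece::NotMatching(s) => s,
        }
    }
}

/// Hypothetical helper: interleaves match spans with the gaps between them.
/// Assumes `spans` are sorted, non-overlapping byte ranges that fall on
/// char boundaries of `haystack` (which is what a regex matcher yields).
pub fn pieces<'h>(
    haystack: &'h str,
    spans: impl IntoIterator<Item = Range<usize>>,
) -> Vec<Piece<'h>> {
    let mut out = Vec::new();
    let mut last = 0; // end of the previous match
    for span in spans {
        // Emit the gap before this match, skipping empty gaps.
        if span.start > last {
            out.push(Piece::NotMatching(&haystack[last..span.start]));
        }
        out.push(Piece::Matching(&haystack[span.start..span.end]));
        last = span.end;
    }
    // Emit whatever trails the final match.
    if last < haystack.len() {
        out.push(Piece::NotMatching(&haystack[last..]));
    }
    out
}
```

The manual `last` bookkeeping and the separate tail case are the clunky part. Every caller that needs both kinds of pieces ends up rewriting this loop.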
Use cases that need both the matching and non-matching pieces of text are pretty common: format strings, and handling control sequences, UTF-8, or alternations of text and non-text. Currently they require awkward constructions such as manually tracking slice sub-ranges, memchr loops, or writing custom parsers with libraries like nom.