Skip to content

split_inclusive's DoubleEndedIterator implementation is unexpected #100756

@robertbastian

Description

@robertbastian
Contributor

I would expect next_back() to include the separator before the string, i.e. the separator that separates the item from the next one. However:

"a+b+c".split_inclusive('+').collect::<Vec<_>>()
// => vec!["a+", "b+", "c"]
"a+b+c".split_inclusive('+').rev().collect::<Vec<_>>()
// => vec!["c", "b+", "a+"], but I'd expect vec!["+c", "+b", "a"]

Playground

If this behaviour is intended, it should be documented.

Activity

ilyvion

ilyvion commented on Aug 19, 2022

@ilyvion

When you rev() an iterator, you expect to get the exact same elements as you do when you don't rev(), but in the opposite order. I.e. "a+", "b+", "c" becomes "c", "b+", "a+", the reverse of the former.

ilyvion

ilyvion commented on Aug 19, 2022

@ilyvion

You seem to be looking for the behavior I'd expect of an rsplit_inclusive(), which is not a thing. At least not yet.

robertbastian

robertbastian commented on Aug 19, 2022

@robertbastian
ContributorAuthor

The other split methods don't implement DoubleEndedIterator when the reverse search doesn't contain the same elements. From the docs:

The returned iterator will be a DoubleEndedIterator if the pattern allows a reverse search and forward/reverse search yields the same elements.

A reverse search here doesn't yield the same elements, so this should not implement DoubleEndedIterator.

timvermeulen

timvermeulen commented on Aug 20, 2022

@timvermeulen
Contributor

A reverse search here doesn't yield the same elements, so this should not implement DoubleEndedIterator.

It does in your initial example. Do you have an example where it doesn't hold?

SkiFire13

SkiFire13 commented on Aug 20, 2022

@SkiFire13
Contributor

Do you have an example where it doesn't hold?

For example "a+++b".split_inclusive("++") will yield ["a++", "+b"] when iterated forward, and ["a+++", "b"] when iterated backwards.

assert_eq!("a+++b".split_inclusive("++").collect::<Vec<_>>(), vec!["a++", "+b"]);
assert_eq!("a+++b".split_inclusive("++").rev().collect::<Vec<_>>(), vec!["+b", "a++"]); // Fails
// This is actually equal to `vec!["b", "a+++"]`

Mixing calls to next and next_back is even more broken:

let mut iter = "a+++b".split_inclusive("++");
assert_eq!(iter.next(), Some("a++"));
assert_eq!(iter.next_back(), Some("+b")); // Fails
// This is actually equal to `Some("b")`
assert_eq!(iter.next(), None) // Fails
// This is actually equal to `Some("+")`
timvermeulen

timvermeulen commented on Aug 20, 2022

@timvermeulen
Contributor

Ah, with string patterns, yes. split_inclusive with a string should not have implemented DoubleEndedIterator — it looks like SplitInclusive's DoubleEndedIterator implementation incorrectly requires that its searcher is a ReverseSearcher rather than a DoubleEndedSearcher.

split_inclusive does seem to behave just fine for char separators, though.

robertbastian

robertbastian commented on Aug 22, 2022

@robertbastian
ContributorAuthor

It does in your initial example. Do you have an example where it doesn't hold?

No it doesn't. The iterator returns the same elements, but this is not what the action of "return the first occurence from the back including its separator from the next element" should do.

For all other split variants, split('a').rev() is equivalent to rsplit('a'), but for split_inclusive this is NOT the case. I think this makes the DoubleEndedIterator a massive footgun, especially because there is no rsplit_inclusive so people will want to use .next_back() as a workaround. This behaviour should at least be documented, I guess it cannot be removed at this point.

timvermeulen

timvermeulen commented on Aug 22, 2022

@timvermeulen
Contributor

A reverse search here doesn't yield the same elements, so this should not implement DoubleEndedIterator.

It does in your initial example. Do you have an example where it doesn't hold?

No it doesn't.

But it does: repeated next calls yield "a+", "b+", "c", whereas repeated next_back calls yield "c", "b+", "a+". Those are the same items, in opposite order. split_inclusive(x).rev() is not meant to behave like rsplit_inclusive(x), it would violate the DoubleEndedIterator contract if it did.

For all other split variants, split('a').rev() is equivalent to rsplit('a'), but for split_inclusive this is NOT the case.

It's also not the case for e.g. split("aa"), because a forward search may find different matches than a backward search. But I agree that we should probably add rsplit_inclusive as well.

robertbastian

robertbastian commented on Aug 22, 2022

@robertbastian
ContributorAuthor

Yeah we're talking past each other. I'm talking about the concept of searching a string from the back, which can return different items than searching the string from the front. For split_inclusive this even happens when the separator is a char. Now for all other split functions DoubleEndedIterator implements this concept of searching a string from the back (if the pattern supports bidirectionality), but for split_inclusive it's just an iterator reversal. Yes that is strictly correct because that appears to be the contract of DoubleEndedIterator, but then split_inclusive is incomplete, it's missing rsplit_inclusive (why wasn't this implemented together?), and it should also have a documentation section on iterator behaviour.

timvermeulen

timvermeulen commented on Aug 22, 2022

@timvermeulen
Contributor

That makes sense — whenever split implements DoubleEndedIterator, it behaves exactly like rsplit. In that sense, split_inclusive is more like chunks which can be reversed but (in general) doesn't behave like rchunks.

robertbastian

robertbastian commented on Aug 22, 2022

@robertbastian
ContributorAuthor

So let's document? And implement rsplit_inclusive?

timvermeulen

timvermeulen commented on Aug 22, 2022

@timvermeulen
Contributor

It looks like rsplit_inclusive was part of #91546, which has been put on hold.

added a commit that references this issue on Oct 7, 2023
added a commit that references this issue on Oct 10, 2023
added
T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.
T-libsRelevant to the library team, which will review and decide on the PR/issue.
C-discussionCategory: Discussion or questions that doesn't represent real issues.
and removed
C-bugCategory: This is a bug.
on Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-discussionCategory: Discussion or questions that doesn't represent real issues.T-libsRelevant to the library team, which will review and decide on the PR/issue.T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @ilyvion@timvermeulen@ChrisDenton@robertbastian@SkiFire13

        Issue actions

          `split_inclusive`'s `DoubleEndedIterator` implementation is unexpected · Issue #100756 · rust-lang/rust