user-defined iteration using range over func values #56413

rsc · 2022-10-25T14:02:34Z

Maintainer

There is no standard way to iterate over a sequence of values in Go. For lack of any convention, we have ended up with a wide variety of approaches. Each implementation has done what made the most sense in that context, but decisions made in isolation have resulted in confusion for users.

In the standard library alone, we have archive/tar.Reader.Next, bufio.Reader.ReadByte, bufio.Scanner.Scan, container/ring.Ring.Do, database/sql.Rows, expvar.Do, flag.Visit, go/token.FileSet.Iterate, path/filepath.Walk, runtime.Frames.Next, and sync.Map.Range, hardly any of which agree on the exact details of iteration. Even the functions that agree on the signature don’t always agree about the semantics. For example, most iteration functions that return (T, bool) follow the usual Go convention of having the bool indicate whether the T is valid. In contrast, the bool returned from runtime.Frames.Next indicates whether the next call will return something valid.

When you want to iterate over something, you first have to learn how the specific code you are calling handles iteration. This lack of uniformity hinders Go’s goal of making it easy to move around in a large code base. People often mention as a strength that all Go code looks about the same. That’s simply not true for code with custom iteration.

We should converge on a standard way to handle iteration in Go, and one way to incentivize that is to support it directly in range syntax. Specifically, the idea is to allow range over function values of certain types. If any kind of code providing iteration implements such a function, then users can write the same kind of range loop they use for slices and maps and stop worrying about whether they are using a bespoke iteration API correctly.

This GitHub Discussion is about this idea of allowing range over function values. This is obviously related to the iterator discussion (#54245), but one aim of this discussion is to separate out just the idea of a language change for customized range behavior, which should probably be done independently of an iterator library. A library for iterators can then build on and augment the range change, rather than being the cause of it.

To date, range's behavior has depended only on the type of its argument, not methods the argument has, nor any other details of the argument. Range currently handles slice, (pointer to) array, map, chan, and string arguments. We can extend range to support user-defined behavior by adding certain forms of func arguments.

There are two natural kinds of func arguments we might want to support in range: push functions and pull functions (definitions below). These kinds of funcs are duals of each other, and while push functions are more suited to range loops, both are useful in different contexts.

This post suggests that for loops allow range over both push functions and pull functions. The end of the post also suggests range over int.

The rest of this post explains all this in more detail.

Push functions

A push function is a function with a type of one of these forms:

func(yield func(...) bool) 
func(yield func(...) bool) bool

That is, a push function takes a single argument, here named yield, although that exact name is not a requirement. The yield argument is itself a function taking N arguments (0 ≤ N ≤ 2) (denoted by ... in the pseudo-syntax above) and returning a single bool. The push function itself must return nothing at all or else a single bool. The optional bool allows the push function to indicate whether it stopped early, which can be useful when composing push functions; when called using range syntax, the compiled code would ignore the result.

The push function enumerates a sequence of values by calling yield repeatedly. The bool result from yield indicates whether to keep yielding values (true means continue running, false means stop). Each call to yield runs the range loop body once and then returns. When there are no more values to pass to yield, or if yield returns false, the push function returns.

In short, a push function pushes a sequence of values into the yield function.

For example, here is a method to traverse a binary tree:

func (t *Tree[K, V]) All(f func(key K, val V) bool) bool {
	if t == nil {
		return true
	}
	return t.left.All(f) && f(t.key, t.value) && t.right.All(f)
}

The method value t.All is a push function: it has signature func(func(K, V) bool) bool.

With that method, one can write today:

t.All(func(k K, v V) bool {
	fmt.Println(k, v)
	return true
})

(In this usage, the caller doesn’t care about the boolean result from t.All, only the fact that it calls f on every key-value pair.)

Adding support for push functions to range would allow writing this equivalent code:

for k, v := range t.All {
	fmt.Println(k, v)
}

In fact, the Go compiler would effectively rewrite the second form into the first form, turning the loop body into a synthesized function to pass to t.All. However, that rewrite would also preserve the “on the page” semantics of code inside the loop like break, continue, defer, goto, and return, so that all those constructs would execute the same as in a range over a slice or map.
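The rewrite can be approximated by hand. The sketch below assumes a hypothetical push function pairs and shows one plausible way a loop containing continue and return could be turned into a synthesized yield function; it is not the compiler's actual output, and it does not attempt defer, goto, or panics, which need more machinery.

```go
package main

import "fmt"

// pairs is a hypothetical push function over parallel key/value slices.
func pairs(keys []string, vals []int) func(yield func(string, int) bool) {
	return func(yield func(string, int) bool) {
		for i := range keys {
			if !yield(keys[i], vals[i]) {
				return
			}
		}
	}
}

// firstOver is a hand-written approximation of the proposed loop
//
//	for k, v := range seq {
//		if v < 0 {
//			continue
//		}
//		if v > limit {
//			return k, true
//		}
//	}
//	return "", false
//
// "continue" becomes "return true" from the synthesized function. The
// outer "return" is recorded in variables, the synthesized function
// returns false to stop seq, and the real return happens afterward.
func firstOver(seq func(yield func(string, int) bool), limit int) (string, bool) {
	var (
		retK   string
		retSet bool // set when the loop body executed "return"
	)
	seq(func(k string, v int) bool {
		if v < 0 {
			return true // continue
		}
		if v > limit {
			retK, retSet = k, true
			return false // leave the loop early
		}
		return true // end of body: ask for the next element
	})
	if retSet {
		return retK, true
	}
	return "", false
}

func main() {
	fmt.Println(firstOver(pairs([]string{"a", "b", "c"}, []int{5, -20, 30}), 10))
}
```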

If you are worried about the subtle variable scoping difference, consider the change discussed in #56010 a prerequisite of adding func support to range.

Note that the results of the push function (if any) are discarded when using the range form. Most often a push function will return nothing at all, or else a bool indicating whether the loop stopped early, as the All method does to make recursion easier.

A method x.All(f), which may become a common pattern, has two different, equally valid interpretations. One is that f is a yield function and All passes all the tree's contents to f. The other is that f is a condition function and All reports whether the condition is true for all the contents of the tree, stopping the traversal once it determines the result.

Pull functions

A pull function is a function with a type of the form

func() (values, bool)

That is, a pull function takes no arguments and returns the next set of N values (0 ≤ N ≤ 2) from the sequence. Each valid set of values comes with a final true bool result. When there are no more values, the pull function returns arbitrary values and a false bool.

A pull function must maintain internal state, so that repeated calls return successive values.

In short, a pull function lets the caller pull successive elements from the sequence, one at a time.

For example, here is a method that returns a pull function to traverse a linked list:

func (l *List[V]) Iter() func() (V, bool) {
	cur := l
	return func() (v V, ok bool) {
		if cur == nil {
			return v, false
		}
		v, ok = cur.value, true
		cur = cur.next
		return
	}
}

The method value l.Iter is not a pull function, but it returns one.

With that method, one can write today:

next := l.Iter()
for v, ok := next(); ok; v, ok = next() {
	fmt.Println(v)
}

Adding support for pull functions to range would allow writing this equivalent code:

for v := range l.Iter() {
	fmt.Println(v)
}

In fact, the Go compiler would effectively rewrite the second form into the first form. Again, consider the scope change in #56010 a prerequisite.

If some iterator-like value had a Next method that returned (value, bool), we could write:

for v := range it.Next {
	...
}

Note that range over pull functions has been proposed by itself as #43557, and the discussion also considered push functions (for example, #43557 (comment)). Both can be appropriate at different times.

Duality of push and pull functions

Any push function can be converted into a pull function and vice versa.

Converting a pull function into a push function is a few lines of code:

func push[V any](next func() (V, bool)) func(func(V) bool) {
	return func(yield func(V) bool) {
		for {
			v, ok := next()
			if !ok || !yield(v) {
				break
			}
		}
	}
}

Converting a push function into a pull function is more involved. Because the push function has its own state maintained in its stack (like in the binary tree traversal), that code must run in a separate goroutine in order to give it a stack that persists across calls to the next function. The full code is in this playground snippet.

It can be arranged that the separate goroutine executes with its own stack but not actually running in parallel with the caller. With a bit of smarts in the compiler and runtime, but no changes to the Go language or any of its semantics, that lack of parallelism allows the separate goroutine to be optimized into a coroutine, so that switches between the caller and the push function are fairly cheap. The details of the optimization are beyond the scope of this discussion but are posted in the “Appendix” of #54245.

The signature for converting a push function to a pull function is

func pull[V any](push func(yield func(V) bool)) (next func() (V, bool), stop func())

The conversion must return two functions: the pull function next and a cleanup function stop, which shuts down the goroutine.

Although push and pull functions are duals, they have important differences. Push functions are easier to write and somewhat more powerful to invoke, because they can store state on the stack and can automatically clean up when the traversal is over. That cleanup is made explicit by the stop callback when converting to the pull form.

For example, the binary tree traversal above was made very easy by being able to use recursion in its implementation. A direct implementation of a pull form would need to maintain its own explicit stack instead, like:

func (t *Tree[K, V]) Iter() func() (K, V, bool) {
	var stk []*Tree[K, V]
	for ; t != nil; t = t.left {
		stk = append(stk, t)
	}
	next := func() (k K, v V, ok bool) {
		if len(stk) == 0 {
			return k, v, false
		}
		t := stk[len(stk)-1]
		stk = stk[:len(stk)-1]
		for r := t.right; r != nil; r = r.left {
			stk = append(stk, r)
		}
		return t.key, t.value, true
	}
	return next
}

That implementation is much harder to reason about and probably contains a bug.

As another example of the power of push functions and automatic cleanup, consider this function that allows ranging over the lines from a file:

import (
	"bufio"
	"io"
	"os"
)

func Lines(file string) func(func(string, error) bool) {
	return func(yield func(string, error) bool) {
		f, err := os.Open(file)
		if err != nil {
			yield("", err)
			return
		}
		defer f.Close()
		b := bufio.NewReader(f)
		for {
			line, err := b.ReadString('\n')
			// ReadString returns any data read before an error, so yield
			// a final line that lacks a trailing newline before stopping.
			if line != "" && !yield(line, nil) {
				break
			}
			if err != nil {
				if err != io.EOF {
					yield("", err)
				}
				break
			}
		}
	}
}

This could be used as:

for line, err := range Lines("motd.txt") {
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(strings.ToUpper(line))
}

Note that the implementation of Lines can use defer to clean up automatically when the loop is done. An implementation using a pull function would need a separate stop function to close the file.

A push function usually represents an entire sequence of values, so that it can be called multiple times to traverse the sequence multiple times. It can usually also be called simultaneously from different goroutines if they both want to traverse the sequence, without any synchronization. In contrast, a pull function always represents a specific point in one traversal of the sequence. It can be advanced to the end of the sequence, but then it can't be reused. Goroutines cannot share a pull function without synchronization, but a pull function can be used from multiple call sites in a single goroutine, such as a lexer pulling bytes from an input source.

In terms of concepts in other languages, a push function can be thought of as representing an entire collection. The implementation of the push function maintains iterator state implicitly on its stack, so that multiple uses of the push function use separate instances of the iterator state. In contrast, a pull function can be thought of as representing an iterator, not an entire collection.

Push and pull functions represent different ways of interacting with data, and one way may be more appropriate than the other depending on the data. For example, many programs process the lines in a file in a single loop, so a push function is appropriate for lines in a file. In contrast, it is difficult to imagine any programs that would process the bytes in a file with a single loop (except maybe wc), while many process bytes in a file incrementally from many call sites (again, lexers are an example), so a pull function is more appropriate for bytes in a file.

Because both forms are appropriate in different contexts, range loops should support functions of both types. Note that there is no overlap between the two function kinds: push functions always have one argument, while pull functions always have no arguments.

Alternatives

An alternative would be to extend range by recognizing special methods. For example, if range knew to call a .Range method, then we could define (*Tree).Range and then use

for k, v := range t {
	...
}

instead of

for k, v := range t.All {
	...
}

One aesthetic reason not to do this is that range today uses types to make the decision, and it seems cleaner to continue to do that. In fact, there is nothing in the language today that calls specially defined methods. (The closest to that is the definition of the error interface, but no language construct calls the Error method.) Aesthetic reasons aside, though, there are two practical problems with a method-based decision.

The first problem with a method-based decision is that only a single method can implement the behavior. Using functions, other methods can be called instead simply by naming them. For example, we might define t.AllReverse that enumerates the tree in reverse order, and then a loop can use

for k, v := range t.AllReverse {
	...
}

Similarly, an iterator that defines Next might also define Prev, allowing

for v := range it.Prev {
	...
}

The second problem with a method-based decision is that it can conflict with the type-based decision. For example, if the loop calls the Range method, what happens in a range over a channel value that also has a Range method? Is it treated like other channels, ignoring the Range method? It would seem that must be the case, for backwards compatibility. But then it's confusing that the Range method doesn't win.

Continuing the type-based decision instead of introducing a new method-based decision rule avoids these problems.

Range over ints

One common problem for developers not coming from the C family of languages is puzzling through the Go idiom

for i := 0; i < n; i++ { ... }

When you stop to explain it, that’s a lot of machinery to say “count to n”.

One common use case that people have mentioned for user-defined range behaviors is to have a standard function to simplify that pattern, like:

func count(n int) func(yield func(i int) bool) {
	return func(yield func(i int) bool) {
		i := 0
		for i < n && yield(i) {
			i++
		}
	}
}

used as:

for i := range count(n) { ... }

If this will become the new idiom for counting to n, it's unclear where the count function would be defined. Some package that essentially every program imports?

Counting from 0 to n is so incredibly common that it could merit a predefined function, but at that point we’re talking about a language change. And if we’re talking about a language change, it makes sense to continue to extend range in a type-based way, namely by ranging over ints.

Adding support for ints to range would allow writing this code:

for i := range n { ... }

instead of:

for i := 0; i < n; i++ { ... }

For former C, C++, and Java programmers, the idea of not writing the 3-clause for loop may seem foreign. It did to me at first too. But if we adopt this change, the range form would quickly become idiomatic, and the 3-clause loop would seem as archaic as ending statements with semicolons.

Discussion

What do people think about this idea?

Should we stop at push functions and not allow pull functions in range?

Should we add range over int too?

nachtjasmin · 2022-10-25T14:48:45Z

First, I'd like to note that the general idea of something iterator-related is a welcome addition to the language.

When I started with Go (coming from a C# background), the differences between the ways of iteration were quite confusing. And they still are! C# has this notion of: everything that is an IEnumerable<T> can be accessed and manipulated with LINQ.

However, LINQ is a beast itself and introducing something like that is definitely not suitable for the goals of the Go programming language. And I would even argue that it's not needed.

The concept of pull and push functions is clear. Incorporating this even further into the language, e.g. by defining a Range() method as considered in the alternatives, would decrease readability. Developers would need to know about this concept, because it definitely hides something. So I would consider this a no-go. Explicit, readable code is the preferred way.

As for the range over ints proposal: Python has something similar, so this new pattern could improve adoption among Python developers. I, for one, don't have a strong opinion for or against it, as I'm already too used to the C way.

Edit: made some minor grammar corrections, simply because English is not my first language.

nemith Oct 26, 2022

Incorporating this even further into the language, e.g. by defining a Range() method as considered in the alternatives would decrease the readability. Developers would need to know about this concept, because it definitely hides something.

I actually don't entirely agree with that. Yes it is a new concept, but it keeps the idea that a function is a function and has computational weight.

When writing for x := range x.Range(), you should be able to tell that it's a function call and that the range will produce values from it. This seems to fit the Go theme better than having predefined interfaces that an object must implement to get range functionality. It really doesn't seem to be hiding anything; the cards are out in the open. In other words, you can read this as range (produce values) from x.Range(), calling it every iteration. The only implicit part is the function signature, which shouldn't be foreign to anyone familiar with HandlerFunc or similar APIs in Go.

So this actually seems pretty explicit and actually far more readable as there would be one way of iterating through objects vs consulting the documentation on the specific iterator semantics.

evanphx · 2022-10-25T16:55:52Z

Collaborator

I'm really interested in what the transform for push functions that allows flow control statements would be. This would effectively add a form of non-local return to Go, which other languages use to make these sorts of internal iterators feel nice.

rsc Oct 25, 2022
Maintainer Author

Yes, I believe the yield function will panic if it is called after the loop is done or from the wrong goroutine.

prattmic Oct 25, 2022
Maintainer

or from the wrong goroutine.

This is an unfortunate limitation, do we need it? I don't think we want to allow racy calls to yield, but I can imagine push functions that e.g., start worker goroutines to walk a data structure and call yield on all elements. Provided there is synchronization around yield calls, it feels like that should be fine.

This could always be worked around by having workers send values back to the original goroutine, which calls yield, but this feels like an awkward requirement in the language. I can't think of other functions that must be called from the correct goroutine (t.FailNow() is the closest I can think of), so this seems odd.

That said, I'm not sure how to reconcile this with what should happen if the loop body panics.

~~yield just returns false, I suppose? (This would imply that the yield implementation would use some internal communication mechanism to make the loop body run in the original goroutine)~~ Edit: this doesn't make sense, as the original goroutine is likely blocked in something like sync.WaitGroup.Wait.

rsc Oct 26, 2022
Maintainer Author

Indeed, a panic or a call to defer is the main reason the goroutine limitation exists. I doubt it will be much of a problem in practice. We can also always lift it later.

evanphx Oct 26, 2022
Collaborator

Ah great thanks for all that info, that makes sense!

pat42smith Oct 29, 2022

@rsc

Yes, I believe the yield function will panic if it is called after the loop is done or from the wrong goroutine.

What do we expect the push function to do if the yield function panics, either for the reason above, or because of a call to panic within the loop body? I imagine we expect it to stop and not call the yield function any more; is that right?

If the push function uses defer to recover from the panic and call the yield function, it seems there is potential for an infinite loop of panics. Perhaps the yield function should check for this and after some number of panics do something else? Maybe exit the goroutine or exit the process?

cespare · 2022-10-25T17:55:19Z

Collaborator

A few assorted thoughts.

1. The conventional names I've heard before are internal iterators ("push functions") and external iterators ("pull functions"). I'm not sure if you were avoiding these terms on purpose, but this may be helpful when comparing to what other languages are doing.
2. When using a few generic containers (sets, ordered maps, etc) the lack of native range iteration is, I think, the biggest point of friction that makes them feel like second-class collection types. So I'm cautiously optimistic that this idea would solve that problem.
3. If the iterator function doesn't get inlined (how likely is that, particularly for push functions?), then the call-per-iteration seems like it would make this fairly slow in some contexts (performance-optimized data structures).
4. The iteration on ints seems wholly unnecessary to me. It makes two ways to do a very common task (decreasing code readability) while saving hardly any typing. Concern (3) is also significant here: if the "nice" way to do it is 10x slower, then the choice of which form to use is more of a burden.

rsc Oct 25, 2022
Maintainer Author

A few assorted replies.

I wasn't aware of those terms. At first glance I'm not sure what would make something internal or external, so I think I will stick with pull or push, but thanks for the mapping for people who are already familiar with them.
I believe that the most trivial push iterators will get inlined. Clearly we can't land this feature with terrible performance: we will do the work needed to make it perform well (or else rethink).
I'm not convinced that for i := range n decreases code readability, but that will depend on how quickly everyone moves to the new syntax. Regardless, it won't be 10X slower. It will be exactly the same speed.

Thanks for the comments!

srikarplus Oct 27, 2022

I actually like the for i := range n syntax. I've made mistakes writing that line in Go numerous times. Coming from the Python world, where we are used to writing for i in range(n), this should be a welcome change.

jbduncan Feb 13, 2023

@rsc For your reference, I've found this article by Robert Nystrom (of craftinginterpreters.com fame) on internal and external iteration useful: https://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/.

tl;dr: External iteration is when the user's code "calls" the iteration, as with iterator types and for loops. Internal iteration is when the iteration "calls" the user's code, as with forEach-like functions in JavaScript, Ruby et al.

seebs · 2022-10-25T17:59:46Z

The push function case has a weird quality that I think is novel. The yield() function the compiler passes to it is a function that you can't write in Go, because it's a function which, when called, can execute a defer in a caller's context. I'm mildly afraid of that, not least because I have often wanted the ability to write "defer-but-in-parent" and also I would be absolutely miserable if anyone else (including "me three months ago") had access to it.

I don't think we could entirely dispose of the three-clause loop, but I do agree that I'd be fine with not needing it in the "count to n" case.

Anyway, as a person who's repeatedly wanted to request iterator support in the language, I will say that I like this a lot, and at least so far, this feels like something that I'd use and not hate, which is pretty high praise for programming languages.

bcmills Oct 25, 2022

FWIW, I think of the yield magic as being more about optimizing away a goroutine than about running things in the caller's stack.

(https://go.dev/play/p/bB1-_JempqG)

ConradIrwin · 2022-10-25T19:23:42Z

Thanks for writing this up @rsc, and for providing a clear mental model around push and pull based iteration! I really like the direction this is going.

One thing that seemed a bit nuanced in the description was the push/pull distinction apparently requiring different parentheses.

for k, v := range t.All
for k, v := range t.Iter()

This is likely an artifact of your example having Iter() return a function instead of an intermediate value (as in the discussion), and I think this clears up the nuance (despite being more verbose):

for k, v := range t.All
for k, v := range t.Iter().Next

I also think the decision to pass a function to range instead of passing a value that has a given method is a good one; although it adds a bit of syntactic noise, it makes it very clear how the feature works, and (although you don't call this out) allows people who are navigating a new code-base to click on a method to see where it's implemented as they would for a function call.

The biggest concern I can think of is not really a concern with the proposed changes to range itself, but with how it would interact with the rest of the language. In particular, if so many things are allowed, there's no way to specify that my function takes "something it can pass to range". This may not be a problem in practice (the examples I can think of are fairly mundane) but it might be irritating to have to write ~6 implementations of the same thing for various different push and pull function signatures.

This could be solved with some syntax in interface definitions (for example):

type Aller interface {
  All range[K, V]
  AllKeys range[K]
}

This is similar to the operator-based approach that was decided against for type parameters (in favor of the named-types approach), so there may be issues that led to that decision that I don't know about. It also has the downside that you lose information (once you have an Aller, you cannot call its All() method directly because you don't know what its signature actually is).

Alternatively, it could be solved by heavily restricting the proposal so that range only accepts functions with one signature (probably func(func(k K, v V) bool) bool). Although it seems reasonable to require All to always return a bool (just as Close() always returns an error), I'm not sure how reasonable it is to require two callback parameters: implementors could always pass nil as a second value, but that seems a bit meh. This would also mean that pull iterators are not directly supported, and possibly the push function would be made available so people can convert between the two. (I do think it would be reasonable to support i := range n if n is an integer type, even if there were no way to pass "either an integer or a function with the right signature".)

type Aller interface {
  All func(func(k K, v V) bool)
  AllKeys func(func(k K, v V) bool)
}

A third option is to split the difference, allow some number of types (more than one and fewer than six) so that if you want to write code that takes something that can be passed to range, you only need a couple of different copies.

In any case it would be nice to be able to write code like this and pass anything that could be passed to range to a function (but probably not a deal breaker if you can't):

func benchmarkAll(a Aller) {
  start := time.Now()
  for k, v := range a.All {
    fmt.Println(time.Since(start), k, v)
    start = time.Now()
  }
}

magical Oct 25, 2022

Thanks for writing this up @rsc, and for providing a clear mental model around push and pull based iteration! I really like the direction this is going.

Totally agree.

One thing that seemed a bit nuanced in the description was the push pull distinction apparently requiring different parenthesis.
for k, v := range t.All
for k, v := range t.Iter()

I noticed this too and i'm a little conflicted about it.

On the one hand, i don't want to have to remember whether any given loop needs the parentheses or not. Sounds like a great source of frustration while coding.
On the other hand, it could be nice to have a visual indicator of whether a loop is over a push-type (repeatable) or pull-type (consumable) value. (Attempting to reuse a spent iterator is a mistake that i still occasionally make in Python.)

Parenthesis would probably not be my first choice for that visual indicator. A different keyword, maybe. Or perhaps a naming convention would be sufficient.

That said, I guess this confusion already exists today - channels are consumable, but maps and slices are reusable - so maybe it's too late to do anything about it.

firelizzard18 Oct 26, 2022
Collaborator

range t.Iter() would require Next() to be a magic method name so I think that was an oversight. If you assume magic methods are forbidden, ranging over an iterable must be range t.Iter().Next.

robaho Oct 26, 2022

range t.Iter() would require Next() to be a magic method name so I think that was an oversight. If you assume magic methods are forbidden, ranging over an iterable must be range t.Iter().Next.

I believe t.Iter() returns a func - the name is immaterial - only the signature needs to match.

Merovius Nov 5, 2022

t.All could return a func as well. I think whether or not Iter() returning a func is more plausible depends on a) whether it was written before this proposal is implemented (it would most likely return a mundane iterator type), and b) whether we have an iterator library (it will likely return a canonical iterator type). In the space between implementing this proposal and us getting an iterator library, I could see an argument that returning a pull func from Iter() is easier in some cases.

Either way, whether you have to put a call expression into range doesn't actually depend on whether or not you use push or pull. It depends on what the type of range expression is. That's, FTR, the same as today - a range expression can be a function call, or it can not be.

Merovius · 2022-10-25T19:28:09Z

What is the intent for using these as iterators? I know that the discussion here splits, but to me, if these are not usable to write an iterator library, I don't really see the point for a relatively invasive language change.

From what I can tell, there is no realistic way to write a function which takes "either a push or a pull function". There isn't even a way to write one which can take a push function, due to that having 6 (?) different forms. I mean, you can write a type-constraint for "it has to be any of these" and use a type-switch, but that isn't exactly ergonomic.

So all I could think of is iterator compositions that take the form they need and return the form they find convenient, with the user expected to use the appropriate glue code to transform them back and forth. Especially given that some of these need a separate stop function, that sounds like a pain to manage.

So while I can totally see how this would enable us to iterate over user-defined collections (and I think it does that reasonably well, though I find the dangers of push persisting yield icky), I can't really see how this addresses the goal of "a standard way to do iteration".

AIUI one of the goals is to provide the language change needed to then do #54245. But #54245 really only needs pull-functions to be rangeable, doesn't it?

39 replies

willfaught Nov 5, 2022

Of course, you'd also need FromPushFunc2 and perhaps FromPushFunc0.

@Merovius Right, this was my point. You can't abstract over varying numbers of values in func signatures.

There are also the variants where the push iterator func itself returns a bool. That doubles the cases.

Subjective does not mean insufficient.

I don't think I equated the two with the totality of what I wrote.

With regard to your less subjective argument, note that this proposal does not require any significant rewriting of the loop body.

I was thinking that for push iterators, the loop body would just be the body of the callback, but now I see that return and goto wouldn't work that way. So the range loop is basically only working with pull iterators anyway, after converting push to pull. Makes sense.

Currently I can't imagine that we would make that choice. We would need much better arguments than we've seen so far.

I haven't seen any counterarguments from "we" indicating any problems with the arguments so far, but if there's just no interest from the core Go team, then there's no point in continuing to talk about it. I'll end my remarks about it here.

Merovius Nov 5, 2022

I haven't seen any counterarguments from "we" indicating any problems with the arguments so far

You have seen them. You disagree with them. That's fine, but I'd personally be far more inclined to converse, if that's a distinction you could consistently internalize and reflect.

AndrewHarrisSPU Nov 5, 2022

I think a, b, and c would all be assigned 0 that way.

In addition to a solution with channels, here's another solution with a higher-order CountN:

func CountN(n int) (push func(func(int) bool) bool) {
	i, exhausted := 0, n <= 0 // with n <= 0 there is nothing to push

	push = func(evalBody func(int) bool) bool {
		if exhausted {
			return false
		}

		pushMore := evalBody(i)

		i++
		if i >= n {
			exhausted = true
		}

		return pushMore
	}

	return push
}

Because no defer, goto labels, etc. are involved, this can already run in Go today:

iter := CountN(3)
var xs []int
for iter(func(x int) bool {
	xs = append(xs, x)
	return true
}) { // gofmt formats this oddly, who writes an empty loop body here?
}

It's kind of funny that what should be the loop body appears in a function here, with an empty loop body. I think that's a good demonstration of why the minimally magical language change would be sensible: just allow writing the loop body where it should reasonably appear, while preserving the semantics of defer, goto, break, continue, and panic (or anything I'm forgetting).

willfaught Nov 6, 2022

You have seen them. You disagree with them.

@Merovius No, I haven't, and please don't tell me what I've seen.

To be clear, this is regarding the discussion about including generator functions.

Many of the points I made in response to you and @ianlancetaylor remain unaddressed. Here are some of them:

I don't think panic is an exception; it's the lighthouse pointing the way...
It clutters up the code. Nobody wants to wrangle callbacks if they can help it.
I don't think any other language feature has needed to know how to interact with user types like this, to be fair.
I think the compiler would emit an error if yield is used inside a function that doesn't return the right type.

Many of the points made in response to me were subjective and vague. Here are some of them:

My feeling is that part of this discussion is exactly about...
Defining Iter and yield in ways that make sense for Go...
We should not rely on a special type Iter, because nothing in the language depends on special types
We should not rely on a special function yield that changes the flow of control, because we try to make flow of control very clear (here panic is the exception)...
It does not seem Go-like to me...
It doesn't really seem better in the sense of making it easier or more convenient or more robust to write code.

I'm not going to explain here how logic works, but suffice it to say that good premises and conclusions in arguments are falsifiable, and "it does not seem Go-like to me" is not falsifiable. There's no way to argue against it. The only things you can really say in response to statements like that is "I agree that you say you have that feeling" or "I have the opposite feeling." There's no "meat" on those bones to sink your teeth into.

I addressed all of the objective points that were made in response to me. We seemed to get sidetracked on how generators work, and I'm still unclear on whether my attempt to clarify how they work had any kind of effect, since what I wrote wasn't acknowledged.

I'd prefer to leave it there. If we must, let's agree to disagree.

Edit: I should add that I interpreted the "we" to mean the core Go team, not including @Merovius, and by arguments, I meant unaddressed arguments.

ianlancetaylor Nov 6, 2022
Collaborator

@willfaught I'm sorry it seems like we're ignoring your points. I personally don't find it productive to reply with a simple "I disagree". It doesn't seem to lead to useful conversations.

That said:

I made the point that a builtin yield function would lead to unexpected flow of control. For some reason you call that point "subjective and vague." I don't think it is. I think it is objective and clear.

You responded by saying, I think, that we should treat panic not as an exception but as a path to follow.

I disagree.

Given that disagreement, I don't find it necessary to keep responding to every other argument on this topic. At some point we have to be able to draw a line.

I don't know if we are going to adopt this proposal (for range over pull and push functions) or not. I'm in favor of it but I can live without it. But even if we don't adopt this proposal, I'm really pretty sure that we aren't going to adopt a new yield builtin function. So my interest in discussing that topic is naturally somewhat limited. I'm sorry if this seems harsh. I'm sure it does seem subjective and vague. That's OK with me: some aspects of language design are subjective and vague. I'm just trying to state my views clearly and honestly.

firelizzard18 · 2022-10-25T19:59:54Z
Collaborator

I like all of this proposal but it's not clear to me how control flow statements would be implemented within the body of a loop ranging over a push iterator.

Adding support for push functions to range would allow writing this equivalent code:
for k, v := range t.All {
	fmt.Println(k, v)
}
In fact, the Go compiler would effectively rewrite the second form into the first form, turning the loop body into a synthesized function to pass to t.All. However, that rewrite would also preserve the “on the page” semantics of code inside the loop like break, continue, defer, goto, and return, so that all those constructs would execute the same as in a range over a slice or map.

Break and continue are easy (return false and return true, respectively) but AFAIK the only way to implement return, defer, and goto without significant runtime changes (e.g. non-local return/jump) is with something like this:

// return value
var rval T
var doReturn bool
t.All(func(k K, v V) bool {
  rval, doReturn = value, true
  return false
})
if doReturn { return rval }

// goto label
var gotoLabel bool
t.All(func(k K, v V) bool {
  gotoLabel = true
  return false
})
if gotoLabel { goto label }

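// defer fn()
var deferred []func()
defer func() {
  // run the collected calls last-in, first-out, as defer does
  for n := len(deferred)-1; n >= 0; n-- {
    deferred[n]()
  }
}()
t.All(func(k K, v V) bool {
  deferred = append(deferred, fn) // record fn itself; don't call it yet
  return true
})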

Though I am not confident that my defer fn() translation would behave exactly the same as a non-local defer.

1 reply

rsc Oct 25, 2022
Maintainer Author

Sure, something like that. #47707 has a bunch of discussion about that. I'm trying to keep this discussion at a higher-level, but I'm confident it can be implemented.

seancfoley · 2022-10-25T22:15:08Z

My 5 cents:

I like the fact that this avoids introducing specially named methods. It's good that Go avoids them.
The "pull" range idiom is similar to what I envision as a typical iterator in most languages; the only difference is that the iterator here is a function, each call advancing it, rather than an implementation of some interface with a similar method that advances the iterator. In the end, they are similar. Does this addition improve the language? I'm not sure, mostly because there's not a whole lot of difference between the code being replaced and the replacement, so I doubt the small savings in code justify the language addition.
The "push" idiom is a little harder to picture; it takes a bit more mental effort to get from the push function to its use as part of a range loop. Again, I'm not sure the small savings in code justify the language addition.
The "range over int" I find less useful, because it is restricted to counting from 0 up to n, whereas my loops are sometimes descending, sometimes start from 1, and come in all sorts of other combinations. So I don't see it as all that valuable. It would be more valuable if it looked like for i := range m..n { }, which would give a few more options: m > n, m = 1, etc.

If people were to start writing

for i := range n {
	i++
	...
}

or

for i := range n {
	i = n - i - 1
	...
}

then that would be worse than what exists today.

So, overall, I'm skeptical of this proposal. For me, I'd probably be happier reading and writing the original code rather than these new 'range' equivalents.

For me, the main objective of adding iterators to the language is to provide common types shared by many. Code using or producing iterators written by different people would be automatically compliant, because they're both standardizing on the same common library types. Without that, additional code is being written to translate one iterator type to the other.

So, from this proposal I suppose the pull and push functions suggest that you might define standardized iterators to be:

type Iter[V any] func() (V, bool)
type PushIter[V any] func(yield func(V) bool)

And then perhaps people might decide to standardize on these two iterator data types, but frankly, they are probably not what I would choose to standardize on (although that would be a whole new discussion).

6 replies

rsc Oct 26, 2022
Maintainer Author

I'm not going to try to convince you to change your mind, but I do want to point out that this reply is focusing on the language change by itself, not engaging with the point at the start of the post, namely that there is a tower of babel of iterators and that supporting canonical ones in range will both encourage implementers to use a standard pattern and make usages cleaner.

I agree that in these trimmed-down examples the differences do not appear large, although with more complex expressions the linguistic benefit is greater. But focusing on the linguistic benefit ignores the ecosystem benefit of a way to standardize what an iterator interface looks like.

seancfoley Oct 26, 2022

I agree with this comment, I do think that standardization around a common pattern is beneficial, to avoid the tower of babel. In fact, that's what I meant by my comment "For me, the main objective of adding iterators to the language is to provide common types shared by many".

I agree 100% that the primary benefit is to "standardize what an iterator interface looks like".

I did focus largely on the integration with range in my comment. So, I see your point in this reply. I think that you are right that a proposal like this would likely push most of the past and future iterators towards either the push or pull pattern proposed here. If that is the primary goal, it would likely do that, in my opinion. People would most likely want to support any new "range" functionality. Although, to be sure, you'd probably want to make the "range" functionality as attractive as possible.

Probably, most would gravitate towards the "pull" pattern.

firelizzard18 Oct 26, 2022
Collaborator

Probably, most would gravitate towards the "pull" pattern.

You expect most developers would choose to implement Next() (T, bool) instead of Range(yield func(T) bool)? IMO the latter is far more intuitive, especially for complex structures such as a tree. Next() (T, bool) is easy to implement for queue-like values such as channels and random access values such as slices, but implementing Next() (T, bool) for a map or a tree is significantly more complex. Implementing Range(yield func(T) bool) is trivial for most iterable values.

seancfoley Oct 26, 2022

I think the code search below by @rsc provides some evidence that most people in most situations gravitate towards pull functions.

Even though callbacks are sometimes the better choice to make code cleaner, many developers never use them at all. Pull is simpler, you call something and get something back, and then you repeat, case closed. But it's true that it can require more work maintaining state inside the pull function. Maybe for push, some people have a harder time picturing a call stack and the flow of control in their minds, and maybe pull is easy enough in most cases that it is the preferred choice.

I do agree, traversing binary trees, or many other data structures, is often much cleaner and simpler with callbacks like the push pattern, and so push can sometimes be the better choice.

firelizzard18 Oct 26, 2022
Collaborator

I expect the main reason package developers prefer pull iterators is that they feel more natural for the consumer than passing a callback. However, if this proposal is accepted, I expect package developers will shift to writing push iterators, since that barrier (the consumer having to reason about a callback) will be gone.

pat42smith · 2022-10-25T23:41:48Z

I like this. One tiny nit, though... I would prefer to leave out the possibility of a push function returning a bool.

It doesn't save much code, as any function returning a bool can be trivially wrapped in a function that returns nothing. The tree example above could be rewritten:

func (t *Tree[K, V]) Under(f func(key K, val V) bool) bool {
	if t == nil {
		return true
	}
	return t.left.Under(f) && f(t.key, t.value) && t.right.Under(f)
}

func (t *Tree[K, V]) All(f func(key K, val V) bool) {
	t.Under(f) // discard Under's bool result
}

I would prefer either to say that a push function must return nothing, or to say that a push function can have any return type(s) at all, including nothing, and for...range will ignore the returned values.

8 replies

pat42smith Oct 26, 2022

If we drop the optional bool, we need a good, short name to replace All as the thing you range over in for t := range x.All {. If the answer is for t := range x.Under {, I don't understand what Under means in that context.

In the code snippet above, my intent was that one would continue to write
for t := range x.All {. Under is just a support method for All, and
is identical to your All method, just renamed.

rsc Oct 26, 2022
Maintainer Author

The reason for allowing specifically bool is that it is the same result as in the yield callback. Perhaps it should be dropped though, so that the function must return no results.

If we allowed arbitrary return types, I suspect there would be too many false positives or misuse, such as a function that returns error being used with range and then the user not noticing the error.

Sorry for misreading Under vs All. I would be reluctant to establish a convention of calling the non-bool-returning push method All, since that's not the signature of all in Python or Rust. We'd probably have to pick some other name.

ConradIrwin Oct 26, 2022

Each would be consistent with Ruby, though I prefer Range to make the correspondence with the builtin feature clear (in the case with no bool return).

I do like the idea of reducing the number of possibilities, as that will help with the goal of consistency.

firelizzard18 Oct 26, 2022
Collaborator

In the context of a method on an iterable type accepting a predicate, I would expect All, Each, and Every to behave the same. In most languages, one of those is the idiom for "return true iff the predicate returns true for all/every/each element of the collection". IMO it would be more natural for Range to be a function that enumerates a range (subset) from the collection. Using All, Each, or Every as the iterator is natural and intuitive to me.

pat42smith Oct 28, 2022

@rsc

The reason for allowing specifically bool is that it is the same result as in the yield callback. Perhaps it should be dropped though, so that the function must return no results.

To be clear, although I would prefer to drop the returning bool version, it's not terribly important to me one way or the other.

earthboundkid · 2022-10-25T23:41:54Z

Would the zero-arity version of an iterator be allowed? What about inline functions, as in for range func() bool { return true } {}?

1 reply

ianlancetaylor Oct 25, 2022
Collaborator

I think the answer to both questions should be yes.

jimeh · 2022-10-25T23:54:24Z

I really like the idea of getting some form of standardized enumeration/iteration in Go.

For my 2 cents, I'd like to start with a concise and explicit TL;DR summary of push/pull functions as I've understood them:

You call push functions once in your code, and the given yield function is called repeatedly by the push function, once for each item in the "collection".
You call pull functions repeatedly in your code, each time it returns the next item in the "collection".

I generally like this, as it makes for a set of very small yet simple and flexible methods of enumeration/iteration over a collection.

Suggestion

Next, and feel free to disagree here, I'd like to suggest alternative names for push/pull functions:

"enumerator functions" instead of "push functions".
"iterator functions" instead of "pull functions".

Enumerator

My reasoning for "enumerator" is largely due to my history with Ruby, where any object can be made enumerable by simply defining a #each method that works very much like the push functions proposed here. (You should also include the Enumerable module to get #map, #inject, #select, etc., which all use #each under the hood.)

Personally, at least, the word "push" suggests pushing values into the collection. So when reading the code examples, I realized that push functions work very differently from the initial impression the name gave me.

Iterator

As for "iterator", my reasoning is simply that it feels very similar to other types of iterator objects I've come across which may have Next(), Prev() and similar methods. Except it's not an iterator itself, it is a singular "iterator function", that simply iterates to the next item each time it's called, and nothing else.

Type safety?

The only thing I feel slightly uneasy about with these functions is that I don't see how the type system could reliably ensure that a function given to range is a push or pull function, and not something completely different that happens to have a bool as its final return value, or a func argument with a bool as its final return value.

Range int

And finally, regarding range over ints, conceptually the wording of something like range 7 feels a bit forced to me. 7 is itself not something with a range. Something like range 2..7 feels less forced, and is more flexible too. But I assume that requires changes to Go's syntax.

Though I personally feel fine about using the three-clause for loop on the rare occasion I need to loop N times. And that's despite my history with Ruby and its 7.times { |n| ... } and (2..7).each { |n| ... } stuff.

5 replies

earthboundkid Oct 26, 2022

TBH, I wonder if Russ added the integer thing as a duck.

rsc Oct 26, 2022
Maintainer Author

There are 19 instances in the standard library of methods that are pull functions:

% grep '() (.*, bool)' go/api/*.txt
go1.1.txt:pkg math/big, method (*Rat) Float64() (float64, bool)
go1.12.txt:pkg runtime/debug, func ReadBuildInfo() (*BuildInfo, bool)
go1.15.txt:pkg testing, method (*T) Deadline() (time.Time, bool)
go1.4.txt:pkg math/big, method (*Rat) Float32() (float32, bool)
go1.4.txt:pkg net/http, method (*Request) BasicAuth() (string, string, bool)
go1.5.txt:pkg net/smtp, method (*Client) TLSConnectionState() (tls.ConnectionState, bool)
go1.7.txt:pkg context, type Context interface, Deadline() (time.Time, bool)
go1.7.txt:pkg runtime, method (*Frames) Next() (Frame, bool)
go1.8.txt:pkg database/sql, method (*ColumnType) DecimalSize() (int64, int64, bool)
go1.8.txt:pkg database/sql, method (*ColumnType) Length() (int64, bool)
go1.8.txt:pkg database/sql, method (*ColumnType) Nullable() (bool, bool)
go1.txt:pkg fmt, type ScanState interface, Width() (int, bool)
go1.txt:pkg fmt, type State interface, Precision() (int, bool)
go1.txt:pkg fmt, type State interface, Width() (int, bool)
go1.txt:pkg net/url, method (*Userinfo) Password() (string, bool)
go1.txt:pkg reflect, method (Value) Recv() (Value, bool)
go1.txt:pkg reflect, method (Value) TryRecv() (Value, bool)
go1.txt:pkg regexp, method func (*Regexp) LiteralPrefix() (string, bool)
go1.txt:pkg regexp/syntax, method (*Prog) Prefix() (string, bool)
%

Of these, two are arguably actual pull functions (reflect.Value.Recv and reflect.Value.TryRecv). The other 17 are accidental. One of the oldest (from Go 1) is regexp.Regexp.LiteralPrefix, which has signature:

func (*Regexp) LiteralPrefix() (string, bool)

So you could accidentally write

for prefix := range re.LiteralPrefix {
    use(prefix)
}

The loop would run either zero times or forever. This will be true of most "accidental" pull methods: if it's accidental, it probably returns the same thing every time you call it. Even the lightest testing seems likely to find this problem.

There are 7 instances in the standard library of methods that are push functions:

% egrep '\(func\(.*\) bool\)( bool)?$' go/api/*.txt 
go/api/go1.16.txt:pkg go/build/constraint, method (*AndExpr) Eval(func(string) bool) bool
go/api/go1.16.txt:pkg go/build/constraint, method (*NotExpr) Eval(func(string) bool) bool
go/api/go1.16.txt:pkg go/build/constraint, method (*OrExpr) Eval(func(string) bool) bool
go/api/go1.16.txt:pkg go/build/constraint, method (*TagExpr) Eval(func(string) bool) bool
go/api/go1.16.txt:pkg go/build/constraint, type Expr interface, Eval(func(string) bool) bool
go/api/go1.9.txt:pkg sync, method (*Map) Range(func(interface{}, interface{}) bool)
go/api/go1.txt:pkg go/token, method (*FileSet) Iterate(func(*File) bool)
%

The first five are really all the same instance (Eval) and are accidental. Disallowing the returned bool would disqualify them. The last two are true push functions.

Accidental push functions seem to me far less common than accidental pull functions: plenty of methods take no arguments and return (T, bool). Very few take a callback returning a bool.

rsc Oct 26, 2022
Maintainer Author

range n is not a duck, although you are not the first person to ask that.

The 3-clause "count to n" really is a significant stumbling block for new Go programmers, and it is a remarkable number of tokens to explain for something so incredibly common. A quick scan suggests that the majority of 3-clause for loops in the Go repo could use range instead:

% cd go
% git grep -En '^	+for .*;.*;.* {' | egrep '\.go:[0-9]+:' | egrep -v '\.pb\.go:' | egrep -v 'for *\(' >loops
% egrep 'for [a-zA-Z]+ :?= (0|[a-zA-Z0-9]*\(0\)); [a-zA-Z]+ <=? .*; [a-zA-Z]+\+\+ {' loops >ranges
% egrep -v 'for [a-zA-Z]+ :?= (0|[a-zA-Z0-9]*\(0\)); [a-zA-Z]+ <=? .*; [a-zA-Z]+\+\+ {' loops >for3
% wc -l ranges for3
    5236 ranges
    4342 for3
    9578 total
% hoc -e 100*5236/9578
54.66694508248069
%

The result holds up across projects:

tailscale   169/ 220 76.8%
hugo        302/ 471 64.1%
x/tools     434/ 757 57.3%
etcd        703/1072 65.6%
kubernetes 3572/5622 63.5%

Using range for the majority that do count from 0 to N would make the others stand out more as unusual in some way, which would be helpful when reading the code. Skimming the for3 files created by that script, I often noticed lines and thought "wait, what's wrong with my regexp? why is this here?" only to read more carefully and see that the line really isn't a 0 to N loop, in a way that I missed at first glance.
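To make the classification concrete, here is a sketch that applies the same pattern as the egrep command above using Go's regexp package (the trailing brace is escaped for RE2). The sample lines are made up for illustration. Like the original pattern, it also accepts `<=`, and a loop such as `i <= n` would correspond to range n+1 rather than range n.

```go
package main

import (
	"fmt"
	"regexp"
)

// countToN matches 3-clause loops that start at 0 (or T(0)), compare with
// < or <=, and increment by one.
var countToN = regexp.MustCompile(`for [a-zA-Z]+ :?= (0|[a-zA-Z0-9]*\(0\)); [a-zA-Z]+ <=? .*; [a-zA-Z]+\+\+ \{`)

func main() {
	lines := []string{
		"for i := 0; i < n; i++ {",
		"for i := int64(0); i < len(s); i++ {",
		"for i := n - 1; i >= 0; i-- {",
		"for i := 1; i < n; i++ {",
	}
	for _, l := range lines {
		fmt.Println(countToN.MatchString(l), l) // true, true, false, false
	}
}
```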

I do admit that range n seems very un-C-like, and that aspect surprises people. But I don't believe that means it is un-Go-like, any more than not using semicolons.

jimeh Oct 26, 2022

Good to see pull/push function signatures aren't that common.

I believe passing the wrong function to range would probably be pretty rare, but I do like the idea of push/pull functions by their nature explicitly being push/pull functions without the need to read accompanying documentation to make sure.

I'm not sure it's a good idea, but the only way I can think of to make their signatures explicitly indicate that they are pull/push functions is to swap out the final bool return type for a new custom "iterator bool" style type.

If we had a regular type in an iter package, for example, something like:

type OK bool

const (
	Continue OK = true
	Halt     OK = false
)

Giving push functions the following signatures:

func(yield func(...) iter.OK) 
func(yield func(...) iter.OK) bool

And pull functions:

func() (..., iter.OK)

I'm not sure an iter package really fits with what's proposed here, though; I merely used it as an easy way to show conceptually what I have in mind.

The end result though of something like the above, is that pull/push functions become distinct within the type system from other functions which have a final bool return value. And it also makes it very obvious to developers that it's a push/pull function by just looking at the function signature.

earthboundkid Oct 26, 2022

I'm fine with the integer range. Doesn't seem like a big deal either way. It has precedent in Vue templates, which have <element v-for="i in n"> or <element v-for="i of n">. My main issue with it in Vue is that in and of iteration behave differently in JavaScript, but the same in Vue templates, which is confusing. This doesn't apply to Go though.

earthboundkid · 2022-10-26T00:53:15Z

If you have var f func() (T, error, bool) are for v := range f and (less importantly) for range f valid? Or do they need to be for v, _ := range f and for _, _ := range f? Making them valid is more consistent with slices and maps, but possibly more error prone since they allow accidentally omitting error checking.

5 replies

firelizzard18 Oct 26, 2022
Collaborator

since they allow accidentally omitting error checking.

That should be handled by go vet, just like defer reader.Close() or writer.Write(...) are.

rsc Oct 26, 2022
Maintainer Author

The number of range variables would be required to match the number of pull results (minus the bool) or the number of push yield arguments. Only slice and map would allow dropping the _.

willfaught Oct 26, 2022

Why special case slice and map in that regard?

Merovius Oct 26, 2022

I think there is a definite improvement in discoverability and understandability of this feature if there is always a 1-1 correspondence between range-variables and returns from the function. If I know little about Go and read Go code and see all those range statements and wonder what they do, it'll be hard enough as it is to map the dozen or so different forms of functions which can be used and what they mean. Throw into the mix that the number of range variables can also differ from the number of returns…

I would even go so far as to argue that allowing different number of loop variables for maps and slices might have been a mistake. for i := range someSlice is still a source of confusion and bugs, when people assume that yields values. If you'd always have to write for i, _ := range someSlice or for _, v := range someSlice, that would've been avoided, at the cost of a bit of verbosity.

The one exception I could see is for range x. Perhaps that should be allowed with any number of returns, as it looks sufficiently different.

DeedleFake Oct 26, 2022

Why special case slice and map in that regard?

range is already inconsistent since chans only have a single-value variant of it. I don't think it's too weird for funcs to have their own rule for that, too.

robaho · 2022-10-26T01:18:59Z

I don't understand why it needs to be as complicated as this. Can't Go define, in the stdlib, the standard iteration interfaces that range will support, and give an order of precedence and/or choose the interface based on the declared range variables? All you need are the standard pull-type interfaces.

I "think" this is complicated because there is still a design concern about supporting general iteration over a map - requiring a push interface. I think this is easily addressed with built-ins, e.g. iter(somemap) that return one of the above declared interfaces. For non-builtin containers this is not an issue.

I don't see how "generators" align with the Go team's concerns over flow control (which seems to have been the major barrier to exceptions). "generators" (aka hidden threads or coroutines) are the magic that Go typically tries to avoid.

9 replies

rsc Oct 26, 2022
Maintainer Author

I'm not sure what scoped channels are exactly, but that sounds like a much bigger language change than push-style iterators.

robaho Oct 26, 2022

I had envisioned scoped channels as simply tied to the creating routine. If that routine's references to the channel go out of scope, the channel is automatically closed - essentially an implicit defer().

But I don't think it's really necessary. I don't think this is a real problem in practice. These leaked channels/routines are easily detected and then fixed in the design. In any long-lived system the monitoring will detect the leaks, and the overhead of the routine is fairly minimal in the meantime.

firelizzard18 Oct 26, 2022
Collaborator

As long as the range function is well behaved, the stack trace will be straightforward.

type Map[K comparable, V any] map[K]V

func (m Map[K, V]) All(yield func(K, V) bool) bool {
    for k, v := range m {
        if !yield(k, v) {
            return false
        }
    }
    return true
}

type Tree[K comparable, V any] struct {
    Key K
    Value V
    Left, Right *Tree[K, V]
}

func (t *Tree[K, V]) All(yield func(K, V) bool) bool {
    if t == nil {
        return true
    }
    return t.Left.All(yield) && yield(t.Key, t.Value) && t.Right.All(yield)
}

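// printWithStack stands in for any function that prints its arguments
// along with the current call stack.
func run[K comparable, V any](it interface{ All(func(K, V) bool) bool }) {
    // This
    for k, v := range it.All {
        printWithStack(k, v)
    }

    // Becomes this
    it.All(func(k K, v V) bool {
        printWithStack(k, v)
        return true
    })
}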

In either case, the stack trace is simple. If the iterator is a Map, the stack is range_body -> Map[K, V].All -> run. If the iterator is a Tree, the stack is range_body -> (*Tree[K, V]).All ... -> run, with All repeated a number of times depending on the recursion depth. The only thing that's not obvious is that the range body is a function. Unless the iterator does something like run the iteration in a separate goroutine, the stack trace will stay this simple.

robaho Oct 26, 2022

Don't start in the range body - start at the function calling range - and imagine the debugger trying to step through the elements and the range body.

Based on the sample 'unranging' implementation, it doesn't appear trivial to go backwards, whereas with a pull iterator it seems trivial. Maybe each of the generated lines can be labelled with the source line, but how the stack aligns is difficult for me to grasp; I am sure I am missing something.

In most languages, compiling "for debug" disables many optimizations because the debugger can't deal with them. This seems to require a level of code generation that won't be easy to work with.

firelizzard18 Oct 26, 2022
Collaborator

I would expect the debugger to step into All if I tell it "step into" when the current line is for range it.All. If I "step over" that line, I would expect it to transparently step through All until it calls yield or returns. If I "step into" the range statement during the middle of iterating, I would expect that to be equivalent to stepping out of the body function.

type It struct{}

func (It) All(yield func(int) bool) bool {
    if !yield(1) { return false }
    if !yield(2) { return false }
    if !yield(3) { return false }
    return true
}

var it It

func run() {
    // This
    for v := range it.All {
        // ...
    }

    // Becomes this
    // start:
    it.All(func(v int) bool {
        // ...
        return true
    })
}

start and returned-from-yield should map to for v := range it.All
When PC == start, "step into" should step to All and "step over" should step into the next call to yield
When PC == returned-from-yield, "step into" should effectively step out of yield into All and "step over" should step into the next call to yield
If PC is inside All, stepping over yield should behave normally

It's certainly quirky but it should work.

DeedleFake · 2022-10-26T02:04:07Z

DeedleFake
Oct 26, 2022

I like this idea. It solves a bunch of the confusion that arose trying to deal with an iterator interface, and it also leaves it open to potentially add new function signatures later if something useful comes up. It also feels more consistent with the way that the rest of Go works by not relying on methods for a language feature, though it's definitely a bit strange in its own right in a completely different way.

I thought a bit about how a general iteration package could be implemented around this, and I think the best option is to deal with push functions primarily. Everything else can be very easily and cheaply converted to them, so it seems like the most general, simplest form. Here are some examples, assuming that #49085 or something similar isn't adopted:

package iter

type Push[T any] func(yield func(T) bool) bool

type Pair[T1, T2 any] struct {
  A T1
  B T2
}

// Uses Pair to get the index, too.
func FromSlice[E any, S ~[]E](s S) Push[Pair[int, E]] {
  return func(yield func(Pair[int, E]) bool) bool {
    for i, v := range s {
      ok := yield(Pair[int, E]{i, v})
      if !ok {
        return false
      }
    }
    return true
  }
}

func FromMap[K comparable, V any, M ~map[K]V](m M) Push[Pair[K, V]] { ... }
func FromChan[E any, C ~chan E](c C) Push[E] { ... }

func Map[T, R any](f Push[T], m func(T) R) Push[R] {
  return func(yield func(R) bool) bool {
    return f(func(v T) bool { return yield(m(v)) })
  }
}

func Filter[T any](f Push[T], keep func(T) bool) Push[T] { ... }
func Reduce[T, R any](f Push[T], initial R, reducer func(R, T) R) R { ... }

// These might be redundant because of Reduce(), though it would be nice to have common functionality like this pre-written.
func IntoSlice[E any, S ~[]E](s S, f Push[E]) S { ... }
func IntoMap[K comparable, V any, M ~map[K]V](m M, f Push[Pair[K, V]]) { ... }

And so on. The composition of the push functions is kind of interesting, but it's a bit confusing to look at, I think. It might be easier with something like #21498, but I'm not entirely sure.
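To make the composition concrete, here is a small runnable sketch of a few of these helpers. It simplifies FromSlice to a hypothetical FromValues that yields elements without the index Pair, and the bodies of Filter, Reduce, and IntoSlice are one possible implementation, not the commenter's.

```go
package main

import "fmt"

// Push mirrors the type from the comment: a push function calls yield for
// each value and reports whether iteration ran to completion.
type Push[T any] func(yield func(T) bool) bool

// FromValues is a hypothetical, simplified FromSlice that yields only elements.
func FromValues[T any](s []T) Push[T] {
	return func(yield func(T) bool) bool {
		for _, v := range s {
			if !yield(v) {
				return false
			}
		}
		return true
	}
}

// Map applies m to every value produced by f.
func Map[T, R any](f Push[T], m func(T) R) Push[R] {
	return func(yield func(R) bool) bool {
		return f(func(v T) bool { return yield(m(v)) })
	}
}

// Filter passes through only the values for which keep returns true.
func Filter[T any](f Push[T], keep func(T) bool) Push[T] {
	return func(yield func(T) bool) bool {
		return f(func(v T) bool {
			if !keep(v) {
				return true // skip this value, keep iterating
			}
			return yield(v)
		})
	}
}

// Reduce folds every value into an accumulator, consuming f completely.
func Reduce[T, R any](f Push[T], initial R, reducer func(R, T) R) R {
	acc := initial
	f(func(v T) bool {
		acc = reducer(acc, v)
		return true
	})
	return acc
}

// IntoSlice appends every value produced by f to s.
func IntoSlice[E any, S ~[]E](s S, f Push[E]) S {
	f(func(v E) bool {
		s = append(s, v)
		return true
	})
	return s
}

func main() {
	nums := FromValues([]int{1, 2, 3, 4, 5, 6})
	evens := Filter(nums, func(n int) bool { return n%2 == 0 })
	squares := Map(evens, func(n int) int { return n * n })
	fmt.Println(IntoSlice([]int(nil), squares)) // [4 16 36]
	fmt.Println(Reduce(squares, 0, func(sum, n int) int { return sum + n })) // 56
}
```

Each stage wraps the previous one's yield, so nothing runs until a consumer such as IntoSlice or Reduce calls the outermost push function.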

6 replies

seancfoley Oct 26, 2022

"but it's a bit confusing to look at"

Yes, I think it is confusing to look at. It takes some concentration to follow: a function that returns a function taking a function argument, which is then called by yet another function, is hard to follow without a lot of concentration.

I think the only way to make it easily readable is to use type definitions so that we do not see all three functions at the same time.

pat42smith Oct 26, 2022

I thought a bit about how a general iteration package could be implemented around this, and I think the best option is to deal with push functions primarily.

We're off on a tangent here, but I disagree.

The primary purpose of a standard iterator type would be as glue: one thing produces values, another consumes values, and the iterator type connects the two things. As far as this goes, the iterator type could be based around either push or pull functions.

But an iterator type based on push functions can't do anything else, at least not in any reasonable way I can see.

An iterator type based on pull functions can have other abilities, if the underlying source of values permits those abilities. For example,

move an iterator some number of steps backwards so we can revisit the last few values
clone an iterator to yield another that can be advanced over the same sequence of values, independently of the original iterator
use iterators to mark positions in ordered collections, as in the C++ standard library
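A minimal sketch of the first two abilities for a slice-backed source. SliceCursor, Back, and Clone are hypothetical names chosen for illustration, not APIs from #54245.

```go
package main

import "fmt"

// SliceCursor is a hypothetical pull-style iterator over a slice. Because it
// is an explicit object holding the iteration state, it can support
// operations that a push function cannot.
type SliceCursor[T any] struct {
	s   []T
	pos int // index of the element the next call to Next returns
}

func NewSliceCursor[T any](s []T) *SliceCursor[T] {
	return &SliceCursor[T]{s: s}
}

// Next returns the next value and true, or the zero value and false at the end.
func (c *SliceCursor[T]) Next() (T, bool) {
	if c.pos >= len(c.s) {
		var zero T
		return zero, false
	}
	v := c.s[c.pos]
	c.pos++
	return v, true
}

// Back moves the cursor n steps backwards (clamped at the start) so the
// last few values can be revisited.
func (c *SliceCursor[T]) Back(n int) {
	c.pos -= n
	if c.pos < 0 {
		c.pos = 0
	}
}

// Clone returns an independent cursor at the same position.
func (c *SliceCursor[T]) Clone() *SliceCursor[T] {
	cp := *c
	return &cp
}

func main() {
	c := NewSliceCursor([]string{"a", "b", "c"})
	c.Next() // "a"
	c.Next() // "b"
	d := c.Clone()
	c.Back(1)
	v1, _ := c.Next()   // "b" again
	v2, _ := d.Next()   // "c": the clone was not moved back
	fmt.Println(v1, v2) // b c
}
```

These operations only work because the source permits random access; a cursor over a network stream could not offer Back or Clone so cheaply.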

rsc Oct 26, 2022
Maintainer Author

An "iterator based on push functions" does not make sense. Push functions provide iteration, but they are not iterators in the sense of an iterator being an explicit object representing the state of an in-progress iteration (which is the meaning of "iterator" in almost every other language). I wrote above in a different comment:

Here's a different way to view things. The library being discussed in #54245 is about iterators (think "cursors"). Iteration, meaning what's possible with range loops, is a much broader topic. If you have a data structure that explicitly represents the paused state of a single iteration, which is what people usually mean by the term "iterator" (including in #54245, but also in C++ and Java), that's almost always a "pull function" (or an object with a pull method). Those have their place, and it would be fine to have a library to help with them, perhaps in std or perhaps in x. But range and looping generally is a much broader topic, and we shouldn't overfit to iterator objects.

DeedleFake Oct 26, 2022

For anyone coming along later, the comment in question: #56413 (reply in thread)

pat42smith Oct 26, 2022

I misread the comment I was reacting to as suggesting a form of iterator type. Sorry for the confusion.

"Push functions ... are not iterators". Yes. But given a push function, an iterator can be created from it. See NewGen in #54245.
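One way to create an iterator from a push function in today's Go is to run the push function in its own goroutine. This is a minimal sketch, assuming a goroutine plus channels is acceptable; pullFromPush is a hypothetical name, and this is not the NewGen API from #54245, whose exact signature is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// pullFromPush (hypothetical) runs push in its own goroutine and hands each
// value to the caller over an unbuffered channel. next reports false once
// the sequence is exhausted; stop must be called if the caller quits early
// so that the goroutine can finish instead of leaking. Use next and stop
// from a single goroutine, and do not call next after stop.
func pullFromPush[T any](push func(yield func(T) bool) bool) (next func() (T, bool), stop func()) {
	values := make(chan T)
	done := make(chan struct{})
	go func() {
		defer close(values)
		push(func(v T) bool {
			select {
			case values <- v:
				return true
			case <-done:
				return false // caller called stop: end the push loop
			}
		})
	}()
	next = func() (T, bool) {
		v, ok := <-values
		return v, ok
	}
	var once sync.Once
	stop = func() { once.Do(func() { close(done) }) }
	return next, stop
}

func main() {
	count := func(yield func(int) bool) bool {
		for i := 1; i <= 5; i++ {
			if !yield(i) {
				return false
			}
		}
		return true
	}
	next, stop := pullFromPush(count)
	defer stop()
	for v, ok := next(); ok && v <= 3; v, ok = next() {
		fmt.Println(v) // prints 1, 2, 3
	}
}
```

The goroutine and channel handoff is exactly the cost that the optimization discussed in #54245 aims to remove.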

atdiar · 2022-11-01T11:26:11Z

atdiar
Nov 1, 2022

Probably a stupid question, but in most cases where iteration has no side effects (on itself?), can't a properly defined, closure-based yield function be used to create a generator/iterator/pull function?

That is, essentially extracting the collection's internal iteration by iterating once and storing the values in a slice, pushing them onto a channel, etc.?
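The slice variant of this idea can be sketched directly. collect is a hypothetical helper; it only works for finite sequences, and it shows the caveat atdiar raises next: the result is a snapshot, so later changes to the collection are not visible.

```go
package main

import "fmt"

// collect (hypothetical) runs a push function to completion, buffering every
// value in a slice, and returns a pull function over that snapshot.
func collect[T any](push func(yield func(T) bool) bool) func() (T, bool) {
	var buf []T
	push(func(v T) bool {
		buf = append(buf, v)
		return true
	})
	i := 0
	return func() (T, bool) {
		if i >= len(buf) {
			var zero T
			return zero, false
		}
		v := buf[i]
		i++
		return v, true
	}
}

func main() {
	m := map[string]int{"a": 1}
	next := collect(func(yield func(string) bool) bool {
		for k := range m {
			if !yield(k) {
				return false
			}
		}
		return true
	})
	m["b"] = 2 // not seen: the snapshot was already taken
	for k, ok := next(); ok; k, ok = next() {
		fmt.Println(k) // prints only "a"
	}
}
```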

2 replies

atdiar Nov 1, 2022

Ha, that's one stupid question, I guess. If the collection changes, the iterator would need to see the changes as well. Nvm.

AndrewHarrisSPU Nov 1, 2022

Your question is worth pondering - ISTM the really useful and interesting cases of for will be the ones where there's a lot going on behind the scenes. Persistent data structures would be a great example.

scott-cotton · 2022-11-02T08:02:15Z

scott-cotton
Nov 2, 2022

Just wanted to summarise my take on adding range F (F a push/pull func) and range n (n an int), after digesting things further and some back and forth over a couple of sub-points in other threads.

First, thanks for setting up this discussion. It indeed addresses something missing in Go: a mechanism for custom iteration. It does so in an interesting way, extending range, which today is specialised per type, to functions whose bodies roughly correspond to the body of a for loop, and to range n, n an int.

For range n, I think an approach that does not panic would be preferable. If it didn't panic, I'd be for it. If it did, I'd lean toward not supporting it, because it doesn't save much and the need to account for possible panics would outweigh the benefits.

I think custom iteration in general, that is by any means, should be taken slowly and with due diligence to, as the top doc says:

People often mention as a strength that all Go code looks about the same. That’s simply not true for code with custom iteration.

Custom iteration is often difficult to work with in other languages for this very reason. To me, one of Go's strengths is that there is not much custom iteration, so loops all look the same. Support for a uniform mechanism for custom iteration would IMO make Go less uniform wherever it is over-applied or non-convergent w.r.t. best practices. Finding a balance for where custom iteration helps vs hurts will be a long road, and guidance along the way would help.

For range F, the body of F corresponds roughly to the loop body via compiler translation of the loop body.
The 'roughly' part feels too rough to me. There are cases in which such a function would panic where it wouldn't in ordinary usage. The argument to a push function is not really a full-fledged func, but it looks like it should be. The scoping differences compared to range loops today are confusing, and are cited as a reason to make changing loop semantics a prerequisite, even though the pre-declared for _ = range (not for _ := range) version has the original semantics. Should the loop semantics change, this would introduce non-uniformity in Go constructs where they are uniform today: := would no longer correspond to a single declaration within block boundaries {}. The for loop itself would behave even more differently between the = and := versions.

I don't think custom iteration really needs the rough edges above. Personally, I'd find something along the lines below much simpler.

[the code below was edited]

var (
  a A
  b B
)
for a, b = range pull { }
func pull() (A, B, bool) {...} // any ordinary user defined func

==>


var ( // declared if the above is for a, b := range, not if for a, b = range
  a A
  b B
  brk bool // compiler generated variable
)
for {
  a, b, brk = pull()
  if brk { break }
  // user loop body here
}
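A runnable version of that translation, written by hand in today's Go with a concrete pull function. Note the convention in this suggestion: the third result is a stop flag (true means break), the opposite of the usual ok convention. newPull and its int/string element types are illustrative assumptions.

```go
package main

import "fmt"

// newPull returns a hypothetical pull function in the suggested shape:
// it returns (A, B, brk), where brk == true means "stop".
func newPull(words []string) func() (int, string, bool) {
	i := 0
	return func() (int, string, bool) {
		if i >= len(words) {
			return 0, "", true // brk: no more values
		}
		i++
		return i - 1, words[i-1], false
	}
}

func main() {
	pull := newPull([]string{"x", "y", "z"})

	// Hand-written equivalent of `for a, b := range pull { ... }`
	// under the proposed translation.
	var (
		a   int
		b   string
		brk bool
	)
	for {
		a, b, brk = pull()
		if brk {
			break
		}
		fmt.Println(a, b) // user loop body: prints "0 x", "1 y", "2 z"
	}
}
```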

Thanks for reading and your consideration!

[before edit, the incorrect code was]:

var (
  a A
  b B
)
for a, b = range pull, brk { }
func pull() (A, B) {...} // any ordinary user defined func
func brk(A,B) bool {...} // likewise

==>


var ( // declared if the above is for a, b := range, not if for a, b = range
  a A
  b B
)
for  {
  a, b = pull()
  if brk(a, b) { break }
  // user loop body here
}

even in interface form

type Ranger[A, B any] interface {
  Break(a A, b B) bool
  Pull() (A, B)
}

for a, b = range R {...} // R implements Ranger, equivalent to for a, b = range R.Pull, R.Break { ... }

0 replies

extemporalgenome · 2022-11-03T22:54:51Z

extemporalgenome
Nov 3, 2022

a pull function takes no arguments and returns the next set of N values (0 ≤ N ≤ 2) from the sequence

I am slightly concerned about ambiguity when an "element" bool is returned (i.e. ambiguity of use, what @rsc called "accidental iterators"). These may not show up all that often in the stdlib, but I wonder whether they show up more frequently in community code.

For example:

// CheckHealth samples a system, returning true if healthy
func CheckHealth() (healthy bool)

The above would appear to be a valid iterator despite not likely being designed with iteration in mind. If designed to be an iterator, it'd probably look more like:

func MonitorHealth() (healthy, more bool)

Side note: in practice, such iterators may always return true for the more value.

1 reply

bcmills Nov 4, 2022

Note that for ok := range CheckHealth would be a compile-time error (too many values), while for range CheckHealth would be equivalent to for CheckHealth() {}. The latter does suggest that 0-ary pull functions may be redundant, but it's not obvious to me that that's true of 0-ary push functions.

extemporalgenome · 2022-11-03T23:15:04Z

extemporalgenome
Nov 3, 2022

iirc, the rationale for not including custom iterators in the language initially was to prevent hiding cost and side effects (thus decreasing the ability to reason about code).

Today, if I see for range, I know that, except for iteration on channels, each iteration will be non-blocking and exceptionally efficient (~constant cost for slices and maps, and bounded cost for strings). I also know that even with channels, the iteration will have no side effects. There are many classes of bug I might encounter whose cause simply can't be the loop iteration, but with this proposal, that would no longer be the case.

I wonder if the loss of these reasoning guarantees is truly outweighed by the convenience gained through this proposal.

This concern would be mitigated if we had distinct syntax of some kind, such as for x := func iterator or for x := range @iterator. I'm not suggesting a particular syntax, but would like to see us consider the idea of a variant syntax.

This may also ease integration with existing tooling, since I didn't see any specified behavior for omitted variables in the original proposal (i.e. for i := range iterator when iterator returns (int, string, bool)). If we're forced to write for i, _ := range iterator, some tooling will likely complain, for some time, that the blank identifier is unneeded.

12 replies

willfaught Nov 4, 2022

Today, if I see for range, I know that, except for iteration on channels, each iteration will be non-blocking, and will be exceptionally efficient (~constant cost for slices and maps, and bounded cost for strings).

I would instead look at it as: the iteration behavior for built-in types was simple and fast. That would remain true with iterators.

I know also that even with channels, the iteration will have no side effects.

The sending goroutine can mutate shared state or perform other side effects between sends.

extemporalgenome Nov 8, 2022

@willfaught

I would instead look at it as: the iteration behavior for built-in types was simple and fast. That would remain true with iterators.

That's true, but it's a weaker property: the minimum overhead remains fairly low, but the maximum overhead becomes unbounded. Further, whereas before the CPU overhead of fetching the next element (or a channel receive) is consistent and negligible, with this proposal the overhead can vary per iteration.

Consider some hypothetical future time when there's stable code using a custom iterator that readers have long assumed to be a builtin collection (a plausible misidentification, since the proposal makes no syntactic difference). Debugging can become an issue if that iterator, which generally has very tight per-iteration performance, begins tripping on an edge case that unusually blocks for a long time.

My concern is that, based on that misassumption, the thing being iterated over may be one of the last parts of the code that is inspected to find the issue (because it's assumed the non-call expression being looped over couldn't possibly be the cause). Not all programmers review or keep up with language changes, and an identical-syntax change like this could end up resulting in one of those post-incident blog posts ("How Go magic iteration caused company X to have a 16 hour outage"). That hypothetical scenario could be avoided with visibly distinct syntax to signify custom iteration.

The sending goroutine can mutate shared state or perform other side effects between sends.

That's true, though it should be quite atypical, and contrary to "share memory by communicating" (if the calling goroutine wanted to share mutable state, channels are likely not the appropriate mechanism).

In the general case, another goroutine (running in "parallel") could be mutating, without synchronization, a slice or map being iterated.

I specifically meant that loops themselves (specifically the evaluation of the range expression) cannot cause side effects today without a visible call: a visible call (with parens) sticks out as "something special may explicitly happen here, but only prior to iteration." With this proposal, even without parens, "something special may implicitly happen here on each iteration."

Merovius Nov 8, 2022

@extemporalgenome I don't buy the scenario you are painting as a reason. Sure, it could happen, but I can conjure up similar scenarios for all kinds of code. For example, what if code is it := myIter(); for { x, ok := it.Next(); if !ok { break } use(x) }, the reviewer checks myIter's code, sees that it just abstracts over a slice, and approves. And at some point, another engineer changes it to iterate over a channel; after all, "that's just an implementation detail of myIter". And then, at some point, some edgecase…

If we always assume the worst case scenario that could happen under a language change, we will never change the language. It's not a practical approach.

I'm not saying I'm not a little bit worried about the potential hidden extra cost. But I'm less worried about this than I'd be about, say, appropriating Python's in operator, which is usually assumed constant time but often is linear time.

extemporalgenome Nov 8, 2022

For example, what if code is it := myIter(); for { x, ok := it.Next(); if !ok { break } use(x) }, the reviewer checks myIter's code, sees that it just abstracts over a slice, and approves. And at some point, another engineer changes it to iterate over a channel; after all, "that's just an implementation detail of myIter". And then, at some point, some edgecase…

But the point is that there is a call there. However much you trust it.Next() to not change behavior, there's still a clear syntactic boundary between core language behavior and user-defined code, and as such it.Next() very clearly indicates that arbitrary side-effects or blocking may occur.

Granted, a programmer can switch an iteration over map keys to be an iteration over a channel without any other changes, and that would support your point even without the function wrapping. That is a risk inherent in the language, but not necessarily one we should carry forward to other use cases.

Aside: as relatively infrequently as it is used (in part because select statements are often needed), I do wish channel iteration had a slightly different syntax, as it doesn't have the same properties as slice and map iteration (can block, can be unbounded). As such, personally, I do not treat channel iteration as compelling precedent for extending existing range syntax to cover custom iterators.

If we always assume the worst case scenario that could happen under a language change, we will never change the language. It's not a practical approach.

I'm not suggesting no change, just adding a warning sign to things that can be dangerous. For example, if the syntax were changed slightly to be any of the following:

for v := range it... {}
for v := range @it {}
for v := range iter(it) {} // new builtin

(or even just dropping the range keyword alongside any of the above)

As long as identical syntax cannot be used for both builtin collection iteration and custom iteration, then there would be sufficient visual distinctiveness, without much typing cost, to make the code (and potential traps) much easier to reason about.

Merovius Nov 8, 2022

I understood your point. I just don't find it compelling. If I can construct a similarly asinine scenario with the same consequences in a universe without syntactic differentiation, then it seems obvious that syntactic differentiation isn't really the issue with your example.

extemporalgenome · 2022-11-03T23:42:43Z

extemporalgenome
Nov 3, 2022

I believe there's too much magic in the variety of function signatures that will be accepted by the compiler. I'm thinking about what it would be like to teach this to new Go programmers, and I suspect there'll be a mystical aspect to this which isn't present elsewhere in the language (aside from unintended design consequences, like loop iterator bugs).

A new programmer may learn that they can iterate over any function which returns (T1, bool) and (T1, T2, bool). Can they then iterate over a function which returns (T1, T2, T3, bool)? Why not? That's surprising (to a new Go programmer)!

They also learn that they can iterate on integers! It sounds like you can iterate on almost anything. They might infer that they can iterate on (T, int), where the int indicates the number of items remaining, since, orthogonally, iterating on functions, and iterating on integers could plausibly be combinable.

At this time, I'd favor a variation on #54245 that does not allow signatures to vary in anything but type (i.e. accept Next() (T, bool) but not Next() (T1, T2, bool)). If two-clause assignment is important, then where index/key value pairing is applicable, the element returned from Next would be considered an index/key, and a Get(K) V extension method would be defined to permit for k, v := assignments, just as Stop() was proposed as an extension method.
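A sketch of how the single-signature idea with a Get extension might look. Since no such range support exists, the two-variable loop is spelled out by hand; Iterator, Getter, and keyedSlice are hypothetical names, not part of #54245.

```go
package main

import "fmt"

// Iterator is the single fixed method signature favored here.
type Iterator[T any] interface {
	Next() (T, bool)
}

// Getter is the hypothetical Get(K) V extension that would let a loop bind
// a second variable: the value for the key that Next returned.
type Getter[K, V any] interface {
	Get(K) V
}

// keyedSlice is an illustrative collection: Next yields indexes and Get maps
// an index to its element.
type keyedSlice struct {
	elems []string
	pos   int
}

func (s *keyedSlice) Next() (int, bool) {
	if s.pos >= len(s.elems) {
		return 0, false
	}
	s.pos++
	return s.pos - 1, true
}

func (s *keyedSlice) Get(i int) string { return s.elems[i] }

func main() {
	var it interface {
		Iterator[int]
		Getter[int, string]
	} = &keyedSlice{elems: []string{"a", "b"}}

	// What `for k, v := range it` might expand to under this suggestion.
	for k, ok := it.Next(); ok; k, ok = it.Next() {
		v := it.Get(k)
		fmt.Println(k, v) // prints "0 a", then "1 b"
	}
}
```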

Even without extension methods, I believe there's more value (following the introduction of generics) of a single precise [generic] method signature that is accepted, rather than a family of function [and potentially method] signatures.

If supporting functions, and not just methods, is critical, I'd be more comfortable with a single signature for push iterators, and a single signature for pull iterators.

I think it's also just fine to encourage modeling iterators after bufio.Scanner, as that introduced a clean, predictable, and well-understood style and semantics.
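For reference, a minimal sketch of an iterator modeled on bufio.Scanner's shape: Scan advances and reports whether a value is available, an accessor returns it, and Err reports what stopped the scan. NumberScanner and its methods are hypothetical names used only to show the pattern.

```go
package main

import (
	"fmt"
	"strconv"
)

// NumberScanner is a hypothetical iterator modeled on bufio.Scanner.
type NumberScanner struct {
	fields []string
	pos    int
	cur    int
	err    error
}

func NewNumberScanner(fields []string) *NumberScanner {
	return &NumberScanner{fields: fields}
}

// Scan advances to the next number. It returns false at the end of input
// or on the first parse error, which Err then reports.
func (s *NumberScanner) Scan() bool {
	if s.err != nil || s.pos >= len(s.fields) {
		return false
	}
	n, err := strconv.Atoi(s.fields[s.pos])
	s.pos++
	if err != nil {
		s.err = err
		return false
	}
	s.cur = n
	return true
}

func (s *NumberScanner) Value() int { return s.cur }
func (s *NumberScanner) Err() error  { return s.err }

func main() {
	s := NewNumberScanner([]string{"1", "2", "x", "4"})
	sum := 0
	for s.Scan() {
		sum += s.Value()
	}
	fmt.Println(sum, s.Err() != nil) // 3 true
}
```

The loop reads the same for every such type, and the error check after the loop mirrors how bufio.Scanner is used.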

6 replies

atdiar Nov 4, 2022

Yes, I think some people have proposed to use a defined bool type instead as a signal. It also may help since the semantics of this boolean are a bit specific to iteration status.

And I am sympathetic to your view on push functions.
Using range on them doesn't seem to bring much beyond an alternate call syntax, unless I'm mistaken.

DeedleFake Nov 4, 2022

I agree that there are far too many function signatures allowed here. As a general rule, Go prefers to explicitly convert to a new type or wrap one type with another in order to get something to have the features someone wants. I think all of the function signatures except for one- and two-value push functions should be removed, and then a new package should be added with functions that convert from the other signatures, i.e. funcs.FromPull(somePullFunc). Push functions are the most general and add some minor functionality that is not currently available in Go otherwise, so I think they're the most important ones to add directly, but the rest seem unnecessary to me.
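Such a conversion is short to write. A sketch of what FromPull might look like, assuming the one-value signatures discussed in this thread; the funcs package is only an idea in this comment, so the name and signature here are hypothetical.

```go
package main

import "fmt"

// FromPull (hypothetical) adapts a pull function into a push function.
func FromPull[T any](next func() (T, bool)) func(yield func(T) bool) bool {
	return func(yield func(T) bool) bool {
		for v, ok := next(); ok; v, ok = next() {
			if !yield(v) {
				return false
			}
		}
		return true
	}
}

func main() {
	i := 0
	countdown := func() (int, bool) { // pull function: 3, 2, 1, then done
		if i >= 3 {
			return 0, false
		}
		i++
		return 4 - i, true
	}
	FromPull(countdown)(func(v int) bool {
		fmt.Println(v) // prints 3, 2, 1
		return true
	})
}
```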

extemporalgenome Nov 4, 2022

@DeedleFake I feel like we're missing something (or being too hasty) if we say that push iterators are universal (or "the most general"). Having the iterator own the iteration loop can cause some awkwardness in a number of cases:

What if you want to selectively consume from multiple iterators based on a condition? Imagine merge sort or any merge algorithm. To make that work with push iterators, channels would need to be involved. It'd be trivial to write a Next method and that case would be supported efficiently and for free.
If a push iterator defers a recover to catch its own panics, it'll end up catching panics from the callback as well, which the caller/callbacker likely does not expect.
Stack traces will be surprisingly longer whenever push iterators are used despite there being no visible call in the case of a sugared loop.
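The recover point can be reproduced in today's Go with an explicit callback. safeEach is a hypothetical push function written only to show the effect: because the caller's callback runs on the push function's stack, a deferred recover there also catches the caller's panic.

```go
package main

import "fmt"

// safeEach is a hypothetical push function that tries to protect itself
// with a deferred recover. Because the callback runs on the same stack,
// that recover also swallows panics raised by the callback.
func safeEach(vals []int) func(yield func(int) bool) bool {
	return func(yield func(int) bool) (completed bool) {
		defer func() {
			if r := recover(); r != nil {
				completed = false // any panic, including the caller's, ends here
			}
		}()
		for _, v := range vals {
			if !yield(v) {
				return false
			}
		}
		return true
	}
}

func main() {
	completed := safeEach([]int{1, 2, 3})(func(v int) bool {
		if v == 2 {
			panic("bug in the loop body") // the caller expects this to propagate
		}
		return true
	})
	// The panic was swallowed inside safeEach, so execution continues here.
	fmt.Println("completed:", completed) // completed: false
}
```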

Certainly the above are all solvable/avoidable merely by not using sugared loops (or by not requiring a FromPull wrapping), but it does suggest there are usability issues with push iterators (they're not universally usable), and if we make them universal, people will tend to favor writing push iterators even if pull iterators would have been simpler or more appropriate.
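The merge case is short with pull functions, because the consumer decides which source to advance at each step. A minimal sketch over ascending int sequences; sliceNext and merge are illustrative names.

```go
package main

import "fmt"

// sliceNext is a hypothetical helper returning a pull function over s.
func sliceNext(s []int) func() (int, bool) {
	i := 0
	return func() (int, bool) {
		if i >= len(s) {
			return 0, false
		}
		i++
		return s[i-1], true
	}
}

// merge combines two ascending pull functions into one ascending slice,
// choosing at each step which source to advance.
func merge(nextA, nextB func() (int, bool)) []int {
	var out []int
	a, okA := nextA()
	b, okB := nextB()
	for okA || okB {
		if okA && (!okB || a <= b) {
			out = append(out, a)
			a, okA = nextA()
		} else {
			out = append(out, b)
			b, okB = nextB()
		}
	}
	return out
}

func main() {
	fmt.Println(merge(sliceNext([]int{1, 4, 6}), sliceNext([]int{2, 3, 7}))) // [1 2 3 4 6 7]
}
```

Doing the same with two push functions would require running at least one of them in a separate goroutine, since each wants to own its loop.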

Given that the most magical parts of the proposal are around push iterators (pull iterators don't need special defer/return behavior or the implicit transformation of a block into an anonymous function), the resultant language may be cleaner if we only solve for pull iterators to start with, and keep push iterators using explicit callbacks while we consider the impact of pull iteration in the wild.

If, for example, Go considered introducing concise lambdas with typeless parameters (i.e. only really usable for inline callbacks), that could solve push iterators, and other callback cases, as well as this proposal, albeit arguably with less magic.

DeedleFake Nov 4, 2022

All three of the problems that you outlined are usability concerns from the caller's side. Because of that, you've convinced me that push functions should not be the default.

I still think it makes sense to only support one type and require explicit conversions for the rest, but since the conversion happens on the caller's end, the form that it is converted to should not be one that removes power from the caller. Therefore, I think it makes more sense for pull functions to be the default after all.

aarzilli Nov 4, 2022

If, for example, Go considered introducing concise lambdas with typeless parameters (i.e. only really usable for inline callbacks), that could solve push iterators, and other callback cases, as well as this proposal, albeit arguably with less magic.

I disagree: what sucks about that isn't just the syntax for the closure, but that break, continue, goto and return don't do what they should do.
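The control-flow problem is easy to see with an explicit callback: break is not allowed inside the func literal, and return only leaves the callback, so results must be smuggled out through captured variables. each, firstNegative, and firstNegativeLoop are hypothetical names used for illustration.

```go
package main

import "fmt"

// each is a hypothetical push-style helper.
func each(vals []int, yield func(int) bool) {
	for _, v := range vals {
		if !yield(v) {
			return
		}
	}
}

// firstNegative uses a callback: the result travels through captured
// variables, and early exit is expressed as `return false`.
func firstNegative(vals []int) (int, bool) {
	found, ok := 0, false
	each(vals, func(v int) bool {
		if v < 0 {
			found, ok = v, true
			return false // plays the role of break
		}
		return true // plays the role of continue
	})
	return found, ok
}

// firstNegativeLoop uses a plain loop, where return works directly.
func firstNegativeLoop(vals []int) (int, bool) {
	for _, v := range vals {
		if v < 0 {
			return v, true
		}
	}
	return 0, false
}

func main() {
	fmt.Println(firstNegative([]int{3, -2, -5}))     // -2 true
	fmt.Println(firstNegativeLoop([]int{3, -2, -5})) // -2 true
}
```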

earthboundkid · 2022-11-04T01:12:33Z

earthboundkid
Nov 4, 2022

My current worry about adding pull functions as range arguments is that it's a backdoor way to add coroutines to Go. I feel like coroutines should either be first class (have a yield statement like Python and maybe a different keyword in the declaration, like func F(In) (stream Out) { /**/ }) or they should be an implementation detail, like the iter.NewGenerator function and runtime channel operations in the old proposal. With this proposal, there's a whole new kind of procedure… but it only works if you use it in a range statement. In a way, that's more radical than "there's a new optimization, but it only applies in certain cases when the compiler is sure it's safe."

I worry that it's an avenue for possible abuse, and people are going to do "clever" things, like make "Twisted Go" (a la Twisted Python) that simulate an async system with pull functions in order to avoid the Go runtime scheduler, for whatever reason. ISTM something as important as a new mechanism that lets you suspend a function without using goroutines shouldn't be tied to the range statement.

12 replies

Merovius Nov 4, 2022

@willfaught You are addressing the clarification of what my question meant. Do you have an answer to the question itself?

earthboundkid Nov 4, 2022

So how would that be different, in regards to the specific criticism that push functions are only useful in the context of range?

As I understand your question and the proposal, to get the coroutine behavior of suspending and resuming with panic propagation across boundaries, you have to do range pf because if you do

myContainer.Iter(func(v value) bool {
   panic("boom")
})

The stack trace will look like outerfunc > Iter, but if you do the panic in a range loop (for range myContainer.iter { panic("boom") }), the stack trace will look like outerfunc alone.

If you convert the push func to a pull func with some library construct, then the stack trace will of course just have outerfunc, but that's because Iter will be off running in its own goroutine.

So, if you're in a range call and AFAICT only if you're in a range call, you can suspend Iter without spawning a goroutine and yielding to the scheduler.

Does that answer your question?

willfaught Nov 4, 2022

You seemed to be responding to #56413 (reply in thread), and asked a straightforward question about what it would look like, so that's what I addressed.

How is that different from a hypothetical yield statement? I'm having trouble fitting that into Go in a way that doesn't come down to the same thing.

I agree that there's ultimately no difference. It's just an implementation detail. For all the user knows, the compiler actually will use a channel and real goroutine when converting from push to pull.

Merovius Nov 4, 2022

to get the coroutine behavior of suspending and resuming with panic propagation across boundaries

It is my understanding that the only reason we'd need to change this is to make converting from push functions into pull functions more efficient (i.e. exactly the optimization described in the Appendix to #54245, which, AIUI, you consider the preferable alternative). In particular, you say:

I feel like coroutines should either be first class […] or they should be an implementation detail, like the iter.NewGenerator function and runtime channel operations in the old proposal.

But in terms of these optimizations, this new design does not actually differ from the old design. The old design called "pull functions" iter.Iter and called "push functions" generators, and suggested naming the conversion from generators into iterators iter.NewGenerator. And it suggested applying the optimization to that case. None of that differs in the slightest from what we are discussing here. At least as far as I understand it.

It is not my understanding that the compiler should allow conversions from push to pull functions or vice versa, or that it should do them automatically, but rather that the compiler should implement range over pull by straightforward iteration code and range over push by translating the loop body into an opaque func, and that an iteration library could convert between the two as glue.

Supporting push functions in range does require some magic to transform a loop body into an opaque func value, but that has nothing to do with coroutines.

Does that answer your question?

I'm not sure. Your original statement was

With this proposal, there's a whole new kind of procedure… but it only works if you use it in a range statement.

I still don't understand this statement, in relationship to the idea of a builtin yield statement/function. ISTM that such a coroutine would also only work in a range statement. Or, at least, it wouldn't work in any more contexts than push/pull functions do. And the explanations so far don't seem to really contradict that. So I'm still a bit confused what you meant.

Were you in fact referring to the "magic" translation of loop bodies into opaque func values? If so, I'd understand a bit better where you are coming from. Though I'd still be a bit confused about the criticism, because it seems fairly self-evident that this translation only happens for range statements.

seancfoley Nov 5, 2022

@carlmjohnson Your Javascript MDN example is exactly the sort of magical code that the go language has so far managed to avoid.

To each his own, but I sure hope Go does not go that route.

SealOfTime · 2022-11-04T20:00:57Z

SealOfTime
Nov 4, 2022

In my opinion, yield is a complicated enough concept to produce a lot of bad, incomprehensible code, and this suggestion provides only syntactic sugar for something that is already entirely possible in the language. I believe this goes against the rule of "one problem, one solution".
Please, let Go stay boring.

1 reply

htemelski Nov 5, 2022

So much this.
What changed between 2018, when Rob Pike said that adding more features would make the language bigger but less different, and now?
I'm afraid that the current push to add more features is just going to turn the language into another flavour of the generic programming language.

pat42smith · 2022-11-07T01:47:35Z

pat42smith
Nov 7, 2022

Pardon me if this has already been discussed, but it occurs to me that push functions could be replaced by a simple variant of pull functions. I'm not sure whether this would be better than push functions (that's surely a subjective matter where opinions will vary), but it seems to me a quite reasonable alternative.

Let's consider just the case of a loop yielding one value at a time: for x := range something, where x has type T. The case of two values at a time isn't different in any significant way. In the initial discussion, a pull function for this case has the signature

func() (T, bool)

We could also consider pull functions with signature

func(bool) (T, bool)

The intent here is when called with true, the pull function acts as the previous pull function, returning either (next value, true) or (arbitrary, false). When called with false, the pull function performs any cleanup necessary for the end of the loop. A for range statement would call the pull function with false when the loop is terminated prematurely.

Given a function pullx with signature func(bool) (int, bool), we could write

for x := range pullx {
	fmt.Println(x)
	if x >= 5 {
		break
	}
}

and the compiler would translate this into something similar to

for x, ok := pullx(true); ok; x, ok = pullx(true) {
	fmt.Println(x)
	if x >= 5 {
		pullx(false)
		break
	}
}
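A minimal runnable sketch of the bool-parameter pull function described above, with the loop written out in the desugared form. `pullSlice`, `firstUpTo` and the `cleaned` flag are hypothetical names, used only to make the cleanup call observable.

```go
package main

// pullSlice returns a pull function of the proposed form func(bool) (T, bool).
// Called with true, it returns the next element and true, or the zero value
// and false once exhausted. Called with false, it performs cleanup, which
// here just records that cleanup happened.
func pullSlice[T any](s []T, cleaned *bool) func(bool) (T, bool) {
	i := 0
	return func(more bool) (T, bool) {
		var zero T
		if !more {
			*cleaned = true
			return zero, false
		}
		if i >= len(s) {
			return zero, false
		}
		i++
		return s[i-1], true
	}
}

// firstUpTo mirrors the hand-desugared loop: it collects values until one
// is >= limit, then calls pullx(false) before breaking out.
func firstUpTo(pullx func(bool) (int, bool), limit int) []int {
	var out []int
	for x, ok := pullx(true); ok; x, ok = pullx(true) {
		out = append(out, x)
		if x >= limit {
			pullx(false) // early exit: ask the iterator to clean up
			break
		}
	}
	return out
}
```

When the iterator runs out on its own, this desugaring never calls pullx(false); whether it should is one of the open questions raised below.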

The initial discussion remarks that any push function can be automatically transformed into a pair of a next pull function and a stop cleanup function. We can continue this to get a pull function of the new form:

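// next and stop are the pull and cleanup functions obtained by
// transforming a push function, as in the initial discussion.
func pullx(yield bool) (value T, ok bool) {
	if yield {
		return next() // (next value, true) or (zero value, false)
	}
	stop() // loop ended early: release the push function's resources
	return // zero value, false
}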

and this new function can be used as the object of a for range without any need for an explicit call to stop.
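The push-to-pull step was later provided in Go 1.23 as `iter.Pull`, which returns exactly such a `next`/`stop` pair. Assuming Go 1.23 or later, this sketch wraps that pair into the single `func(bool) (T, bool)` form; `pullFromPush` and `countdown` are hypothetical names.

```go
package main

import "iter"

// pullFromPush converts a push function (an iter.Seq) into the proposed
// single pull function: pull(true) advances, pull(false) cleans up.
func pullFromPush[T any](seq iter.Seq[T]) func(bool) (T, bool) {
	next, stop := iter.Pull(seq)
	return func(more bool) (T, bool) {
		if more {
			return next()
		}
		stop() // iter.Pull allows stop to be called more than once
		var zero T
		return zero, false
	}
}

// countdown is a push function that yields n, n-1, ..., 1 and sets
// *finished from its own deferred call when it returns.
func countdown(n int, finished *bool) iter.Seq[int] {
	return func(yield func(int) bool) {
		defer func() { *finished = true }()
		for i := n; i > 0; i-- {
			if !yield(i) {
				return
			}
		}
	}
}
```

Calling pullx(false) makes the suspended push function's yield return false, so the push function returns and its deferred calls run before stop returns.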

Some questions around this:

When a call to pullx(true) returns (whatever, false), should the for range loop call pullx(false)? I tend to think not, that pullx should do the cleanup when it returns false. But I don't really care either way.
If the loop body panics, should the for range loop call pullx(false)? If so, precisely when? What happens if both the loop body and pullx(false) panic? My initial feeling: yes, immediately on loop termination and before executing any deferred functions in the function where the loop occurs. And I haven't a clue what to do about the double panic.
If for range accepts this version of a pull function, then there is no need for it to accept the original (no parameter) version of a pull function. Would we want to accept both, or just the new version?

20 replies

pat42smith Nov 9, 2022

Wouldn't the pullx invocation be rewritten into:
for x, ok := pullx(true); ok; x, ok = pullx(true) {
	fmt.Println(x)
	if x >= 5 {
		break
	}
}
pullx(false)
to be sure that the cleanup will always be called, and not just if the user breaks from the loop? Or is the assumption here that pullx will also do cleanup when the resources are exhausted? Though wouldn't that make the implementation more complicated?

I think it would be fine either way. If we chose to allow pull(bool) functions in for range, the spec would have to be clear about whether for range calls pull(false) after pull(true) returns false, and writers of pull functions would adjust appropriately.

Merovius Nov 9, 2022

I think it would be more interesting to consider what happens if the loop body panics. Presumably that should call pullx(false), which means it should probably be deferred by the range statement (answering the question above). But that would be the first time (I think) a language feature would implicitly defer something.

pat42smith Nov 9, 2022

I think it would be more interesting to consider what happens if the loop body panics. Presumably that should call pullx(false), which means it should probably be deferred by the range statement (answering the question above). But that would be the first time (I think) a language feature would implicitly defer something.

It would be sort of like a defer, yes. But maybe not exactly. Consider

var pullx func(bool) (T, bool) = ...

n := 0
for t := range pullx {
	defer func() { ... }()
	n++
	if n >= 10 {
		panic(t)
	}
}

Should the call to pullx(false) occur before or after the 10 calls to functions deferred by the loop body?

Also consider that if the implementation of for range uses defer to call pullx(false), then that call won't happen until the containing function exits. But in the common case where the loop body does not panic but does break, pullx(false) should be called immediately after the loop terminates, and before any other code in the containing function runs.
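One way to get cleanup immediately on loop termination, including on panic, without waiting for the containing function to exit, is to run the loop inside its own function frame. `forEachPull`, `counter` and `panicsAt` are hypothetical names. The sketch assumes pull cleans up by itself when it returns false, so pull(false) is only needed after an early exit. Note that it changes where defer statements in the body would attach, because the body becomes a closure, which is part of why implicit defers are tricky.

```go
package main

// forEachPull runs body for each value from pull until pull is exhausted
// or body returns false. Cleanup is deferred in this helper's own frame,
// so it runs as soon as the loop ends early, by break or by panic.
func forEachPull[T any](pull func(bool) (T, bool), body func(T) bool) {
	early := true // stays true if we leave via break or panic
	defer func() {
		if early {
			pull(false)
		}
	}()
	for v, ok := pull(true); ok; v, ok = pull(true) {
		if !body(v) {
			return // "break"
		}
	}
	early = false // exhausted normally; pull already cleaned up
}

// counter is a pull function yielding 0..n-1; *cleanups counts pull(false).
func counter(n int, cleanups *int) func(bool) (int, bool) {
	i := 0
	return func(more bool) (int, bool) {
		if !more {
			*cleanups++
			return 0, false
		}
		if i >= n {
			return 0, false
		}
		i++
		return i - 1, true
	}
}

// panicsAt runs a loop whose body panics at value k and returns the
// recovered panic value.
func panicsAt(pull func(bool) (int, bool), k int) (r any) {
	defer func() { r = recover() }()
	forEachPull(pull, func(v int) bool {
		if v == k {
			panic(v)
		}
		return true
	})
	return nil
}
```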

Merovius Nov 9, 2022

To me, all of this seems like a decent argument that the control flow of pullx is not as straight forward as it may seem.

pat42smith Nov 9, 2022

To me, all of this seems like a decent argument that the control flow of pullx is not as straight forward as it may seem.

Yes, I am now thinking that push functions would be a better choice. They also don't necessarily have simple control flow, but are sometimes easier to write.

mdempsky · 2023-01-17T22:11:18Z

mdempsky
Jan 17, 2023

The ergonomics of push-based iterators seem nice, but I'm concerned it has a lot of corner cases to think about:

1. What happens if the iterator continues to call the yield function even after it returns false?
2. What happens if the iterator holds onto the yield function and calls it after returning?
3. What happens if the iterator calls the yield function on another goroutine?
4. What order are deferred calls invoked in? We're logically interleaving execution of code from two different functions, so it's possible we interleave defer statements too. E.g., suppose a recursive iterator like Tree[K,V].All contained a defer statement, as did the for loop that invoked it.

I expect these questions don't directly matter to most users, but I think they're relevant to the compiler for how it desugars control flow statements. In turn, this is indirectly relevant to users because it could affect performance.

I think a lot of misuse (e.g., questions 1 and 2) could be cheaply caught by simply poisoning the closure's PC field after we don't expect it to be called any further.
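At the library level, the poisoning idea can be approximated by wrapping yield so that misuse fails loudly. This is a sketch assuming the push shape func(yield func(T) bool); `checked`, `ignoresStop`, `leaksYield` and `recovered` are hypothetical names, and the panic message is invented. Go 1.23's eventual implementation of range over func performs similar run-time checks.

```go
package main

import "fmt"

// checked runs push with a yield wrapper that panics if push calls yield
// after yield has returned false (question 1) or after the loop is over
// (question 2).
func checked[T any](push func(yield func(T) bool), body func(T) bool) {
	state := "running"
	push(func(v T) bool {
		if state != "running" {
			panic(fmt.Sprintf("yield called while %s", state))
		}
		if !body(v) {
			state = "stopped"
			return false
		}
		return true
	})
	state = "finished" // any later call through a saved yield now panics
}

// ignoresStop is a buggy push function: it keeps yielding after false.
func ignoresStop(yield func(int) bool) {
	for i := 0; i < 3; i++ {
		yield(i) // bug: result ignored
	}
}

// saved holds a yield function that leaksYield smuggles out.
var saved func(int) bool

// leaksYield is a buggy push function that keeps yield for later use.
func leaksYield(yield func(int) bool) { saved = yield }

// recovered runs f and returns its panic value, or nil.
func recovered(f func()) (r any) {
	defer func() { r = recover() }()
	f()
	return nil
}
```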

24 replies

bcmills Jan 19, 2023

@mdempsky

What's unsavory to me is that there's no obviously-good ordering on when the deferred calls happen.

I disagree. I think there is one obvious ordering: the calls deferred by the push function occur when the push function returns — which is after the caller finishes executing the last iteration of the loop and before the caller executes the first statement outside of the loop. (That is: the deferred calls occur when execution leaves the for … range statement in the caller.)

bcmills Jan 19, 2023

I think my last paragraph in #56413 (reply in thread) was confused. The deferred calls aren't in some kind of global LIFO order — the deferred calls in each function are in LIFO order, and each function executes its deferred calls when the function returns (or halts via panic or Goexit).

mdempsky Jan 19, 2023

I think there is one obvious ordering: the calls deferred by the push function occur when the push function returns

I agree that's an obvious ordering, yes. It's the same one @DeedleFake suggested, for example.

I'm saying it's not obviously good: it means deferred calls no longer happen in strict LIFO order with respect to their corresponding defer statements.

I periodically see tracing code written like defer f()() where f() pushes something onto a stack, and then the returned function is responsible for popping it off. This idiom becomes error-prone if we abandon LIFO ordering.
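For readers unfamiliar with the idiom, here is a minimal sketch of defer f()() tracing; `enter`, `stack`, `inner` and `outer` are hypothetical. The returned pop checks that it removes its own entry, which only holds if deferred calls run in strict LIFO order relative to their defer statements.

```go
package main

import "fmt"

// stack records the names of the currently active traced regions.
var stack []string

// enter pushes name and returns the matching pop, so callers write
// defer enter("name")(): enter runs now, and the pop runs on return.
func enter(name string) func() {
	stack = append(stack, name)
	return func() {
		top := stack[len(stack)-1]
		if top != name {
			panic(fmt.Sprintf("trace: popping %q but top is %q", name, top))
		}
		stack = stack[:len(stack)-1]
	}
}

func inner() {
	defer enter("inner")()
	// while inner runs, stack is [outer inner]
}

func outer() {
	defer enter("outer")()
	inner()
}
```

If a push function's deferred pops could run after the loop body's pops (or vice versa), the top-of-stack check would fire.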

mdempsky Jan 19, 2023

The deferred calls aren't in some kind of global LIFO order

When you say "global LIFO order," I hear an ordering across all goroutines within a process. I'm not suggesting that exists either.

But today we do maintain a strictly LIFO, per-goroutine stack of deferred calls: each defer statement pushes a call onto the goroutine's defer stack, and panic and return are responsible for popping calls off the stack as necessary.

The proposal here implies relaxing the "strictly LIFO" part of that. We can certainly do that, but I think it should be taken very seriously. defer/panic are already very subtle, and the implementation today is quite complex and fragile.

Ian points out the iterators could actually operate under the hood using two goroutines, which would cleanly address the implementation concerns around deferred calls. But it wouldn't have any performance advantages, since the API is synchronous anyway. So that seems like it would be pure overhead to me.

But as I also pointed out, I question whether users actually intentionally write defer statements inside for loops, intending for the calls to queue until function return. And if they don't, we can just disallow them in the presence of push-based iterators, which avoids the whole issue. We can always relax that restriction in the future if use cases present themselves.
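A rough sketch of the two-goroutine strategy, assuming the push shape func(yield func(T) bool): the push function runs on its own goroutine and hands values over an unbuffered channel, and a second channel tells it to stop early. `pullViaGoroutine` is a hypothetical name. Every value costs a channel handoff and a goroutine switch, which is the overhead being pointed out.

```go
package main

// pullViaGoroutine runs push on its own goroutine and exposes it as a
// next/stop pair. The caller must call stop (or drain next to the end),
// otherwise the goroutine leaks.
func pullViaGoroutine[T any](push func(yield func(T) bool)) (next func() (T, bool), stop func()) {
	values := make(chan T)        // one value per handoff
	done := make(chan struct{})   // closed by stop to end iteration early
	exited := make(chan struct{}) // closed once push has returned
	go func() {
		defer close(exited)
		defer close(values)
		push(func(v T) bool {
			select {
			case values <- v:
				return true
			case <-done:
				return false // stop was called: tell push to finish
			}
		})
	}()
	next = func() (T, bool) {
		v, ok := <-values // blocks until push yields a value or returns
		return v, ok
	}
	stopped := false
	stop = func() {
		if !stopped {
			stopped = true
			close(done)
		}
		<-exited // wait for push to return and run its deferred calls
	}
	return next, stop
}
```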

pat42smith Jan 19, 2023

From #56413 (reply in thread)

The defer statements called within a loop will appear in a well-defined order. The defer statements called by a push function will appear in a well-defined order. Nobody is saying otherwise. The only question is whether there is any required ordering between the defer statements called within a loop and the defer statements called by a push function. I am suggesting that for that latter case only there is no required order, just as there is no required order in the goroutine example I wrote two paragraphs up.

The point is that there is only one order in which the deferred function calls can be executed that satisfies both existing language semantics and reasonable rules around iterating over push functions (as outlined in #56413 (reply in thread)).

Earlier, I argued for a sort-of converse of this, that if you thought the order was not determined, then you must intend to change existing language semantics. That might have been confusing; I apologize for that.

To take a concrete example, slightly modified from #56413 (reply in thread):

package main

import "fmt"

func iter(yield func(v int) bool) {
	for i := 0; i < 3; i++ {
		defer fmt.Printf("iter: %v\n", i)
		if !yield(i) {
			return
		}
	}
}

func main() {
	defer fmt.Println("start")
	for v := range iter {
		defer fmt.Printf("loop: %v\n", v)
	}
	fmt.Println("middle") // Added; intentionally not a defer statement
	defer fmt.Println("end")
}

Here the defer statements (not the deferred function calls) must be executed in this order:

defer fmt.Println("start")
defer fmt.Printf("iter: %v\n", 0)
defer fmt.Printf("loop: %v\n", 0)
defer fmt.Printf("iter: %v\n", 1)
defer fmt.Printf("loop: %v\n", 1)
defer fmt.Printf("iter: %v\n", 2)
defer fmt.Printf("loop: %v\n", 2)
fmt.Println("middle")
defer fmt.Println("end")

Note that the defer statements from the push function and loop body are interleaved, even though the deferred function calls will not be (as we will see).

The function calls deferred inside the push function iter occur in LIFO order, and they occur when iter returns, which must be before fmt.Println("middle") is executed. So we must have this sequence (possibly interleaved with other deferred function calls, so far as we know at this point in the argument):

fmt.Printf("iter: %v\n", 2)
fmt.Printf("iter: %v\n", 1)
fmt.Printf("iter: %v\n", 0)
fmt.Println("middle")

But by existing language semantics, the function calls deferred within main must occur in this sequence:

fmt.Println("middle")
fmt.Println("end")
fmt.Printf("loop: %v\n", 2)
fmt.Printf("loop: %v\n", 1)
fmt.Printf("loop: %v\n", 0)
fmt.Println("start")

Since fmt.Println("middle") occurs at the end of one subsequence and the beginning of the other, there is only one way they can be combined:

fmt.Printf("iter: %v\n", 2)
fmt.Printf("iter: %v\n", 1)
fmt.Printf("iter: %v\n", 0)
fmt.Println("middle")
fmt.Println("end")
fmt.Printf("loop: %v\n", 2)
fmt.Printf("loop: %v\n", 1)
fmt.Printf("loop: %v\n", 0)
fmt.Println("start")

And we see that function calls deferred in the push function and the loop body cannot be interleaved, even in the absence of an explicit rule against interleaving.

pat42smith · 2023-01-19T00:11:11Z

pat42smith
Jan 19, 2023

On Wed, Jan 18, 2023 at 2:54 PM Ian Lance Taylor wrote:

Actually, thinking about this more, I'm not sure that the exact order in which the defer statements are executed should be precisely defined. One possible implementation of for/range over a push function is to start a new goroutine and have the push function send the values over a channel (with a second channel used to exit the goroutine on loop termination if necessary). I don't think we want to rule out that implementation a priori. In that case the interleaving of the "iter" and "loop" messages would not be fully specified.

If I understand correctly what you're suggesting, then I must disagree. The "iter" messages must appear before the "end" message, because they are deferred by the push function, which terminates before main defers the end message. And the "loop" messages must appear after the "end" message, because they are deferred earlier and by the same function (main). To have a new form of iteration change this would be endlessly confusing for programmers. And I imagine it would lead to many data races in programs with no visible goroutines. Also, a future change in implementation from "no extra goroutine" to "a secondary goroutine" would change program results and introduce data races. No, an implementation using another goroutine would still have to guarantee this order, in my opinion. I know almost nothing of the compiler internals, but I imagine this would be doable but possibly quite difficult. Or am I misunderstanding something?

3 replies

ianlancetaylor Jan 19, 2023
Collaborator

Note: this is out of sequence because it was sent via e-mail rather than added to the discussion thread.

I agree that the "iter" and "loop" messages must appear before the "end" message. What I said was that the interleaving of the "iter" and "loop" messages could, perhaps, be unspecified. That is, while the "iter" messages must appear in the obvious order, and the "loop" messages must appear in the obvious order, it's unspecified whether the "loop 0" appears before or after "iter 0", etc.

pat42smith Jan 19, 2023

Note: this is out of sequence because it was sent via e-mail rather than added to the discussion thread.

Mea culpa. I forgot that replying by e-mail has that undesirable side effect. I'll try to remember in future.

I agree that the "iter" and "loop" messages must appear before the "end" message.

Sorry if I wasn't clear. My opinion is the "iter" messages must appear before the "end" message, and the "loop" messages must appear after the "end" message. So the "iter" messages are separated from the "loop" messages by the "end" message, and no intermixing is possible. Not because of implementation details, but because of language semantics that it would be too confusing to change.

Consider this snippet (with no "iter" messages):

func main() {
	defer fmt.Println("start")
	for x := range whatever {
		defer fmt.Println(x)
	}
	fmt.Println("middle")
	defer fmt.Println("end")
}

Currently, no matter what the type of whatever, if x takes the values 0, 1, and 2 in that order, then this must print

middle
end
2
1
0
start

If I understand you correctly, you are suggesting that in the one special case that whatever is a push function, the output could also be, for example,

2
middle
1
end
0
start

or

middle
end
start
2
1
0

or many other possibilities.

I find this a startling departure from the current state of affairs.

ianlancetaylor Jan 19, 2023
Collaborator

Sorry for the misunderstanding. But your concern is not what I'm suggesting. I agree that whether or not you use a push function, the order of the defer statements in your example is unchanged.

What I am saying is that if the push function itself uses defer statements, then the order in which those deferred calls (the ones in the push function) run, relative to the calls deferred during the loop, is unspecified.

dolmen · 2023-01-24T15:36:19Z

dolmen
Jan 24, 2023

I really like how this proposal provides a unified syntax (range) for both internal (push) and external (pull) iterators.

But, as someone who reads (reviews) much more code than I write, my only (non-blocking) concerns are about the more complex mental model of a for range loop, caused by the explosion of possible types and the hidden complexity and cost underneath.

So far when I see the following loop:

for a, b := range X {
    ...
}

I only have to determine whether X is an array, a slice, a string or a map. Since an array behaves much like a slice, and range over a string is quite rare in business code and quickly identified from context, the question usually comes down to two alternatives: slice/array or map. I can usually answer it from the surrounding function.

However, by introducing push/pull range iterators, the number of possible types will explode. Even more, the cost of each iteration style will vary much more: a user-defined iterator might have bugs or performance issues that I don't expect from built-in iteration. The risk of hidden panics will also grow (so far, ranging over a nil slice or nil map does not panic). My existing review tooling (git diff, GitLab merge requests viewed in a browser), which doesn't provide type information inline, will become insufficient if I can't easily determine how the iterator was constructed.

This is a case where the added syntactic sugar makes it easier to write more concise Go but increases the mental load on human readers.

Range over plain integers (for i := range 5) as suggested as a later step would make it even worse: in for e := range arg.Elements there is a huge difference in block behavior if Elements is an int vs a []string.

0 replies

dolmen · 2023-01-24T15:40:11Z

dolmen
Jan 24, 2023

database/sql.Rows is mentioned as an iterator example, but I don't think it will benefit from this proposal. Errors may happen while iterating or inside an iteration callback, and this proposal doesn't handle that case.

5 replies

dolmen Jan 24, 2023

A bit off-topic, as this isn't about the push/pull proposal, but since I'm mentioning database/sql.Rows I wanted to describe some experiments I have done over the years to simplify iterating over it.

As a heavy user of database/sql.Rows I have written my own external iterator around it with the following signature:

// QueryRows calls QueryContext and loop over rows calling scanRow. If scanRow
// fails, the error it returns is wrapped in a RowError.
func QueryRows(
    ctx context.Context,
    db interface { QueryContext(context.Context, string, ...any) (*sql.Rows, error) },
    query string,
    args []any,
    scanRow func(*sql.Rows) error,
) error

However I almost never use it myself because:

it would increase mental load for readers: need to know that utility function in addition to database/sql.Rows methods
the increased syntax complexity: a closure argument vs a loop block (this range proposal would help in many other iteration cases, but not for sql.Rows where we have to handle runtime errors)
the increased runtime cost of function call (small, but it adds to the readability costs as a drawback)
the biggest complexity when iterating over sql.Rows is the call to rows.Scan where you have to pass pointers to target variables (forgetting & is a common beginner mistake), and that wrapper was still not encapsulating it.

I have gone further with a more general sql.Rows iterator in my package github.com/dolmen-go/sqlfunc (see ForEach and Query) going towards encapsulating the Rows.Scan call but its heavy use of reflect makes it perform badly. I had started some work on a code generator (to move type introspection to a go:generate phase in order to avoid use of reflect.Value.Call), but I paused this in 2021 while waiting for generics.

mrwonko Jan 24, 2023

That could look something like this, right?

for range rows {
    rows.Scan()
}
if err := rows.Err(); err != nil {
    handle(err)
}

Admittedly not a big improvement over the current for rows.Next(), but at least more uniform. You have to remember the error check either way.

If we adjust the interface somewhat, could that help?

for err := range rows {
    if err != nil {
        handle(err)
        break
    }
    rows.Scan()
}

I don't know, but I like that the proposal gives us this option. Now you can build whatever adapter you like.

earthboundkid Jan 24, 2023

for row, err := range rows {
   if err != nil {
     // ...
   }
   var a, b, c T
   err = row.Scan(&a, &b, &c)
   // ...
}

dolmen Jan 24, 2023

@carlmjohnson This is more verbose and less efficient than:

for rows.Next() {
   var a, b, c T
   err := rows.Scan(&a, &b, &c)
   // ...
}

And you forgot to check for rows.Err() after the loop, and this is necessary in both versions.

if err := rows.Err(); err != nil {
  // ...
}

earthboundkid Jan 24, 2023

I'm proposing a different API, which would not require the extra error check. row.Scan was not a typo for rows.Scan. It's a new type that represents a single row. It's unlikely that such an API change will happen however because it would be somewhat redundant with current API. Maybe if there's ever a database/sql/v2.

gavingroovygrover · 2023-03-15T14:09:39Z

gavingroovygrover
Mar 15, 2023

For range over ints, would using the same syntax as for slicing subscripts be more "Go-like"? E.g.
for i := range [:n] {...}
We could use other start-points, e.g.
for i := range [2:len(a)],
or require an explicit break, e.g.
for i := range [:].

0 replies

