Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A.p.join on cyclic arrays does not reflect web reality #289

Open
rossberg opened this issue Jan 15, 2016 · 36 comments · May be fixed by #1518
Open

A.p.join on cyclic arrays does not reflect web reality #289

rossberg opened this issue Jan 15, 2016 · 36 comments · May be fixed by #1518
Labels
normative change Affects behavior required to correctly evaluate some ECMAScript source text web reality

Comments

@rossberg
Copy link
Member

Browsers have special measures for dealing with cyclic arrays in the A.p.join operator (and by extension, A.p.toString). Chrome, IE, Firefox, & Safari all behave as follows:

a = [1,,2]
a[1] = a
a.toString() === "1,,2"

That is, they detect the cycle and replace it with the empty string. This is not reflected by the spec, which would require toString to diverge.

Should browsers change, or the spec?

@allenwb
Copy link
Member

allenwb commented Jan 15, 2016

I believe this was discussed during ES5 development and at the time Brendan took the position that this was a browser specific hack that should be in the spec (sorry, Brendan, if my recollection is wrong).

I suspect that at this point in time TC39 would more likely take the position that we need to specify web reality. Either in the main body of the spec or in Annex B.

I good first step would be for somebody to write some spec language that describes what browsers (or at least one browser) actually does.

@bterlson
Copy link
Member

It is annoying to spec because it requires plumbing a seen set through ToString :/

@bterlson
Copy link
Member

Actually that way will not work as the seen set has to be plumbed through a toString invocation which is user-visible. I'm not sure how to do this.

@bterlson
Copy link
Member

Chakra's implementation is to essentially keep a seen list in the current Realm. I suppose that would work here as well.

@anba
Copy link
Contributor

anba commented Jan 15, 2016

https://gist.github.com/anba/4a154bd7143bf2bab3ef, except the differences between JSC+SM compared to Chakra+V8 when the object is added to the set.

@bterlson
Copy link
Member

@anba looks solid at first glance. What is the JSC+SM difference?

@anba
Copy link
Contributor

anba commented Jan 15, 2016

JavaScriptCore and SpiderMonkey check for cycles right after calling ToObject(this):
https://github.com/WebKit/webkit/blob/7920183734b8510146db5aded800cbd67eee1d84/Source/JavaScriptCore/runtime/ArrayPrototype.cpp#L487
https://github.com/mozilla/gecko-dev/blob/137b0dc6d16ce7065a40079d4b8027ec138d4987/js/src/jsarray.cpp#L1099

Chakra checks for cycles after calling ToString(separator) and additionally performs extra proxy checks:
https://github.com/Microsoft/ChakraCore/blob/d8aa1dd25637e749dab720e5c5732535cb913d79/lib/Runtime/Library/JavascriptArray.cpp#L4100

V8 checks for cycles after calling ToString(separator) and additionally performs extra steps for single-element arrays:
https://github.com/v8/v8/blob/25532be593bd80c9e7b3fac4ee50491159902e93/src/js/array.js#L183
https://github.com/v8/v8/blob/25532be593bd80c9e7b3fac4ee50491159902e93/src/js/array.js#L458

(Similar differences are present for Array.prototype.toLocaleString.)

@bterlson
Copy link
Member

@allenwb raised some issues in the gist that I will respond to here :)

Regarding how realms behave, essentially the realm that join belongs to keeps track of any array objects its seen, whether they are from another realm or the same realm. This means that you will actually join the same array twice if you can manage to get that array to be joined by join functions from different realms. This is actually interoperable behavior!

var a = [1,,2];
var $child = $.createRealm();
$child.evalInNewScript('var b = [3,,4]');
var b = $child.global.b;
a[1] = b;
b[1] = a;
print($child.global.Array.prototype.join.apply(a));

spidermonkey

1,3,1,,2,4,2

chakra

1,3,1,,2,4,2

node

1,3,1,,2,4,2

Also, absolutely code can get invoked in the middle, and this is expected. When executing Array.prototype.join, and an element has a custom toString or is a proxy or whatever, and that function causes Array.prototype.join to be applied to an array instance that is already being joined further up the stack, an empty string is returned. This is also expected and interoperable.

var a = [1,,2];
a[1] = {
  toString() {
    return '"' + a.toString() + '"';
  }
}

print(a.toString());

d8

1,"",2

chakra

1,"",2

spidermonkey

1,"",2

node

1,"",2

@allenwb
Copy link
Member

allenwb commented Jan 15, 2016

@bterison
(oops, I actually intended to respond here rather than to the gist)

WRT the xrealm behavior: Did you try it with three references to the x-realm array? I would think that each inner x-realm join call would exit with an empty cycle list in its own realm as it has no way of knowing that it is being called by a toString/join from another realm and hence needs to preserve the list.

WRT the second issue (reentrency) the problem is that the a proxy or whatever might might start a completely unrelated toString/join that is completely unrelated to an toString/join that is pending on the stack. If the unrelated toString/join coincidentally contains a reference to sometime in the global cycle list it is poisoned by the pending operation

I think we could do something more rational in each of these cases and I double that there is user code that depends upon the current behaviors in cases like these.

@bterlson
Copy link
Member

Did you try it with three references to the x-realm array?

Can you give me an example? I'm not sure I follow.

the problem is that the a proxy or whatever might might start a completely unrelated toString/join that is completely unrelated to an toString/join that is pending on the stack.

I can buy that this is a problem in the sense that it seems bad, but I'm not sure of a way out of it without introducing observable extra parameters to toString. It is also how implementations behave today. Seems reasonable to spec this as a starting point at least? (Assuming it works for the case you suggest above).

@rossberg
Copy link
Member Author

One set per realm is equivalent to associating each (instance of the) join method with its own seen list, as a sort of private state of the function object. I think that makes sense, as far as sense goes for this kind of thing. Maybe it can even be specced as a special [[property]] the join function creates and maintains on itself?

@allenwb
Copy link
Member

allenwb commented Jan 18, 2016

It's primarily the reentrancy issue that I'm concern about. I'm pretty sure that this could be fixed at the specification level (either by adding a an optional second argument to join for the cycle set or by specifying join in terms of an abstract operation that recurs with an cycle set argument. (and using IsArray to determine whether to recur on the abstract operator or to just toString).

In either case, this has been unspecified for a very long time. I don't see any need to rush to a problematic solution that ignores the reentrancy issue and just copies a poor implementation decision of the past. I suspect we can do better while still maintaining adequate backwards compatibility.

@claudepache
Copy link
Contributor

@allenwb I have hard time to find a plausible scenario where a reentrant call to A.p.join with a same this argument would not highly risk to provoke an infinite cycle, unless some function involved in the cycle takes an explicit measure to prevent that.

@bterlson
Copy link
Member

@allenwb I would normally agree, although this was something we implemented because we found the web depended on it. I would support standardizing what is currently required to run the web and we can discuss improvements (eg. to handle any reentrancy problems). What's the downside with this approach?

@ljharb
Copy link
Member

ljharb commented Apr 26, 2019

This has been open a long time; I've made an attempt in #1518 to address it. Please review, and I'm more than happy to adjust to all feedback.

ljharb added a commit to ljharb/ecma262 that referenced this issue Apr 26, 2019
ljharb added a commit to ljharb/ecma262 that referenced this issue Apr 26, 2019
@cpcallen
Copy link

Three observations. With respect to web reality:

On the topic of reentrancy, from our experience implementing an eccentric ES dialect supporting multiple (cooperatively multitasked) threads, running code from multiple novice but potentially-mutually-hostile programmers:

  • It's not at all obvious to me how one might decide whether or not a toString/join call created by a proxy is "completely unrelated to" a pending one on the stack. We don't support proxies (or multiple realms), but faced an analogous question with regards to threads, and decided to make the [[Seen]] set per-thread rather than global. Our reasoning for this was that in general we didn't want a pending join call in one thread to affect the return value of a join call in another thread. There could be cases where a toString method's implementation farms the work out to a sub-thread which calls join reentrantly (and in that case you might indeed want the second join to use the same [[Seen]] set), but we felt this was not a trap an unwary novice was likely to stumble into by accident.

Finally, rather tangentially:

  • We note that Error.prototype.toString contains the same potential infinite recursion trap that Array.prototype.join does (consider var e = new Error; e.message = e; String(e);) and plan to add a cycle check to it as well. I'm not aware of any existing implementations which have such a check, and in the absence of prior art I'm inclined to use the same (per-thread) [[Seen]] set, rather than maintaining separate ones for Error.p.toString and Array.p.join.

@devsnek
Copy link
Member

devsnek commented Apr 26, 2019

decided to make the [[Seen]] set per-thread rather than global

I think this is a given as the spec assumes (or requires?) that values aren't usable between threads

and plan to add a cycle check to [Error.prototype.toString] as well

was this planned to be brought to committee first? web reality is annoying to deal with retroactively, and I'd argue that throwing makes more sense than anything else.

@erights
Copy link

erights commented Apr 26, 2019

Are we sure that this Seen Set cannot be used as a global communications channel?

@ljharb
Copy link
Member

ljharb commented Apr 26, 2019

@erights in the current spec, “that an array is cyclic” is observable by the stack overflow caused by an infinite loop, or by observing repeated calls to your cyclic object’s toString. In the web, it’s observable by knowing where a cycle is, and confirming that join produces a string with a gap where your cyclic object is expected. I’m not sure what would be newly communicated in either case. Could you elaborate on your concerns?

@erights
Copy link

erights commented Apr 26, 2019

I'm not worried about observing that an array is cyclic. I could easily write an algorithm to do this in user code anyway. I am also fine with the outcome, that toString completes with a string with a gap, rather than crashing.

I also do not know how to produce this sensible outcome without a Seen Set somewhere. This concern is: by its nature, the Seen Set is mutable. We are trying to keep hidden mutable state out of the primordials, so that transitively freezing the primordials results in an transitively immutable primordial state (assuming Date.now and Math.random have already been repaired), so that subgraphs which are otherwise isolated but share transitively primordials still cannot communicate.

I suspect the Seen Set as used here does not enable such communications. But it would be good to verify this one way or the other. Anytime we introduce hidden global (or per-realm) mutable state, we should worry about this.

@erights
Copy link

erights commented Apr 26, 2019

Because this recurs through user code, which might initiate a toString on an unrelated array, a shared Seen Set would be wrong. I am not sure if this is the reentrancy issue @allenwb raises at #289 (comment)

@ljharb
Copy link
Member

ljharb commented Apr 26, 2019

@erights in #1518, it's not actually a Set, it's a spec List, that's not directly observable or reachable from user code. It's also not plumbed through toString, so there's no risk of that happening.

@erights
Copy link

erights commented Apr 27, 2019

@ljharb I understand that the collection itself is not reified and made available to user code as an object. But it certainly affects observable behavior of user code. If there's one per realm, then for independent toStrings on joins that are proceeding interleaved (which, yes, would need bizarre coding patterns), both would condition their behavior using the same hidden seen collection. I do not see how to become confident that they do not observably interact with each other.

@waldemarhorwat
Copy link

I have some of the same concerns as @erights.

@allenwb
Copy link
Member

allenwb commented May 15, 2019

@erights yes, reentrancy via toString is one if my concerns. But reentrancy could also occur via Proxies. Perhaps a solution would be to abandon cycle detection if the "array" or any element is a Proxy or if a non-instrinsic toString us encountered

@erights
Copy link

erights commented May 15, 2019

Hi @allenwb Do proxies raise any concern that a user-defined toString does not raise?

@allenwb
Copy link
Member

allenwb commented May 15, 2019

I don't believe Proxies raise any extra concerns, just another path that need to be considered.

I've thought about this some more and I'm pretty sure it is possible to specify the algorithm that avoids any reentrancy issues.

One way to do it is for the A.p.join algorithm to do an a conditional inlined specialization of the recursive call to join via toString. the specialization occurs for cases where the toString is the A.p.toString intrinsic. This could turn the recursive algorithm (using a global circularity detector data structure) into an iterative algorithm using an internal (to invocations of A.p.join) circularity detector data structure. Reentrancy is not a problem because each explicit call to A.p.join (outside of the toString recursion) starts with a fresh circularity detector data structure.

I haven't worked through the details but it feels like it should work for data structures built out of normal JS Arrays objects. It doesn't necessarily work for data structures that interweave Arrays and non-Array objects with custom toString methods. I'm not sure what the web-reality algorithm does in such cases (or if there is a single web-reality algorithm). I strongly suspect that breaking the web concerns don't actually show-up for these latter data structures.

So, I'm back to where I was in #289 (comment) . We can specify this, at least well enough to avoid meaning web breakage, without using a global data structure.

@erights
Copy link

erights commented May 15, 2019

Hi Allen, I love the direction, but I'm not sure I understand how such an algorithm would be written. Care to give it a try? Rough pseudo-code would be informative enough. Thanks.

@allenwb
Copy link
Member

allenwb commented May 15, 2019

Roughly

The Array join method takes one argument, separator, and performs the following steps:

Let O be ? ToObject(this value).
Let len be ? ToLength(? Get(O, "length")).  //just for backwards compat if exception
If separator is undefined, let sep be the single-element String ",".
Else, let sep be ? ToString(separator).
Return ArrayJoin(O,  separator, new empty List)

Abstraction operation ArrayJoin(O, Sep, Seen)

If Seen contains O, return "".
Add O to Seen.
Let k be 0.
Let R be the empty String.
Let len be ? ToLength(? Get(O, "length")).
Repeat, while k < len
  If k>0, set R to the string-concatenation of R and sep.
  Let element be ? Get(O, ! ToString(k)).
  If element is in Seen or element is undefined or null, let next be the empty String; 
  else,
     Let m be ?GetV(O, "toString").
     if SameIntrinsicAnyRealm(m, %ArrayProto_toString%) is true,
          let next be ?ArrayJoin(element, Sep, Seen)
      else let next be ? ToString(element)
  Set R to the string-concatenation of R and next.
  Increase k by 1.
Return R.

@ljharb
Copy link
Member

ljharb commented May 15, 2019

Thanks, I'll take a shot at updating #1518 with those spec steps.

@cpcallen
Copy link

I'm not sure what the web-reality algorithm does in such cases (or if there is a single web-reality algorithm).

Not that. Consider:

const arr = [0, {toString: () => '(' + arr + ')'}, 2];
String(arr);

which returns '0,(),2' in V8 and jsc at least, but which would result in infinite recursion under the proposed algorithm.

It seems to me that the preconditions to being bitten by this—defining toString methods, and relying on the recursion checks in A.p.join—are all too easily satisfied.

@ljharb
Copy link
Member

ljharb commented May 15, 2019

Wouldn’t that already be the case with the current spec?

@allenwb
Copy link
Member

allenwb commented May 15, 2019

Is @cpcallen's pattern one that has actually been observed on the web? Perhaps, @bterlson knows because of #289 (comment) .

Presumably, the goal here is not to eliminate all possible non-terminating behavior starting from A.p.join because that would be the halting problem. EG,

[0, {toString() {while(true);}}, 2].join()

That said, I'm becoming less concerned about an single agent global circularity set shared by all A.p.join intrinsics. Run-to-completion guarantees that once a call to A.p.join starts it will either complete without preemption or not terminate at all. That means that a single initially empty circularity set should be safe. On entry A.p.join checks if the set is empty, if it is that join invocation is the initiator of a new traversal and hence has the responsibility of ensuring that the set is cleared before terminating (probably via the equivalent of a finally). Reentrant calls to join, independent of the traversal such as from Proxy handlers, would still be swept up into the active circularity set, but that sounds like it is already the implemented behavior.

A single global set, rather than a per realm set seems correct because the data structure being traversed can contain circular references among objects from multiple realms but there is only a single job that runs to completion process those x-realm object references.

@ljharb
Copy link
Member

ljharb commented May 18, 2019

@allenwb in the steps above, what is SameIntrinsicAnyRealm?

@allenwb
Copy link
Member

allenwb commented May 18, 2019

A hypothetical abstract operation that in this case tests whether its first argument is the %ArrayProto_toString% intrinsic of any existing realm of the currently executing ECMAScript Agent.

ljharb added a commit to ljharb/ecma262 that referenced this issue Jul 3, 2019
@cpcallen
Copy link

cpcallen commented Jul 1, 2020

I realise I had neglected to respond to @devsnek. Apologies.

decided to make the [[Seen]] set per-thread rather than global

I think this is a given as the spec assumes (or requires?) that values aren't usable between threads

I was unclear. My use of "thread" here was in respect to one of our highly non-standard extensions, and has nothing to do with threads as introduced with Agents in ES2017. Our threads share a single realm; they are only relevant to this discussion insofar as they offer some (dubiously relevant) insight into the difficulty of distinguishing "completely unrelated" toString calls.

and plan to add a cycle check to Error.prototype.toString as well

was this planned to be brought to committee first? web reality is annoying to deal with retroactively, and I'd argue that throwing makes more sense than anything else.

No: our engine is not used in a browser so is unlikely to influence web reality. I mentioned it here to gauge if there was interest in such a proposal, but I note you are the only who has even referenced the idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
normative change Affects behavior required to correctly evaluate some ECMAScript source text web reality
Projects
None yet
Development

Successfully merging a pull request may close this issue.