A.p.join on cyclic arrays does not reflect web reality #289
Comments
I believe this was discussed during ES5 development and at the time Brendan took the position that this was a browser-specific hack that should be in the spec (sorry, Brendan, if my recollection is wrong). I suspect that at this point in time TC39 would more likely take the position that we need to specify web reality, either in the main body of the spec or in Annex B. A good first step would be for somebody to write some spec language that describes what browsers (or at least one browser) actually do. |
It is annoying to spec because it requires plumbing a seen set through ToString :/ |
Actually that way will not work as the seen set has to be plumbed through a toString invocation which is user-visible. I'm not sure how to do this. |
Chakra's implementation is to essentially keep a seen list in the current Realm. I suppose that would work here as well. |
https://gist.github.com/anba/4a154bd7143bf2bab3ef, except the differences between JSC+SM compared to Chakra+V8 when the object is added to the set. |
@anba looks solid at first glance. What is the JSC+SM difference? |
JavaScriptCore and SpiderMonkey check for cycles right after calling ToObject(this).
Chakra checks for cycles after calling ToString(separator) and additionally performs extra proxy checks.
V8 checks for cycles after calling ToString(separator) and additionally performs extra steps for single-element arrays.
(Similar differences are present for Array.prototype.toLocaleString.) |
@allenwb raised some issues in the gist that I will respond to here :)
Regarding how realms behave, essentially the realm that join belongs to keeps track of any array objects it has seen, whether they are from another realm or the same realm. This means that you will actually join the same array twice if you can manage to get that array to be joined by join functions from different realms. This is actually interoperable behavior!
var a = [1,,2];
var $child = $.createRealm();
$child.evalInNewScript('var b = [3,,4]');
var b = $child.global.b;
a[1] = b;
b[1] = a;
print($child.global.Array.prototype.join.apply(a));
spidermonkey: 1,3,1,,2,4,2
chakra: 1,3,1,,2,4,2
node: 1,3,1,,2,4,2
Also, absolutely code can get invoked in the middle, and this is expected. When executing Array.prototype.join, and an element has a custom toString or is a proxy or whatever, and that function causes Array.prototype.join to be applied to an array instance that is already being joined further up the stack, an empty string is returned. This is also expected and interoperable.
var a = [1,,2];
a[1] = {
  toString() {
    return '"' + a.toString() + '"';
  }
}
print(a.toString());
d8: 1,"",2
chakra: 1,"",2
spidermonkey: 1,"",2
node: 1,"",2 |
@bterlson WRT the x-realm behavior: Did you try it with three references to the x-realm array? I would think that each inner x-realm join call would exit with an empty cycle list in its own realm, as it has no way of knowing that it is being called by a toString/join from another realm and hence needs to preserve the list. WRT the second issue (reentrancy), the problem is that a proxy or whatever might start a toString/join that is completely unrelated to a toString/join that is pending on the stack. If the unrelated toString/join coincidentally contains a reference to something in the global cycle list, it is poisoned by the pending operation. I think we could do something more rational in each of these cases, and I doubt that there is user code that depends upon the current behaviors in cases like these. |
Can you give me an example? I'm not sure I follow.
I can buy that this is a problem in the sense that it seems bad, but I'm not sure of a way out of it without introducing observable extra parameters to toString. It is also how implementations behave today. Seems reasonable to spec this as a starting point at least? (Assuming it works for the case you suggest above). |
One set per realm is equivalent to associating each (instance of the) join method with its own seen list, as a sort of private state of the function object. I think that makes sense, as far as sense goes for this kind of thing. Maybe it can even be specced as a special [[property]] the join function creates and maintains on itself? |
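A minimal sketch of the "private state of the function object" reading, assuming a hypothetical factory `makeRealmJoin` that stands in for one realm's join. Each instance closes over its own seen list, and an own `toString` property on each array routes it back to one particular instance, standing in for "this array was created in that realm". None of these names come from the spec or from any engine.

```javascript
// Hypothetical model: one call to makeRealmJoin() plays the role of one realm's
// Array.prototype.join, with a seen list that is private to that instance.
function makeRealmJoin() {
  const seen = new Set();
  return function join(arr, sep = ",") {
    if (seen.has(arr)) return "";        // already being joined by this instance
    seen.add(arr);
    try {
      const parts = [];
      for (let i = 0; i < arr.length; i++) {
        const el = arr[i];
        // Holes, undefined and null contribute the empty string, as in join.
        parts.push(el === undefined || el === null ? "" : String(el));
      }
      return parts.join(sep);
    } finally {
      seen.delete(arr);                  // leave the list clean on any exit
    }
  };
}

const joinA = makeRealmJoin();           // models the main realm
const joinB = makeRealmJoin();           // models the child realm

const a = [1, , 2];
const b = [3, , 4];
a.toString = () => joinA(a);             // assumption: a "belongs" to realm A
b.toString = () => joinB(b);             // assumption: b "belongs" to realm B
a[1] = b;
b[1] = a;

const result = joinB(a);                 // the child realm's join applied to a
console.log(result);
```

With `a` tied to the first instance and `b` to the second, `joinB(a)` visits `a` once under each instance's list before the cycle is cut, which matches the cross-realm output reported earlier in the thread.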
It's primarily the reentrancy issue that I'm concerned about. I'm pretty sure that this could be fixed at the specification level (either by adding an optional second argument to In either case, this has been unspecified for a very long time. I don't see any need to rush to a problematic solution that ignores the reentrancy issue and just copies a poor implementation decision of the past. I suspect we can do better while still maintaining adequate backwards compatibility. |
@allenwb I have a hard time finding a plausible scenario where a reentrant call to |
@allenwb I would normally agree, although this was something we implemented because we found the web depended on it. I would support standardizing what is currently required to run the web and we can discuss improvements (eg. to handle any reentrancy problems). What's the downside with this approach? |
This has been open a long time; I've made an attempt in #1518 to address it. Please review, and I'm more than happy to adjust to all feedback. |
Three observations. With respect to web reality:
On the topic of reentrancy, from our experience implementing an eccentric ES dialect supporting multiple (cooperatively multitasked) threads, running code from multiple novice but potentially-mutually-hostile programmers:
Finally, rather tangentially:
|
I think this is a given as the spec assumes (or requires?) that values aren't usable between threads
was this planned to be brought to committee first? web reality is annoying to deal with retroactively, and I'd argue that throwing makes more sense than anything else. |
Are we sure that this Seen Set cannot be used as a global communications channel? |
@erights in the current spec, “that an array is cyclic” is observable by the stack overflow caused by an infinite loop, or by observing repeated calls to your cyclic object’s toString. In the web, it’s observable by knowing where a cycle is, and confirming that join produces a string with a gap where your cyclic object is expected. I’m not sure what would be newly communicated in either case. Could you elaborate on your concerns? |
I'm not worried about observing that an array is cyclic. I could easily write an algorithm to do this in user code anyway. I am also fine with the outcome, that I also do not know how to produce this sensible outcome without a Seen Set somewhere. This concern is: by its nature, the Seen Set is mutable. We are trying to keep hidden mutable state out of the primordials, so that transitively freezing the primordials results in a transitively immutable primordial state (assuming Date.now and Math.random have already been repaired), so that subgraphs which are otherwise isolated but transitively share primordials still cannot communicate. I suspect the Seen Set as used here does not enable such communications. But it would be good to verify this one way or the other. Anytime we introduce hidden global (or per-realm) mutable state, we should worry about this. |
Because this recurs through user code, which might initiate a |
@ljharb I understand that the collection itself is not reified and made available to user code as an object. But it certainly affects observable behavior of user code. If there's one per realm, then for independent |
I have some of the same concerns as @erights. |
@erights yes, reentrancy via toString is one of my concerns. But reentrancy could also occur via Proxies. Perhaps a solution would be to abandon cycle detection if the "array" or any element is a Proxy or if a non-intrinsic toString is encountered |
Hi @allenwb Do proxies raise any concern that a user-defined |
I don't believe Proxies raise any extra concerns, just another path that needs to be considered. I've thought about this some more and I'm pretty sure it is possible to specify the algorithm so that it avoids any reentrancy issues. One way to do it is for the A.p.join algorithm to do a conditional inlined specialization of the recursive call to I haven't worked through the details but it feels like it should work for data structures built out of normal JS Array objects. It doesn't necessarily work for data structures that interweave Arrays and non-Array objects with custom toString methods. I'm not sure what the web-reality algorithm does in such cases (or if there is a single web-reality algorithm). I strongly suspect that breaking-the-web concerns don't actually show up for these latter data structures. So, I'm back to where I was in #289 (comment) . We can specify this, at least well enough to avoid meaningful web breakage, without using a global data structure. |
Hi Allen, I love the direction, but I'm not sure I understand how such an algorithm would be written. Care to give it a try? Rough pseudo-code would be informative enough. Thanks. |
Roughly:
The Array join method takes one argument, separator, and performs the following steps:
Abstract operation ArrayJoin(O, Sep, Seen)
|
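The spec steps from the comment above did not survive in this copy of the thread, so the following is only a guess at their general shape, not a reconstruction. It sketches a join that threads an explicit `seen` set through the recursion and bypasses `ToString` only when an element is a plain array whose `toString` and `join` are the intrinsics. The names `isIntrinsicArrayToString`, `arrayJoin` and `sketchJoin` are hypothetical, and the intrinsic test here only knows about the current realm.

```javascript
// Hypothetical stand-in for the "is this %ArrayProto_toString%?" test.
// Assumption: only the current realm is modelled.
function isIntrinsicArrayToString(fn) {
  return fn === Array.prototype.toString;
}

// Sketch of ArrayJoin(O, Sep, Seen): `seen` holds the arrays currently being
// joined on this traversal, so no global or per-realm state is needed.
function arrayJoin(O, sep, seen) {
  if (seen.has(O)) return "";            // cycle: contribute the empty string
  seen.add(O);
  try {
    const parts = [];
    for (let i = 0; i < O.length; i++) {
      const el = O[i];
      if (el === undefined || el === null) {
        parts.push("");
      } else if (
        Array.isArray(el) &&
        isIntrinsicArrayToString(el.toString) &&
        el.join === Array.prototype.join   // assumption: join must be intrinsic too
      ) {
        // Inlined specialization: recurse directly instead of going through
        // ToString, so `seen` can be passed along without being observable.
        parts.push(arrayJoin(el, ",", seen));
      } else {
        parts.push(String(el));            // user code may run here
      }
    }
    return parts.join(sep);
  } finally {
    seen.delete(O);
  }
}

function sketchJoin(arr, separator = ",") {
  return arrayJoin(arr, String(separator), new Set());
}

const cyclic = [1, 2];
cyclic.push(cyclic);
console.log(sketchJoin(cyclic));
console.log(sketchJoin([1, [2, [3, null]], 4], "-"));
```

Because `seen` is created fresh per top-level call, a cycle that passes through a user-defined `toString` is not detected by this sketch, which is the limitation raised in the discussion that follows.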
Thanks, I'll take a shot at updating #1518 with those spec steps. |
Not that. Consider:
const arr = [0, {toString: () => '(' + arr + ')'}, 2];
String(arr);
which returns
It seems to me that the preconditions to being bitten by this (defining toString methods, and relying on the recursion checks in A.p.join) are all too easily satisfied. |
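To make the example concrete: in an engine that implements the cycle check discussed in this thread, the inner string conversion of `arr` finds `arr` already being joined and contributes the empty string. A quick check, assuming such an engine (a recent Node.js, for instance):

```javascript
// The object's toString re-enters the join of the very array that contains it.
const arr = [0, { toString: () => "(" + arr + ")" }, 2];

// With the engine's cycle check in place, the inner conversion of `arr`
// yields "", so the middle element becomes "()".
const out = String(arr);
console.log(out);
```

An algorithm that keeps no state across the user-defined `toString` call would instead recurse on `arr` again at this point.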
Wouldn’t that already be the case with the current spec? |
Is @cpcallen's pattern one that has actually been observed on the web? Perhaps @bterlson knows, because of #289 (comment) . Presumably, the goal here is not to eliminate all possible non-terminating behavior starting from A.p.join, because that would be the halting problem. E.g.,
[0, {toString() {while(true);}}, 2].join()
That said, I'm becoming less concerned about a single agent-global circularity set shared by all A.p.join intrinsics. Run-to-completion guarantees that once a call to A.p.join starts it will either complete without preemption or not terminate at all. That means that a single initially empty circularity set should be safe. On entry A.p.join checks if the set is empty; if it is, that join invocation is the initiator of a new traversal and hence has the responsibility of ensuring that the set is cleared before terminating (probably via the equivalent of a finally). Reentrant calls to join, independent of the traversal, such as from Proxy handlers, would still be swept up into the active circularity set, but that sounds like it is already the implemented behavior. A single global set, rather than a per-realm set, seems correct because the data structure being traversed can contain circular references among objects from multiple realms, but there is only a single job that runs to completion processing those x-realm object references. |
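The initiator rule described above can be sketched as follows, assuming a hypothetical agent-wide `circularitySet` and a hypothetical `agentJoin` function. One detail is added beyond the comment: each call also removes its own array on exit, so that an array referenced twice without a cycle is still rendered both times. The `track` helper is only a device for routing nested string conversions through the sketch.

```javascript
// Hypothetical agent-wide set shared by every join in this model.
const circularitySet = new Set();

function agentJoin(arr, sep = ",") {
  // An empty set on entry means this call starts a new traversal.
  const isInitiator = circularitySet.size === 0;
  if (circularitySet.has(arr)) return "";
  circularitySet.add(arr);
  try {
    const parts = [];
    for (let i = 0; i < arr.length; i++) {
      const el = arr[i];
      parts.push(el === undefined || el === null ? "" : String(el));
    }
    return parts.join(sep);
  } finally {
    circularitySet.delete(arr);              // assumption: not in the comment
    if (isInitiator) circularitySet.clear(); // initiator guarantees a clean set
  }
}

// Demo helper (assumption): make String(arr) go through agentJoin.
function track(arr) {
  Object.defineProperty(arr, "toString", {
    value() { return agentJoin(this); },
  });
  return arr;
}

const cyc = track([1, 2]);
cyc.push(cyc);
const cyclicResult = agentJoin(cyc);

// Even when user code throws mid-traversal, the initiator clears the set.
const thrower = track([{ toString() { throw new Error("boom"); } }]);
let threw = false;
try { agentJoin(thrower); } catch { threw = true; }
const sizeAfterThrow = circularitySet.size;

console.log(cyclicResult, threw, sizeAfterThrow);
```

The `finally` block is what makes a single shared set tolerable under run-to-completion: whatever happens inside the traversal, the next top-level join starts from an empty set.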
@allenwb in the steps above, what is |
A hypothetical abstract operation that in this case tests whether its first argument is the %ArrayProto_toString% intrinsic of any existing realm of the currently executing ECMAScript Agent. |
I realise I had neglected to respond to @devsnek. Apologies.
I was unclear. My use of "thread" here was in respect to one of our highly non-standard extensions, and has nothing to do with threads as introduced with Agents in ES2017. Our threads share a single realm; they are only relevant to this discussion insofar as they offer some (dubiously relevant) insight into the difficulty of distinguishing "completely unrelated"
No: our engine is not used in a browser so is unlikely to influence web reality. I mentioned it here to gauge if there was interest in such a proposal, but I note you are the only one who has even referenced the idea. |
Browsers have special measures for dealing with cyclic arrays in the A.p.join operator (and by extension, A.p.toString). Chrome, IE, Firefox, & Safari all behave as follows:
That is, they detect the cycle and replace it with the empty string. This is not reflected by the spec, which would require toString to diverge.
Should browsers change, or the spec?
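The snippet that originally accompanied this description is missing from this copy. A minimal reproduction of the behavior being described, assuming an engine that performs the cycle check (a recent Node.js, for instance), might look like this:

```javascript
// Build an array whose last element is the array itself.
const a = [1, 2];
a.push(a);

// The self-reference is rendered as the empty string instead of recursing.
const joined = a.join();
const stringified = a.toString();
console.log(JSON.stringify(joined), JSON.stringify(stringified));
```

The trailing comma in the result is the "gap" left where the cyclic element would have been.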