Linting the spec #173

bakkot · 2020-02-24T19:13:25Z

I would like to start enforcing some basic sanity checks for the spec. Some of them will end up implemented in other projects, but I would like to use this issue to track possible lint rules everywhere.

The main goal of this is to reduce editorial churn and correctness issues, but I know that it would also be helpful to people who build tooling based on the spec, like https://github.com/jscert/jsexplain (so cc @brabalan in case you have any ideas).

Currently these are mostly enforced by editors noticing them or @jmdyck submitting after-the-fact PRs, which isn't sustainable.

Here's a few possibilities to get started. Please suggest more.

Output is valid HTML
~~No trailing whitespace~~
In algorithms:
- Consistent spacing, e.g. ~~foo ( a, b ) in algorithm headers and~~ foo(a, b) in algorithm steps.
- Parameter names are surrounded by _
- ~~If an If has substeps, the If line ends with , then~~
- If an If has an Else which has substeps, it is spelled Else, not Otherwise
- If an If has an else on the same line, it is spelled ; else rather than , else or ; else, or ; otherwise (we are inconsistent about else vs otherwise here, and I am OK not enforcing it, but we should at least enforce the semicolon and lack of following space).
- ~~Lines end in ., :, ,, do, or then~~
- ~~Any line not ending in . is followed by an indented sequences of steps.~~
- Consistent casing for variable names - currently we're very bad about this one
- If we say "A and B" or "A, B, and C", the commas are as they are in those examples (rather than "A, and B" or "A, B and C")
- Consistently "let x be" and "set x to" (e.g. not "let x to", "set x be", or "set x to be")
- Ban If _foo_ is present, let _bar_ be _foo_; else let _bar_ be *undefined*. (Editorial: Treat not present parameters as undefined ecma262#1411) [edit: actually that PR only applied to functions, not AOs, so we can only do this if we first sort out the treatment of missing parameters in AO e.g. by banning optional parameters in AOs)
- "... to a new empty List" rather than "to a new List", "to be a new List", "to be a new empty List"
  - Probably something similar for records
- The last step should not be Return., since that's implicit (as of Editorial: describe behavior for algorithms without a "Return" ecma262#2397).
Grammar lookahead restrictions and flags are omitted in early error definitions and syntax-directed operations
In the grammar:
- Lookahead restrictions and flags do not have spaces between brackets (so [lookahead != `let`] or [+Await] rather than [ lookahead != `let` ] or [ +Await ], etc)
- One space before the : on the LHS
- One space between each terminal or nonterminal on the LHS
- Grammar flags are consistent: every ?-prefixed flag on the RHS appears as a parameter flag on the LHS, and every LHS flag is passed down at least one nonterminal in one production on the RHS, or is used to gate a production (see more about what grammar flags mean here)
A bunch of HTML consistency stuff:
- ~~tags are lowercase~~ auto-formatter #367
- ~~attribute values are quoted (or unquoted; I don't care as long as we're consistent)~~
- tags don't have any unexpected attributes (e.g.)
- tags have all the expected attributes (e.g. <emu-xref="foo>" should be caught, since they meant <emu-xref href="foo">)
- for tags for which the closing tag is optional, they are always included (or not included; again, as long as we're consistent)
- ~~≥ is spelled ≥ (etc)~~ formatter: render HTML entities #481
- ~~no unknown emu tags~~ lint: some checks for legal tags/attributes #279
- sec- is only a prefix for an ID when attached to a clause (cf Editorial: Don't use "sec-" prefix for dfn ids ecma262#2103)
- all rows in a table have the same number of cells (account for colspan)
- ~~consistent indentation~~
- emu-grammar and emu-alg tags are not adjacent with others of their kind
- ~~no more than one blank line in a row~~
 - ~~and maybe consistent rules about where blank lines go, at least in some cases: e.g., never between <emu-clause> and <h1>.~~
- Consistent spacing for records: exactly the spacing in { [[key]]: value, [[key2]]: value2 }
Consistent spelling
- ~~the *this* value, not *this* value or the *this* object~~
  - we have a lint for "this object" but not no-"the" "this value" because the latter form is still in use (generally as an argument to an AO call)
- ~~British vs American spelling for words where it's an issue, like "behaviour"~~
- ~~"one's complement" and "two's complement", not "1's complement" or "2's complement"~~
- "uppercase" and "lowercase", not "upper case" or "upper-case" (Editorial: Standardize the spelling of "uppercase" and "lowercase" ecma262#2598)
Annex A ("Grammar Summary") has emu-prodrefs to all productions (or, ideally, is automatically generated)
As of Editorial: use consistent wording for abstract operation preambles ecma262#1914, every abstract operation has a preamble in the correct format (though, what is an abstract operation from the perspective of ecmarkup? - I guess an emu-clause with an AOID is a reasonable heuristic.)
When the steps or prose for a syntax-directed operation refer to the name of a nonterminal, it is surrounded with |, as in |UnicodeLeadSurrogate|.
In syntax-directed operations,
- All referenced nonterminals occur in the production for the SDO.
- In the algorithm steps or prose, opt subscripts and grammar parameters are not included.
Miscellaneous stuff:
- ~~Always *+0*𝔽 or *-0*𝔽, never *0*𝔽. ~~(add some lint rules for numbers #257)
- "be the Record {", not "be a new Record {"
- Never *+1* for any string of digits except 0.
- ~~Exactly one space between sentences.~~
-  is not followed by a linebreak, and  is not preceded by a linebreak (even with intervening whitespace).
- ~~[Cc]lauses? \d should be forbidden.~~
- An inline if does not have a then: If foo, return false. not If foo, then return false.
- "ECMAScript language value", not "ECMAScript value"
No namespace collisions between constants and AOs (and maybe other namespaces)
All AOs have structured headers (once people have had time to get used to the new syntax)
No unnecessary explicit suppressions / annotations for can-call-user-code
Every algorithm returns.
consistently "a number or a bigint", in that order, in types: Editorial: Use consistent phrasing for parameters that are Number or BigInt ecma262#2622
~~no unused Let bindings or captured variables in AOs: Editorial: remove unused capture in %TypedArray%.prototype.sort ecma262#2836~~ lint for used-but-not-declared and declared-but-not-used variables in algorithms #483
If _x_ is *null*, return *null* rather than If _x_ is *null*, return _x_, per Editorial: Prefer returning static values after alias comparison ecma262#3122 - anywhere there's a comparison against a thing in *s or ~s, or a literal number or +/-∞, and then the alias is returned, we should return the constant instead.

It would also be nice to have a few more static-analysis-y checks, like

The grammar should be unambiguous
- And LR(k)
Typechecking for abstract operations
- All used operations are defined
- ~~They are invoked with the right number of arguments~~
- ~~They don't reference any values which are not defined~~ lint for used-but-not-declared and declared-but-not-used variables in algorithms #483
  - And when updating an already defined value, this is done with set rather than let, increase, increment, add, etc.
- ~~Their return value is treated as a completion record or not as a completion record, as appropriate (pending Sorting out completion records ecma262#1796)~~
- In an ideal world, actual typechecking for values
  - Given that, enforce the * vs ~ vs _ vs " rules for referring to different kinds of values
  - As a particular case, algorithms should say If _x_ is *true*, not If _x_ is true.
~~Grammar productions have all and only the appropriate flags~~
~~All syntax-directed operations correspond to actual productions~~

Edit: some of these are done in #199, #205, #207, #209, #210, and #239. I've struck them from the above list. I'm keeping this issue open to track remaining items.

The text was updated successfully, but these errors were encountered:

syg · 2020-02-24T19:26:19Z

All used operations are defined

This would need to be amended for AOs that are only called upstream, but at least it would force us to enumerate such AOs.

michaelficarra · 2020-02-24T20:22:38Z

As an additional step here, I would like for us to define a bunch of "pattern matchers" of some sort that identify idiomatic spec phrasing, and use a special rendering of the spec that highlights/styles these idiomatic sections of text so that we can identify algorithm steps and whatnot that (unexpectedly) do not conform. We might even want to track/relate the idiomatic phrasing in the same way we do AO references.

jmdyck · 2020-02-25T01:49:06Z

I'll point out that ecmaspeak-py does almost all of what @bakkot listed, and more. Not that that helps ecmarkup a whole lot, but you're welcome to adapt my code.

michaelficarra · 2020-02-25T03:10:50Z

@jmdyck Can you expand on the "and more"? What other things does it do that aren't listed in @bakkot's comment?

edit: I see you posted a link. I'll check out the code.

jmdyck · 2020-02-25T04:30:01Z

In addition to the things @bakkot listed, ...

analyze_spec.py checks that...

Only expected elements are used, and that their content conforms to expected content models.
re 'id' attributes:
- There are no duplicates.
- They conform to certain expectations (e.g., the 'id' of an emu-table should start with "table-").
- A name isn't both an 'id' and an 'oldid'.
- There's a defining 'id' for every id-reference.
- Defined ids (with many exceptions) are referenced somewhere in the spec.
Some of the above similarly for 'aoid' attributes.
emu-tables with certain kinds of caption have expected column-heads.
The content of the well-known intrinsics table conforms to certain expectations.
Each %Foo% in the spec (outside the intrinsics table) resolves to one in the intrinsics table.

Section.py checks that...

clause-headers conform to certain expectations.
Within certain clauses (e.g., the properties of object Foo), sub-clauses are in 'alphabetical order'.

emu_grammars.py checks that...

Every emu-grammar element is syntactically well-formed.
Every production in a 'definition' emu-grammar "makes sense" in various ways, especially in its use of grammatical parameters.
The total set of defining productions makes sense as a whole.
Every production in a non-definition emu-grammar is a version of some defining production. (In general, it needn't be an exact copy of one, for various reasons.)
Outside of emu-grammar elements, references to nonterminals are well-defined.
(It also generates a good approximation to Annex A.)
(It also has in-progress code to generate a parser.)

Pseudocode.py ...

Attempts to parse every chunk of pseudocode (emu-alg, early error, emu-eqn, inline SDO definition), and complains about any failures. (See *.grammar for the grammars it uses.)
(Builds data structures for use in static analysis.)
Builds a static call graph and looks for anomalies.
Attempts to check SDO coverage (See Normative: plug some holes in the coverage of syntax-directed operations ecma262#1301)

static_type_analysis.py...

Extracts parameter info + return type info from preambles (and complains about any anomalies it sees).
Attempts to perform an interprocedural static type analysis of all the pseudocode (and complains about any type errors it detects).
Generates a version of the spec with 'algorithm headers' (See Replace each abstract op's preamble with a structured 'header'. ecma262#545)

michaelficarra · 2020-03-30T21:19:08Z

@bakkot

Grammar lookahead restrictions and flags are omitted in early error definitions and syntax-directed operations

I think you mean included, right?

bakkot · 2020-03-30T21:22:05Z

If we decide they should be universally included, sure.

michaelficarra · 2020-03-30T21:25:01Z

@bakkot I thought that's what we decided in the editor call. We even have it on our editor update slides.

bakkot · 2020-03-30T21:26:33Z

I only remembered that we were vaguely in support of it, not that we necessarily intended to actually change it. But if we did, sure, sounds good.

michaelficarra · 2020-03-30T21:28:42Z

I'm not good at remembering those sorts of things without notes. We can confirm at the next call.

edit: In a later editor call, we did in fact confirm that universal inclusion is desired.

rbuckton · 2020-04-03T00:51:41Z

emu_grammars.py checks that...

Every emu-grammar element is syntactically well-formed.

Every production in a 'definition' emu-grammar "makes sense" in various ways, especially in its use of grammatical parameters.

The total set of defining productions makes sense as a whole.

Every production in a non-definition emu-grammar is a version of some defining production. (In general, it needn't be an exact copy of one, for various reasons.)

Outside of emu-grammar elements, references to nonterminals are well-defined.

(It also generates a good approximation to Annex A.)

(It also has in-progress code to generate a parser.)

grammarkdown already does a lot of this, ecmarkup just doesn't employ those features.

rbuckton · 2020-04-03T00:53:45Z

Feel free to file bugs against grammarkdown for anything you think needs to be added to the parser, checking, etc.: https://github.com/rbuckton/grammarkdown

michaelficarra · 2020-04-06T22:40:23Z

Nesting more than one AO invocation as parameters to another AO should be disallowed. See tc39/ecma262#1573 (comment)

ljharb · 2020-04-06T23:55:44Z

To confirm, it seems like they should be disallowed only because the current macro doesn't handle them, not because they're inherently something we don't want?

bakkot · 2020-04-07T00:05:43Z

The issue there isn't about the macro, it's the fact that there's no evaluation order for the arguments defined, and when they're abstract operations they can have side effects which must be specified to happen in a specific order.

michaelficarra · 2020-04-20T16:40:51Z

In algorithm steps, all aliases should be introduced using Let or with a parameter list before they are used. (see Editorial: Fixed typos in variable name oldvalue ecma262#1954)
Forbid introduction of aliases that are never referenced.

michaelficarra · 2020-04-21T18:51:53Z

In algorithm steps, each alias should only be introduced with Let at most once for any possible trace through the algorithm.

bakkot · 2020-05-10T04:39:04Z

In algorithm steps, all aliases should be introduced using Let or with a parameter list before they are used.

There are other ways we introduce aliases. Other than those Let and parameter lists (including parameter lists of abstract closures), we also have various permutations on:

For each _x_ (For each element _k_ of _keys_, For each integer _k_ that satisfies ..., etc)
Evaluate |Foo| to obtain _x_, which is used only in regexes and is complicated by the fact that some of the regex operations return multiple results
If there exists _x_ (e.g.)
the smallest possible integer _k_ not smaller than (e.g.)
For yield we say something like when evaluation is resumed with a Completion _resumptionValue_ the following steps will be performed, which introduces _resumptionValue_ for use in those steps
For MakeDay we say Find a value _t_ such that

Also, we occasionally refer to aliases from other algorithms. For example, the Function constructor says Let _args_ be the _argumentsList_ that was passed to this function by [[Call]].

Also also, some places we have algorithms (especially but not uniquely in Annex B, e.g. in the Note at the end of Number.prototype.toExponential) we're actually defining replacements to existing steps in to other algorithms, in which case it is not possible to locally determine which aliases are in scope.

bakkot · 2020-06-14T17:06:29Z

Also, the things which are currently console.log warnings should become linting errors, ideally.

Edit: done in #229.

bakkot · 2020-07-14T22:38:04Z

We should figure out if we want to use html entities for non-ASCII stuff, and enforce that decision. Currently we use both « and «.

ljharb · 2020-07-14T22:39:13Z

Since I'm on a Mac, it's easy to type the actual characters, so I'd prefer that; but if others editing the spec aren't, it might not be as easy.

michaelficarra · 2020-07-14T23:27:44Z

It's 2020, embrace the Unicode.

NotWearingPants · 2021-05-06T20:10:09Z

We should figure out if we want to use html entities for non-ASCII stuff, and enforce that decision.

Why not use ASCII quotes and convert automatically in build time? Here is how markdown-it does this.

If an If has an else on the same line, it is spelled ; else rather than , else or ; else, or ; otherwise

Instead of enforcing consistency, how about adding tags for this, e.g.

1. <emu-if><cond>x is 1 and y is 1</cond><then>return 1</then></emu-if>
1. <emu-else>return 0</emu-else>

and

1. <emu-if-multiline><cond>_exponent_ is *+&infin;*<sub>𝔽</sub></cond><then>
  1. <emu-if><cond>abs(ℝ(_base_)) &gt; 1</cond><then>return *+&infin;*<sub>𝔽</sub></then></emu-if>
  1. <emu-if><cond>abs(ℝ(_base_)) is 1</cond><then>return *NaN*</then></emu-if>
  1. <emu-if><cond>abs(ℝ(_base_)) &lt; 1</cond><then>return *+0*<sub>𝔽</sub></then></emu-if>
</then></emu-if-multiline>
1. <emu-else-multiline>...</emu-else-multiline>

This would allow enforcing a consistent style, would also allow authors to not have to remember where commas/semicolons go, and would also let you change the style across the entire document anytime (after all ifs have been converted of course).

jmdyck · 2021-05-06T22:36:44Z

Instead of enforcing consistency, how about adding tags for this

The ecmarkdown page says:

Some of Ecmarkdown's biggest benefits are when using it to write algorithm steps, without the many formalities HTML requires for list items and inline formatting of common algorithmic constructs.

Adding tags as you suggest would make algorithms harder to write + edit (and read, in source).

michaelficarra · 2024-08-28T20:39:33Z

When For each iterates over the elements of a List, we say "of", not "in".

tc39/ecma262#3382 (comment)

bakkot mentioned this issue Mar 27, 2020

Editorial: use consistent wording for abstract operation preambles tc39/ecma262#1914

Merged

michaelficarra mentioned this issue Apr 2, 2020

Editorial: delete some SDO cases that are covered by the chain production rule tc39/ecma262#1933

Merged

bakkot mentioned this issue May 9, 2020

basic linting behind --lint-spec #199

Merged

bakkot mentioned this issue May 16, 2020

Meta: bump ecmarkup, enforce linting tc39/ecma262#1995

Merged

bakkot changed the title ~~Support linting the spec~~ Linting the spec May 16, 2020

bakkot mentioned this issue May 25, 2020

Normative: Add String.prototype.replaceAll tc39/ecma262#2009

Merged

bakkot mentioned this issue Jun 16, 2020

Editorial: use step labels rather than numbers to refer to steps tc39/ecma262#2052

Merged

bakkot mentioned this issue Aug 20, 2020

Editorial: Replace non-ASCII quotes tc39/ecma262#2149

Merged

bakkot mentioned this issue Nov 29, 2020

Documenting editorial conventions tc39/ecma262#2236

Open

bakkot mentioned this issue Jan 12, 2021

~constants~ should not autolink #285

Merged

bakkot mentioned this issue Mar 1, 2021

Fix undefined / unused variables in spec text. tc39/proposal-temporal#1403

Merged

michaelficarra mentioned this issue Aug 16, 2022

Editorial: clarify string representation of numbers with radix ≠ 10 tc39/ecma262#2854

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Linting the spec #173

Linting the spec #173

bakkot commented Feb 24, 2020 •

edited

Loading

syg commented Feb 24, 2020

michaelficarra commented Feb 24, 2020

jmdyck commented Feb 25, 2020

michaelficarra commented Feb 25, 2020 •

edited

Loading

jmdyck commented Feb 25, 2020 •

edited

Loading

michaelficarra commented Mar 30, 2020

bakkot commented Mar 30, 2020

michaelficarra commented Mar 30, 2020

bakkot commented Mar 30, 2020

michaelficarra commented Mar 30, 2020 •

edited

Loading

rbuckton commented Apr 3, 2020

rbuckton commented Apr 3, 2020

michaelficarra commented Apr 6, 2020

ljharb commented Apr 6, 2020

bakkot commented Apr 7, 2020

michaelficarra commented Apr 20, 2020

michaelficarra commented Apr 21, 2020

bakkot commented May 10, 2020

bakkot commented Jun 14, 2020 •

edited

Loading

bakkot commented Jul 14, 2020

ljharb commented Jul 14, 2020

michaelficarra commented Jul 14, 2020

NotWearingPants commented May 6, 2021

jmdyck commented May 6, 2021

michaelficarra commented Aug 28, 2024

Linting the spec #173

Linting the spec #173

Comments

bakkot commented Feb 24, 2020 • edited Loading

syg commented Feb 24, 2020

michaelficarra commented Feb 24, 2020

jmdyck commented Feb 25, 2020

michaelficarra commented Feb 25, 2020 • edited Loading

jmdyck commented Feb 25, 2020 • edited Loading

michaelficarra commented Mar 30, 2020

bakkot commented Mar 30, 2020

michaelficarra commented Mar 30, 2020

bakkot commented Mar 30, 2020

michaelficarra commented Mar 30, 2020 • edited Loading

rbuckton commented Apr 3, 2020

rbuckton commented Apr 3, 2020

michaelficarra commented Apr 6, 2020

ljharb commented Apr 6, 2020

bakkot commented Apr 7, 2020

michaelficarra commented Apr 20, 2020

michaelficarra commented Apr 21, 2020

bakkot commented May 10, 2020

bakkot commented Jun 14, 2020 • edited Loading

bakkot commented Jul 14, 2020

ljharb commented Jul 14, 2020

michaelficarra commented Jul 14, 2020

NotWearingPants commented May 6, 2021

jmdyck commented May 6, 2021

michaelficarra commented Aug 28, 2024

bakkot commented Feb 24, 2020 •

edited

Loading

michaelficarra commented Feb 25, 2020 •

edited

Loading

jmdyck commented Feb 25, 2020 •

edited

Loading

michaelficarra commented Mar 30, 2020 •

edited

Loading

bakkot commented Jun 14, 2020 •

edited

Loading