Description
If Rue wants to add generic type parameter syntax, but wants a lighter, more functional feel, Rue might want to depart from strict adherence to Rust lexical conventions to avoid the turbo-fish problem: lists of type parameters are valid albeit rarely used in an expression context.
TL;DR: resolving < ambiguity based on a purely lexical convention might better allow for later addition of flexible meta-programming features and allow for a lighter-weight functional style of programming.
If a lexical convention like 3 or 4 below is desired, it might be wise to adopt before there is a large installed base of Rue code since changes to lexical conventions are hard to manage unless the user-base is willing to run a reformatter à la go fix.
Type parameter lists don't always appear in obvious places
In a functional language, it can be good to allow partial application of type parameters.
It's also nice for meta-programming if macro calls can take types as arguments as if they were expressions; if Zig catches on, more devs might be familiar with this.
Consider these scenarios. The first few can easily be handled by having different productions for Type from Expr but the last few are ambiguous.
let x: A<B> = a < b;
// ^^^^ ^^^^^
// TYPE EXPR
// A reference to a function that is not applied,
// but is specialized from 'a->'a to bool->bool.
let x: Predicate<bool> = identity<bool>;
// In a functional-feeling language, it's nice to have the flexibility to partially apply type parameters
// Passing types as parameters to a macro.
myMacro!(A<B, C>)
// Passing type lists to a macro; maybe it picks a base type to parameterize based on the size
// of the type list.
anotherMacro!(<B, C>)
// If macro application isn't syntactically distinguished from function application, then
// a library can evolve backwards compatibly by replacing a function with a macro that does the
// same thing by expanding to a call to a less-safe helper.
mightBeAMacro(A<B, C>)
probablyNotAMacro(a<b, c)
Some languages allow for parsing types in expression contexts via an explicit prefix; see the typename keyword in C++. When typename is and isn't required is a source of confusion for devs new to C++ template metaprogramming.
< is ambiguous
In languages like Java, C++, and Rust, < can be a comparison operator, or it can be an open bracket.
f(a < b, c > d)
// ^ infix operator; value named by variable a compared to that by b
let f:a < b, c >=d;
// ^ open bracket; type 'a' specialized with actual type parameters b and c
This difference can be resolved at parse time: < is an open bracket only in a Type grammar production.
Unfortunately, if the language later wants to extend its lexical grammar to allow, for example, string interpolation, then relying on the parser to resolve < means the language is committing to a scannerless parser.
"Hello${
{} // just a block. The '}' there does not close the interpolation
let world: Message< ... > = f();
g(world)
}!"
Complex but still context-free lexical constructs like string interpolation require answering the question: "should the next character be treated as string content because this } character closes a string interpolation?"
If you're already keeping a bracket stack for ${...} pairs, then the same machinery can be used to handle a <...> pair.
Basing the matching of > close brackets with possible < open brackets on a purely lexical convention leads to fewer potential bad interactions with other complex lexical conventions like ${ and } matching.
If the < ... > above could contain mismatched }s, as during code editing inside an IDE, then knowing that a } is inside a bracket pair is necessary for correctly recognizing the } that closes the interpolation.
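A minimal sketch of the bracket-stack idea for ${...}, in Python. The function name `find_interpolation_end` is hypothetical, and the sketch assumes the interpolated code contains no nested string literals or comments; a real lexer would have to skip those as well, and could push < entries onto the same stack.

```python
def find_interpolation_end(src: str, open_index: int) -> int:
    """Return the index of the `}` that closes the `${` at open_index.

    Assumption: no nested string literals or comments inside the
    interpolation. Returns -1 if the interpolation is never closed.
    """
    assert src.startswith("${", open_index)
    stack = ["${"]
    i = open_index + 2
    while i < len(src):
        ch = src[i]
        if ch == "{":
            stack.append("{")      # an ordinary block inside the interpolation
        elif ch == "}":
            stack.pop()
            if not stack:          # this `}` closed the `${` itself
                return i
        i += 1
    return -1


text = '"Hello${ {} g(world) }!"'
end = find_interpolation_end(text, text.index("${"))
print(text[end + 1:])  # the string content that follows the interpolation
```

The `{}` block inside the interpolation pushes and pops its own entry, so only the final `}` empties the stack.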
Pairing > affects tokenization
let x: A<B<C<D>>>= y;
// ^^^^ Here, `>>>=` is four distinct tokens.
d >>>= y;
// ^^^ Here, `>>>=` is one compound right shift assignment token
So angle bracket identification is often going to be intertwined with finding lexical boundaries.
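One way a lexer could make that decision, sketched in Python. The names `split_closers` and `open_angles` are hypothetical, and the sketch assumes the lexer already knows how many < open brackets are pending when it reaches a run of > characters.

```python
def split_closers(run: str, open_angles: int) -> list[str]:
    """Split a run such as `>>>=` given the number of open `<` brackets.

    Each pending open bracket consumes one leading `>` as a close bracket.
    Whatever remains is kept as a single operator token.
    """
    tokens = []
    while open_angles > 0 and run.startswith(">"):
        tokens.append(">")
        run = run[1:]
        open_angles -= 1
    if run:
        tokens.append(run)
    return tokens


print(split_closers(">>>=", 3))  # after `A<B<C<D`
print(split_closers(">>>=", 0))  # after `d `
```

With three brackets open, the run comes apart into three close brackets plus `=`; with none open, it stays one compound token.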
Any purely lexical convention is going to open up room for confusion.
As hinted above, any parser-based solution to < ambiguity is going to open up room for user confusion.
// In the absence of a comma operator, the below is an application of type-specialized function 'a'
// to type actual list [b, c], and to expression actual list [d]
a<b, c>(d) ;
// Application of f to the expression actual list [(a < b), (c > (d))]
f(a<b, c>(d));
Any purely lexical convention is too. Below are 4 possible conventions, each with a discussion of their shortcomings.
Lexical convention 1 -- naïve pairing
Any > that follows a < at the same nesting level indicates that the matching < is an open bracket and that the current > is its mate.
Counterexample: if a < b && c > d {
Some lower precedence infix operators (&&, ||, and ^) naturally nest comparison operators.
Workaround, parenthesize < operators: if (a < b) && c > d {
This happens frequently enough that it probably causes lots of pain to developers, especially adopters of lexically similar languages that don't require this.
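A rough Python sketch of naïve pairing shows the counterexample and the workaround side by side. The tokenizer regex and the name `naive_pairs` are illustrative assumptions, not a proposal for Rue's lexer.

```python
import re

# Hypothetical, deliberately tiny tokenizer: `&&`, `||`, identifiers,
# then any single non-space character.
TOKEN = re.compile(r"&&|\|\||[A-Za-z_]\w*|\S")
CLOSERS = {")": "(", "]": "[", "}": "{"}


def naive_pairs(src: str) -> list[tuple[int, int]]:
    """Return (open, close) token indices that convention 1 treats as brackets."""
    tokens = TOKEN.findall(src)
    stack = []  # entries are (token, token index)
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in {"(", "[", "{", "<"}:
            stack.append((tok, i))
        elif tok in CLOSERS:
            # A real close bracket discards any unmatched `<` above its mate.
            while stack:
                opener, _ = stack.pop()
                if opener == CLOSERS[tok]:
                    break
        elif tok == ">" and stack and stack[-1][0] == "<":
            pairs.append((stack.pop()[1], i))
    return pairs


print(naive_pairs("if a < b && c > d {"))    # comparison operators get paired
print(naive_pairs("if (a < b) && c > d {"))  # parentheses prevent the pairing
```

In the first call the two comparison operators are wrongly reported as a bracket pair; in the second, the `)` pops the `<` before the `>` is reached.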
Lexical convention 2 -- case sensitivity
If a < is preceded by an upper-case identifier then it is an open bracket, but otherwise it is not.
A<B, C>(d) has brackets.
a<b, c>(d) does not.
Lexical convention 2 should be rejected so as to guarantee a good DX for all developers including unicameral writing system users.
Though in many programming languages, by convention, identifiers for types are upper-case, and identifiers for value-holding cells are lower-case, most human writing systems are unicameral: they do not have a case distinction.
Go uses case to indicate whether identifiers are exported, which has served written-Chinese users poorly: golang/go#5763
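To make the unicameral problem concrete, here is a tiny Python sketch. The name `is_bracket_by_case` is hypothetical, and checking only the first character is an assumption about what "upper-case identifier" would mean.

```python
def is_bracket_by_case(identifier: str) -> bool:
    """Convention 2: a `<` after this identifier is a bracket only if the
    identifier starts with an upper-case character (assumed reading)."""
    return identifier[:1].isupper()


print(is_bracket_by_case("A"))     # bicameral script, upper-case
print(is_bracket_by_case("a"))     # bicameral script, lower-case
print(is_bracket_by_case("类型"))  # unicameral script: no case at all
```

An identifier written in a script without case can never satisfy the check, so a type named that way could never take a type parameter list.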
Lexical convention 3 -- space sensitivity
Infix operators are almost always surrounded by spaces.
A < that is preceded by an ignorable token that forces a token boundary (like spaces or /*...*/ comment) is an infix operator. Otherwise it is a bracket.
A<B, C>
// Brackets. A is not separated from `<` by a space or comment token
struct Foo<A : ...>
// Brackets. A is not separated from `<` by ...
struct Foo<
/*! Documentation for type formal A */
A : ...
>...
// Brackets, long type formal lists can still be spread across multiple lines
a < b
// an expression. there is a space between 'a' and `<`
This convention is widely used by programmers.
a - -b
// Equivalent to (a - (-b))
a-- - --b
// Equivalent to ((a--) - (--b))
Scala with its open operator set has already relied on this convention of spaces required around infix operators and it has not proven a barrier to adoption.
Swift has also adopted whitespace conventions, though slightly different ones, without huge controversy: https://docs.swift.org/swift-book/documentation/the-swift-programming-language/lexicalstructure/#Operators
If an operator has whitespace around both sides or around neither side, it’s treated as an infix operator. As an example, the +++ operator in a+++b and a +++ b is treated as an infix operator.
This seems low risk, and is easy to explain, but there are two failure modes. In either case, the solution is to add or remove the offending token. In the case of the under-classifying comment below, that is non-trivial.
Overclassification:
x< y
If x is previously declared as a variable, not a type, an IDE plugin can probably suggest the space.
This may not work in two cases:
// Reordered declaration
pi< tau;
// These might never be parsed and typed by the IDE plugin because they are seen as inside an unclosed bracket block
const pi = 3.14;
const tau = pi * 2;
////// ------
// In a supposed language where const variable declaration syntax is used to
// alias types for meta-programming purposes, an IDE plugin might not be
// able to recognize over-classification in some cases
const MyTypeAlias = MyType;
MyTypeAlias<: SomeSuperType
Underclassification
It's possible to underclassify brackets too.
MyStruct <TypeParam>
MyStruct<TypeParam>
// ^ Zero width non-joiner here.
struct MyStruct/*TODO: rename to MonStruct*/<TypeParam : ...>
If it's illegal for invisible characters like zero-width joiners and non-joiners to appear except in the middle of identifiers or string or comment tokens, then there's less risk of hard-to-debug under-classification.
A strong lint warning for all comments immediately preceding < tokens would not be amiss.
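The space-sensitivity rule and the comment lint could be sketched together like this in Python. The token regex and the name `classify_angles` are assumptions made for illustration, and invisible characters are deliberately not handled.

```python
import re

# Hypothetical token shapes: block comments, whitespace runs, identifiers,
# then any single character.
TOKEN = re.compile(r"/\*.*?\*/|\s+|[A-Za-z_]\w*|.", re.DOTALL)


def classify_angles(src: str) -> list[tuple[str, str]]:
    """Label each `<` as 'bracket' or 'infix' from the token just before it.

    Returns (label, warning) pairs; the warning is empty unless a comment
    immediately precedes the `<`.
    """
    results = []
    prev = ""
    for match in TOKEN.finditer(src):
        tok = match.group()
        if tok == "<":
            if prev.startswith("/*"):
                results.append(("infix", "comment directly before '<'"))
            elif prev.isspace():
                results.append(("infix", ""))
            else:
                results.append(("bracket", ""))
        prev = tok
    return results


print(classify_angles("A<B, C>"))
print(classify_angles("a < b"))
print(classify_angles("struct MyStruct/*TODO*/<TypeParam>"))
```

The third call is the under-classification case: the comment forces a token boundary, so the `<` is treated as infix and the lint warning is attached.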
Lexical convention 4 -- if a < has a matching > it's a bracket unless it obviously shouldn't be.
Space sensitivity is simple, but if space insensitivity is a hard requirement, e.g. so as to simplify tooling that generates or reformats source code, this more complex convention may help.
Start with a list of tokens that specify operators that distribute over comparison:
| Token | Why |
|---|---|
| `\|\|` | Lower precedence operator often applied to comparisons |
| `&&` | " |
| `^` | " |
| `;` | Indicates statement boundary |
Then define the convention:
1. A `<` token starts assumed to be infix, but is nevertheless entered on a bracket stack.
2. If any close bracket is found that potentially closes a wider bracket group then its stack entry is popped. `(a < b)` has a bracket stack consisting of `["(", "<"]` after `<` is processed, but when `)` is seen both are popped.
3. If a `>` token is seen then any topmost `<` is reclassified as a bracket and its stack entry is popped. If such an entry is popped, then the `>` is classified as a close bracket token, otherwise it is classified as an infix operator.
4. If a token on the list above is seen (`||` et al) then the token stack is popped until there is no top-most `<` entry.
For example:
a < b && c > (d)
// No bracket because token stack is ['<' ASSUMED_INFIX] when `&&` is seen but is empty by (4) above after it
a < b, c > (d)
// There is a bracket because none of the intervening tokens ('b', ',', 'c') interrupt matching `>` with `<`, so `<` is reclassified as a bracket by (3) when '>' is processed.
for (;
a < b;
c > d && break
) { ... }
// Both are uses of infix operators. The second ';' token pops `<` from the bracket stack.
a < b, (c && d) >
// Weird. But potentially useful if you want constant expressions to be usable as type actuals.
// Since the `&&` is processed when there is a `(` on the bracket stack, it does not cancel the `<`,
// but since the parentheses are off the stack when '>' is reached, the angle brackets mate.
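
The four rules and the examples above can be checked against a short Python sketch. The tokenizer regex and the names `classify`, `CANCELLERS`, and `CLOSERS` are hypothetical; this is one possible reading of the convention, not a reference implementation.

```python
import re

# Hypothetical minimal tokenizer: `&&`, `||`, identifiers, numbers,
# then any single non-space character.
TOKEN = re.compile(r"&&|\|\||[A-Za-z_]\w*|\d+|\S")
CANCELLERS = {"||", "&&", "^", ";"}       # the token list from the table
CLOSERS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(CLOSERS.values())


def classify(src: str) -> list[tuple[str, str]]:
    """Return each `<` and `>` in order, labelled 'bracket' or 'infix'."""
    tokens = TOKEN.findall(src)
    kinds = {}   # token index -> label, for `<` and `>` tokens only
    stack = []   # entries are (token, token index)
    for i, tok in enumerate(tokens):
        if tok in OPENERS:
            stack.append((tok, i))
        elif tok == "<":                  # rule 1: assumed infix for now
            kinds[i] = "infix"
            stack.append((tok, i))
        elif tok in CLOSERS:              # rule 2: pop through the mate
            while stack:
                opener, _ = stack.pop()
                if opener == CLOSERS[tok]:
                    break
        elif tok == ">":                  # rule 3: mate with a topmost `<`
            if stack and stack[-1][0] == "<":
                kinds[stack.pop()[1]] = "bracket"
                kinds[i] = "bracket"
            else:
                kinds[i] = "infix"
        elif tok in CANCELLERS:           # rule 4: cancel pending `<` entries
            while stack and stack[-1][0] == "<":
                stack.pop()
    return [(tokens[i], kinds[i]) for i in sorted(kinds)]


for example in ["a < b && c > (d)", "a < b, c > (d)", "a < b, (c && d) >"]:
    print(example, "=>", classify(example))
```

Because every token is handled once with only stack pushes and pops, the classification stays a single left-to-right pass that never needs to consult the parser.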