design space: `<...>` for type parameters #273

If Rue wants generic type parameter syntax with a lighter, more functional feel, it might want to depart from strict adherence to Rust's lexical conventions to avoid the turbofish problem: type parameter lists are valid, albeit rarely used, in expression context, where `<` is otherwise a comparison operator.

TL;DR: resolving `<` ambiguity with a purely lexical convention might better allow for the later addition of flexible meta-programming features and for a lighter-weight functional style of programming.

If a lexical convention like 3 or 4 below is desired, it might be wise to adopt it before there is a large installed base of Rue code, since changes to lexical conventions are hard to manage unless the user base is willing to run a reformatter à la `go fix`.

## Type parameter lists don't always appear in obvious places

In a functional language, it can be good to allow partial application of type parameters.
It's also nice for meta-programming if macro calls can take types as arguments as if they were expressions; if Zig catches on, more devs might be familiar with this.

Consider these scenarios.  The first few can easily be handled by having separate productions for *Type* and *Expr*, but the last few are ambiguous.

```ts
let x: A<B> = a < b;
//     ^^^^   ^^^^^
//     TYPE   EXPR


// A reference to a function that is not applied, 
// but is specialized from 'a->'a to bool->bool.
let x: Predicate<bool> = identity<bool>;
// In a functional-feeling language, it's nice to have the flexibility to partially apply type parameters


// Passing types as parameters to a macro.
myMacro!(A<B, C>)


// Passing type lists to a macro; maybe it picks a base type to parameterize based on the size
// of the type list.
anotherMacro!(<B, C>)

// If macro application isn't syntactically distinguished from function application, then
// a library can evolve backwards compatibly by replacing a function with a macro that does the
// same thing by expanding to a call to a less-safe helper.
mightBeAMacro(A<B, C>)
probablyNotAMacro(a<b, c)
```

Some languages allow types to be parsed in expression contexts via an explicit prefix, such as the `typename` keyword in C++.  When `typename` is and isn't required is a source of confusion for devs new to C++ template metaprogramming.


## `<` is ambiguous

In languages like Java, C++, and Rust, `<` can be a comparison operator, or it can be an open bracket.

```rust
    f(a < b, c > d)
//      ^ infix operator; value named by variable a compared to that by b

let f:a < b, c >=d;
//      ^ open bracket; type 'a' specialized with actual type parameters b and c
```

This difference can be resolved at parse time: `<` is an open bracket only within a *Type* grammar production.
Unfortunately, if the language later wants to extend its lexical grammar to allow, for example, string interpolation, then resolving `<` in the parser commits the language to a scannerless parser.

```ts
"Hello${
  {}  // just a block.  The '}' there does not close the interpolation
  let world: Message<  ...  > = f();
  g(world)
}!"
```

Complex but still context-free lexical constructs like string interpolation require answering the question: "should the next character be treated as string content because this `}` character closes a string interpolation?"

If you're already keeping a bracket stack for `${...}` pairs, then the same machinery can be used to handle a `<...>` pair.
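As a minimal sketch of that bracket stack, assuming a toy syntax in which the interpolation body contains no nested string literals or comments, the scanner below finds the `}` that closes a `${`. The function name `interpolation_end` is hypothetical and not part of any Rue implementation.

```rust
/// Returns the byte offset of the `}` that closes an interpolation whose
/// body starts at `body_start` (just past the `${`), or `None` if the
/// interpolation is unclosed.
/// Assumption: the body has no nested string literals or comments.
fn interpolation_end(src: &str, body_start: usize) -> Option<usize> {
    // The stack starts with the `{` that belongs to the `${` itself.
    let mut stack: Vec<char> = vec!['{'];
    for (i, c) in src[body_start..].char_indices() {
        match c {
            '{' | '(' | '[' => stack.push(c),
            '}' | ')' | ']' => {
                let open = match c {
                    '}' => '{',
                    ')' => '(',
                    _ => '[',
                };
                // Closers that do not match the top of the stack are ignored.
                if stack.last() == Some(&open) {
                    stack.pop();
                    if stack.is_empty() {
                        return Some(body_start + i);
                    }
                }
            }
            _ => {}
        }
    }
    None
}
```

Extending the two `match` arms with `<` and `>` entries is where one of the lexical conventions discussed below would plug in.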

Matching `>` close brackets with possible `<` open brackets by a purely lexical convention leads to fewer potential bad interactions with other complex lexical conventions like `${` and `}` matching.

If the `< ... >` above could contain mismatched `}`s, as during code editing inside an IDE, then knowing that a `}` is inside a bracket pair is necessary for correctly recognizing the `}` that closes the interpolation.

## Pairing `>` affects tokenization

```rust
let x: A<B<C<D>>>= y;
//            ^^^^ Here, `>>>=` is four distinct tokens.

d >>>= y;
//^^^^ Here, `>>>=` is one compound right-shift assignment token
```

So angle bracket identification is often going to be intertwined with finding lexical boundaries.
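One way to picture the interaction, as a sketch only: assume the lexer first reads a maximal run of `>` characters with an optional trailing `=`, then peels off one `>` per currently open angle bracket. The helper name `split_closing_angles` and its `open_angles` parameter are assumptions made for illustration.

```rust
/// Splits a greedily lexed token such as `>>>=` into individual `>` close
/// brackets (at most `open_angles` of them) followed by whatever remains
/// as a single operator token.
fn split_closing_angles(token: &str, open_angles: usize) -> Vec<&str> {
    // Count the leading `>` characters in the token.
    let leading = token.bytes().take_while(|&b| b == b'>').count();
    // Only as many `>` as there are open angle brackets can be closers.
    let closers = leading.min(open_angles);
    let mut parts: Vec<&str> = (0..closers).map(|i| &token[i..i + 1]).collect();
    if closers < token.len() {
        parts.push(&token[closers..]);
    }
    parts
}
```

With three open angle brackets, `>>>=` becomes four tokens; with none, it stays one compound token, matching the two cases above.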

## Any purely lexical convention is going to open up room for confusion.

As hinted above, any parser-based solution to `<` ambiguity is going to open up room for user confusion.

```rust
// In the absence of a comma operator, the below is an application of type-specialized function 'a'
// to type actual list [b, c], and to expression actual list [d]
  a<b, c>(d) ;

// Application of f to the expression actual list [(a < b), (c > (d))]
f(a<b, c>(d));
```

Any purely lexical convention will too.  Below are four possible conventions, each with a discussion of its shortcomings.

### Lexical convention 1 -- naïve pairing

Any `>` that follows a `<` at the same nesting level indicates that the matching `<` is an open bracket and that the current `>` is its mate.

Counterexample: `if a < b && c > d {`

Some lower precedence infix operators (`&&`, `||`, and `^`) naturally nest comparison operators.

Workaround: parenthesize `<` operators, as in `if (a < b) && c > d {`

This happens frequently enough that it probably causes lots of pain to developers, especially adopters of lexically similar languages that don't require this.
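A rough sketch of naïve pairing over an already split token list shows both the counterexample and the workaround. The function `naive_pairs` is hypothetical, and treating `(`, `[`, and `{` as the only nesting brackets is an assumption.

```rust
/// Returns (index of `<`, index of `>`) for every pair that naive pairing
/// would classify as brackets. Nesting is tracked for `(`, `[`, and `{`.
fn naive_pairs(tokens: &[&str]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    // One list of pending `<` positions per nesting level.
    let mut levels: Vec<Vec<usize>> = vec![Vec::new()];
    for (i, &tok) in tokens.iter().enumerate() {
        match tok {
            "(" | "[" | "{" => levels.push(Vec::new()),
            ")" | "]" | "}" => {
                // Leaving a level discards its unmatched `<` entries.
                if levels.len() > 1 {
                    levels.pop();
                }
            }
            "<" => levels.last_mut().unwrap().push(i),
            ">" => {
                if let Some(open) = levels.last_mut().unwrap().pop() {
                    pairs.push((open, i));
                }
            }
            _ => {}
        }
    }
    pairs
}
```

For `if a < b && c > d {` this reports the two comparison operators as a bracket pair, while the parenthesized form reports none.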

### Lexical convention 2 -- case sensitivity

If a `<` is preceded by an upper-case identifier then it is an open bracket, but otherwise it is not.

`A<B, C>(d)` has brackets.
`a<b, c>(d)` does not.

Lexical convention 2 should be rejected so as to guarantee a good DX for all developers including unicameral writing system users.

Though in many programming languages, by convention, identifiers for types are upper-case, and identifiers for value-holding cells are lower-case, most human writing systems are unicameral: they do not have a case distinction.

Go uses case to indicate whether an identifier is exported, which has served written-Chinese users poorly: https://github.com/golang/go/issues/5763
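To make the unicameral problem concrete, here is a small sketch of the case test, assuming the rule looks only at the first character of the identifier before `<`. The name `opens_bracket_by_case` is hypothetical.

```rust
/// Convention 2 as a predicate: the `<` after `ident` is an open bracket
/// only if `ident` starts with an upper-case character.
fn opens_bracket_by_case(ident: &str) -> bool {
    ident.chars().next().map_or(false, char::is_uppercase)
}
```

`opens_bracket_by_case("A")` is true and `opens_bracket_by_case("a")` is false, but an identifier such as `类型` has no upper-case form at all, so under this rule it could never name a parameterized type.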

### Lexical convention 3 -- space sensitivity

Infix operators are almost always surrounded by spaces.

A `<` that is preceded by an ignorable token that forces a token boundary (like whitespace or a `/*...*/` comment) is an infix operator.  Otherwise it is a bracket.

```rust
A<B, C>
// Brackets.  A is not separated from `<` by a space or comment token

struct Foo<A : ...>
// Brackets.  Foo is not separated from `<` by a space or comment token

struct Foo<
  /*! Documentation for type formal A */
  A : ...
>...
// Brackets, long type formal lists can still be spread across multiple lines

a < b
// an expression. there is a space between 'a' and `<`
```
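The rule is small enough to sketch directly on source text. This assumes the input contains no string literals, that every `<` character is its own candidate token, and that a block comment can be recognized by the `*/` immediately before the `<`. The names `Angle` and `classify_open_angles` are hypothetical.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Angle {
    Bracket,
    Infix,
}

/// Classifies every `<` in `src` by what immediately precedes it:
/// whitespace or the end of a block comment means infix, anything else
/// (including the start of input) means bracket.
fn classify_open_angles(src: &str) -> Vec<(usize, Angle)> {
    src.match_indices('<')
        .map(|(i, _)| {
            let before = &src[..i];
            let separated =
                before.ends_with(char::is_whitespace) || before.ends_with("*/");
            (i, if separated { Angle::Infix } else { Angle::Bracket })
        })
        .collect()
}
```

A real lexer would apply the same test to tokens rather than characters, so that `<=` and `<<` are not split.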

This convention is widely used by programmers.

    a - -b
    // Equivalent to (a - (-b))

    a-- - --b 
    // Equivalent to ((a--) - (--b))

Scala with its open operator set has already relied on this convention of spaces required around infix operators and it has not proven a barrier to adoption.

Swift has also adopted whitespace conventions, though slightly different ones, without huge controversy.  https://docs.swift.org/swift-book/documentation/the-swift-programming-language/lexicalstructure/#Operators

> If an operator has whitespace around both sides or around neither side, it’s treated as an infix operator. As an example, the `+++` operator in `a+++b` and `a +++ b` is treated as an infix operator.

This seems low risk and is easy to explain, but there are two failure modes.  In either case, the solution is to add or remove the offending token.  In the case of the under-classifying comment below, that is non-trivial.

#### Overclassification

```rust
x< y
```

If `x` is previously declared as a variable, not a type, an IDE plugin can probably suggest the space.
This may not work in two cases:

```rust
// Reordered declaration
pi< tau;

// These might never be parsed and typed by the IDE plugin because they are seen as inside an unclosed bracket block
const pi = 3.14;
const tau = pi * 2;

////// ------
// In a supposed language where const variable declaration syntax is used to
// alias types for meta-programming purposes, an IDE plugin might not be
// able to recognize over-classification in some cases
const MyTypeAlias = MyType;

MyTypeAlias<: SomeSuperType
```

#### Underclassification

It's possible to underclassify brackets too.

```rust
MyStruct <TypeParam>

MyStruct<TypeParam>
//      ^ Zero width non-joiner here.

struct MyStruct/*TODO: rename to MonStruct*/<TypeParam : ...>
```

If it's illegal for invisible characters like zero-width joiners and non-joiners to appear except in the middle of identifiers or string or comment tokens, then there's less risk of hard-to-debug under-classification.

A strong lint warning for all comments immediately preceding `<` tokens would not be amiss.
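Such a lint could start out as simple as the sketch below, which flags any `<` immediately preceded by a zero-width joiner, a zero-width non-joiner, or the end of a block comment. It assumes the source has no string literals containing `<`; the name `suspicious_open_angles` is hypothetical.

```rust
/// Returns the byte offsets of `<` characters that are likely to be
/// under-classified: those directly preceded by U+200C, U+200D, or `*/`.
fn suspicious_open_angles(src: &str) -> Vec<usize> {
    src.match_indices('<')
        .filter(|(i, _)| {
            let before = &src[..*i];
            before.ends_with(|c: char| c == '\u{200C}' || c == '\u{200D}')
                || before.ends_with("*/")
        })
        .map(|(i, _)| i)
        .collect()
}
```

Offsets are in bytes, so a zero-width non-joiner before the `<` shifts the reported position by its three-byte UTF-8 encoding.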

### Lexical convention 4 -- if a `<` has a matching `>` it's a bracket unless it obviously shouldn't be.

Space sensitivity is simple, but if space insensitivity is a hard requirement, e.g. so as to simplify tooling that generates or reformats source code, this more complex convention may help.

Start with a list of tokens that commonly separate comparisons and so should cancel a pending `<`:

| Token | Why |
| ------ | ---- |
| <code>&#124;&#124;</code> | Lower precedence operator often applied to comparisons |
| `&&` | Lower precedence operator often applied to comparisons |
| `^` | Lower precedence operator often applied to comparisons |
| `;` | Indicates statement boundary |

Then define the convention:

1. A `<` token starts out assumed to be infix, but is nevertheless pushed onto a bracket stack.
2. If a close bracket is found that closes an enclosing bracket group, then that group's stack entry and everything above it are popped.  `(a < b)` has a bracket stack consisting of `["(", "<"]` after `<` is processed, but when `)` is seen both are popped.
3. If a `>` token is seen and the topmost stack entry is a `<`, then that `<` is reclassified as a bracket, its stack entry is popped, and the `>` is classified as a close bracket token.  Otherwise the `>` is classified as an infix operator.
4. If a token on the list above is seen (`||` et al.), then the bracket stack is popped until its topmost entry is not a `<`.

For example:

```rust
a < b && c > (d)
// No bracket because the bracket stack is ['<' ASSUMED_INFIX] when `&&` is seen but is empty by (4) above after it

a < b, c > (d)
// There is a bracket because none of the intervening tokens ('b', ',', 'c') interrupt matching `>` with `<`, so `<` is reclassified as a bracket by (3) when '>' is processed.

for (;
  a < b;
  c > d && break
) { ... }
// Both are infix operators.  The second ';' token pops `<` from the bracket stack.

a < b, (c && d) >
// Weird.  But potentially useful if you want constant expressions to be usable as type actuals.
// Since the `&&` is processed when there is a `(` on the bracket stack, it does not cancel the `<`,
// but since the parentheses are off the stack when '>' is reached, the angle brackets mate.
```
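The four rules above can be sketched over an already split token list as follows. This is an illustration under assumptions, not a reference implementation: the function name `angle_bracket_pairs` is hypothetical, `(`, `[`, and `{` are taken to be the only other bracket kinds, and a close bracket with no matching opener is ignored.

```rust
/// Returns (index of `<`, index of `>`) for each pair that convention 4
/// classifies as brackets. All other `<` and `>` tokens are infix.
fn angle_bracket_pairs(tokens: &[&str]) -> Vec<(usize, usize)> {
    // Each stack entry is an opener token plus its index in `tokens`.
    let mut stack: Vec<(&str, usize)> = Vec::new();
    let mut pairs = Vec::new();
    for (i, &tok) in tokens.iter().enumerate() {
        match tok {
            // Rule 1: `<` goes on the stack alongside ordinary openers.
            "<" | "(" | "[" | "{" => stack.push((tok, i)),
            // Rule 2: a closer pops its opener and everything above it.
            ")" | "]" | "}" => {
                let opener = match tok {
                    ")" => "(",
                    "]" => "[",
                    _ => "{",
                };
                if let Some(pos) = stack.iter().rposition(|&(t, _)| t == opener) {
                    stack.truncate(pos);
                }
            }
            // Rule 3: `>` mates with a topmost `<`, otherwise it is infix.
            ">" => {
                if let Some(&("<", open)) = stack.last() {
                    stack.pop();
                    pairs.push((open, i));
                }
            }
            // Rule 4: cancelling tokens discard any topmost `<` entries.
            "||" | "&&" | "^" | ";" => {
                while let Some(&("<", _)) = stack.last() {
                    stack.pop();
                }
            }
            _ => {}
        }
    }
    pairs
}
```

Run against the four examples above, this yields no pairs for the `&&` and `for` cases and one pair each for `a < b, c > (d)` and `a < b, (c && d) >`.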

