Description
There are some crates which do something like this (e.g. string_cache_codegen
):
macro_rules! foo {
("a") => (A);
("b") => (B);
("c") => (C);
// ... etc. (maybe hundreds more)
}
(@nnethercote came across this when profiling html5ever
's compilation)
Also, prefixing patterns with @foo
to create "internal rules" is common.
But the current implementation (AFAICT) has to check each of the patterns in turn.
Instead, we could build something like a decision tree ahead of time, from the constant (i.e. $
-less) prefix of all arms, using maps for operators/idents/literals, to make the pattern-matching of the prefix amortize to being linear in the number of input tokens.
We could keep it simple by limiting the constant prefix to leaf tokens (no (...)
, [...]
or {...}
).
For the example above, we'd pre-compute something like this:
DecisionTree::Match {
op: {},
ident: {},
literal: {
"a": DecisionTree::Done([0]),
"b": DecisionTree::Done([1]),
"c": DecisionTree::Done([2]),
// ...
},
}
Whereas for something using @foo
rules it could be:
DecisionTree::Match {
op: {
'@': DecisionTree::Match {
op: {},
ident: {
"foo": DecisionTree::Done([0, 1]),
"bar": DecisionTree::Done([2, 3, 4]),
"baz": DecisionTree::Done([5]),
},
literal: {},
}
},
ident: {},
literal: {},
}
(where DecisionTree::Done
indicates the arms to continue with)
cc @rust-lang/compiler (this might need a design meeting?)
Activity
nnethercote commentedon Feb 5, 2020
In the
html5ever
case, the macro has over 1200 patterns.Mark-Simulacrum commentedon Feb 5, 2020
I imagine the "proper" way to do this today is a proc macro, right? Where you can appropriately use a HashMap, for example, for such a macro?
I'm not sure we should dedicate time to fixing this if proc macros do indeed make things much better here (and are 1-1 compatible, hygiene wise, in the relative short term with MBE hygiene stabilization).
petrochenkov commentedon Feb 5, 2020
cc @mark-i-m
mark-i-m commentedon Feb 5, 2020
I'm personally not in favor of complicating the mbe implementation more. It's already quite hard to understand and reason about, and it has a lot of weird corner cases (i.e. allowing surprising behavior) despite our best efforts. If more time is spent on the mbe implementation, I would say it should be to redo the algorithm from scratch and replace it in the next edition or as after a crater run.
mark-i-m commentedon Feb 5, 2020
To be more concrete, I think we should focus on
decl_macros
, making them less expressive as needed to simplify the implementation (more advanced use cases should use proc-macros). Basically, I think we should be able to fully implement the checks from #61053 without any hacks or heuristics; the checks should be sound, complete, and as early as possible.eddyb commentedon Feb 5, 2020
I think that we can get a lot of mileage out of having a "compilation step" that both checks the definition and transforms it into something more amenable to simple and fast pattern-matching.
I agree that we should probably not bolt on such optimizations to the existing implementation, and instead try to include them as part of a simpler implementation.
mark-i-m commentedon Feb 5, 2020
@eddyb I agree with you in principle, but I guess I can't see how it can be done.
The hard part is that the current implementation (and thus the stable interface) allows all sorts of crazy things that makes it impossible to check almost anything very conclusively, as we discovered in #61053. For example, it is impossible to determine whether a macro_rules defines a new macro_rules without seeing the use site. This in turn means that we cannot have a complete check for whether all metavariables are well-defined at the definition site (since they may be defined at the call-site by a macro generated there).
In my opinion, we need to first RFC major restrictions on MBEs to make these checks possible to implement at all. In particular:
$var
in the body of a macro arm necessarily is a reference to a metavar binding from the macro arm.There are a few other improvements that were identified in #61053 too, which would be good to implement...
Also, cc @ia0 who was involved in implementing those lints...
eddyb commentedon Feb 6, 2020
How can you possibly enforce this when macros can take and interpolate arbitrary token streams?!
I thought you were worried about things like ambiguities or compatibility hazards in the patterns of a macro, not anything to do with what the macro expands to.
With what I've seen in the wild, I don't see how what you're describing is realistically possible, even with an edition (and an edition would still require supporting the older behavior).
mark-i-m commentedon Feb 6, 2020
Admittedly, that one seems like a bit of a stretch, but It seems plausible if we say that you can only use a
tt
metavariable inside a call to a macro. The output of the macro has to be valid rust at some point anyway. Perhaps I'm forgetting something, though?Most of my concerns are about the patterns, but the fact that a macro can expand to arbitrary things restricts what we can assume about it when checking usage of metavariables.
eddyb commentedon Feb 6, 2020
Yeah,
$($foo)*
to interpolate arbitrary Rust syntax.Why do we now care about usage? This is the first time I hear about this, compared to, say, compatibility hazards in terms of an earlier arm succeeding in newer versions.
mark-i-m commentedon Feb 6, 2020
@eddyb I should maybe clarify that when I say "valid rust", I mean syntactically, not semantically (e.g. there may still be type errors etc). Perhaps you can elaborate what you mean? How can this produce arbitrary rust syntax if
$foo
not att
(or else is only used in a macro invocation)? We know what kind of token tree it is otherwise, so it is seems it should not be hard to check that the interpolated fragment at least syntactically fits there.I feel that current macro error messages are... not great. And I think this is largely due to the fact that it is very hard to check the body of a macro at definition time. Rather, we have to wait for it to be instantiated so we can see what the whole thing expands too.
eddyb commentedon Feb 6, 2020
I did mean
$foo
to be att
, yes. For me that makes the whole thing fall apart.This is getting pretty offtopic but I strongly disagree.
In general it's impossible to check the definition site, so I would focus on macro-aware tooling, including diagnostics, operating after the fact. Something direly missing right now is step-by-step expansion visualization with best-effort pretty-printing.
Recent research into resugaring might also be relevant.
nnethercote commentedon Sep 28, 2022
I think this can be closed. I did a lot of work earlier in the year speeding up declarative macro expansion and this pattern (many rules and no metavariables) barely came up at all, because it's so rare. Other than
html5ever
I've never seen it used in real code.Noratrieb commentedon Oct 23, 2022
Before #102895, rustc used to use this pattern as well for the query system, but now it doesn't anymore.