Description
When we serialize a `Span` to crate metadata, we currently throw away the `SyntaxContext`:
Lines 631 to 650 in 34700c1
This is because the backing `HygieneData` is stored in a thread-local in `rustc_span`, and is not serialized into crate metadata.
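To make the loss concrete, here is a toy model with hypothetical, heavily simplified types. `Span`, `SyntaxContext`, `HYGIENE_DATA`, `encode_span`, `decode_span`, and `fresh_ctxt` are illustrative stand-ins, not rustc's real encoder. The context is only an index into a thread-local table, and because that table is never written out, the decoder can do nothing better than fall back to the root context:

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for `SyntaxContext`: an index into a table that
/// only exists in the current session (and, here, the current thread).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SyntaxContext(u32);

impl SyntaxContext {
    const ROOT: SyntaxContext = SyntaxContext(0);
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    lo: u32,
    hi: u32,
    ctxt: SyntaxContext,
}

thread_local! {
    // Hypothetical hygiene table; index 0 is the root context.
    static HYGIENE_DATA: RefCell<Vec<&'static str>> = RefCell::new(vec!["root"]);
}

/// Interns a new context in this thread's table and returns its index.
fn fresh_ctxt(desc: &'static str) -> SyntaxContext {
    HYGIENE_DATA.with(|d| {
        let mut d = d.borrow_mut();
        d.push(desc);
        SyntaxContext((d.len() - 1) as u32)
    })
}

/// Writes only the byte range: the context index is dropped, since the
/// table it points into is not part of the metadata.
fn encode_span(span: Span) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&span.lo.to_le_bytes());
    out[4..].copy_from_slice(&span.hi.to_le_bytes());
    out
}

/// With no context in the input, the decoder can only use the root.
fn decode_span(b: [u8; 8]) -> Span {
    let lo = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    let hi = u32::from_le_bytes([b[4], b[5], b[6], b[7]]);
    Span { lo, hi, ctxt: SyntaxContext::ROOT }
}

fn main() {
    let span = Span { lo: 10, hi: 20, ctxt: fresh_ctxt("expansion of `my_macro!`") };
    let decoded = decode_span(encode_span(span));
    assert_eq!((decoded.lo, decoded.hi), (10, 20)); // the range survives
    assert_ne!(span.ctxt, decoded.ctxt); // the hygiene information does not
    assert_eq!(decoded.ctxt, SyntaxContext::ROOT);
    println!("{:?} -> {:?}", span, decoded);
}
```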
The result is that spans deserialized from crate metadata may have less information available than spans from the current crate. If the MIR inlining pass decides to inline a function from another crate, we may end up with suboptimal messages when we invoke `span.ctxt()` (e.g. when emitting debuginfo, and when evaluating the `caller_location` intrinsic).
It would be useful if we were to serialize `HygieneData` into crate metadata, and deserialize spans with the proper `SyntaxContext`. This will also ensure that parallel compilation works properly, since storing `HygieneData` in a thread-local will cause problems if a `Span` is used on multiple threads.
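A minimal sketch of the thread-local hazard, again with hypothetical names rather than rustc's actual code: a `SyntaxContext` is a plain `Copy` index, so nothing stops it from being moved to another thread, but the table it indexes is per-thread, and the lookup on the other thread finds nothing:

```rust
use std::cell::RefCell;
use std::thread;

/// Hypothetical context id: just an index, freely `Copy` and `Send`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SyntaxContext(u32);

thread_local! {
    // Hypothetical per-thread hygiene table; every thread starts with only the root.
    static HYGIENE_DATA: RefCell<Vec<String>> = RefCell::new(vec!["root".to_string()]);
}

/// Interns a new context in the *current thread's* table.
fn fresh_ctxt(desc: &str) -> SyntaxContext {
    HYGIENE_DATA.with(|d| {
        let mut d = d.borrow_mut();
        d.push(desc.to_string());
        SyntaxContext((d.len() - 1) as u32)
    })
}

/// Looks a context up in the *current thread's* table.
fn ctxt_data(ctxt: SyntaxContext) -> Option<String> {
    HYGIENE_DATA.with(|d| d.borrow().get(ctxt.0 as usize).cloned())
}

fn main() {
    let ctxt = fresh_ctxt("expansion of `my_macro!`");
    assert_eq!(ctxt_data(ctxt).as_deref(), Some("expansion of `my_macro!`"));

    // The id crosses the thread boundary without complaint...
    let seen_elsewhere = thread::spawn(move || ctxt_data(ctxt)).join().unwrap();
    // ...but the spawned thread's table has never seen this context.
    assert_eq!(seen_elsewhere, None);
}
```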
I'm not really sure how best to go about doing this. `ExpnId`s are currently unique per-crate, since they are never serialized. We need some way of making `ExpnId`s globally unique.
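One possible way to make the ids globally unique, sketched here as an assumption rather than as the approach this issue settles on, is to qualify each per-crate index with the crate that allocated it. `CrateNum`, `LOCAL_CRATE`, `ExpnId { krate, local_id }`, and `ExpnTable` are hypothetical names used only for illustration:

```rust
use std::collections::HashMap;

/// Hypothetical crate number, assigned per compilation session.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct CrateNum(u32);

const LOCAL_CRATE: CrateNum = CrateNum(0);

/// Hypothetical crate-qualified expansion id: `local_id` is only unique
/// within `krate`, but the pair is unique across the whole session.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ExpnId {
    krate: CrateNum,
    local_id: u32,
}

#[derive(Default)]
struct ExpnTable {
    next_local_id: u32,
    // Placeholder description for every known expansion, local or foreign.
    data: HashMap<ExpnId, String>,
}

impl ExpnTable {
    /// Allocates an id for an expansion created by the crate being compiled.
    fn fresh_local(&mut self, desc: &str) -> ExpnId {
        let id = ExpnId { krate: LOCAL_CRATE, local_id: self.next_local_id };
        self.next_local_id += 1;
        self.data.insert(id, desc.to_string());
        id
    }

    /// Registers an expansion decoded from `krate`'s metadata under the index
    /// that crate assigned; no renumbering is needed, since `krate` disambiguates.
    fn import(&mut self, krate: CrateNum, local_id: u32, desc: &str) -> ExpnId {
        let id = ExpnId { krate, local_id };
        self.data.entry(id).or_insert_with(|| desc.to_string());
        id
    }
}

fn main() {
    let mut table = ExpnTable::default();
    let local = table.fresh_local("local `vec!` expansion");
    // Crate 1 also numbered one of its expansions 0. As bare `u32`s the two
    // would collide; as crate-qualified ids they stay distinct.
    let foreign = table.import(CrateNum(1), 0, "`format_args!` in crate 1");
    assert_eq!(local.local_id, foreign.local_id);
    assert_ne!(local, foreign);
    assert_eq!(table.data.len(), 2);
    assert_eq!(table.data[&foreign], "`format_args!` in crate 1");
}
```

A caveat of this scheme: a `CrateNum` is only meaningful within one compilation session, so writing such ids into downstream metadata would still require mapping crate numbers to a session-independent identifier. The def-path-based representation referenced in the discussion below (#49300) is an alternative.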
Activity
petrochenkov commented on Jan 31, 2020
This is a big issue and one of the primary blockers for stabilizing `Span::def_site` (#54724, #54727) and declarative macros 2.0 (#39412). There's a bunch of FIXMEs in the compiler about this, and I'm kind of surprised that there was no existing GitHub issue.
#49300 (comment) suggests a cross-crate stable representation for `ExpnId`s based on def-paths, which can be used for serializing them into metadata.
Aaron1011 commented on Jan 31, 2020
Aaron1011 commented on Feb 3, 2020
Here's an idea for how to serialize `SyntaxContext` itself:
We could just serialize the entire `HygieneData`, and serialize `SyntaxContext` as just the underlying `u32`. When we deserialize, we would look up the corresponding `SyntaxContextData` in the serialized `HygieneData`. If we've already interned that `SyntaxContextData`, we would map the serialized `SyntaxContext` id to the id in the thread-local interner. If we haven't yet interned the `SyntaxContextData`, we would store it in `HygieneData.syntax_context_data`, and return the new id.
However, this approach interacts very badly with cross-crate serialization. Whenever we need to deserialize a `Span`, we will need to insert its `SyntaxContextData` into our local interner. This can happen as a result of seemingly unrelated queries (e.g. `tcx.optimized_mir`) which require deserializing `Span`s. This has the potential to use up a large number of non-interned `ctxt_or_zero` values in `Span`, causing us to intern spans that we could otherwise store 'inline'.
The situation gets worse if we need to re-serialize cross-crate spans into our own metadata. This will propagate the extra `SyntaxContext` ids to all downstream crates, effectively requiring them to intern `SyntaxContextData` instances for transitive dependencies. If we turn on MIR inlining by default, this could become very common, since `libcore` and `libstd` have many small `#[inline]` functions.
This is essentially the same problem I described in #68718 (comment), but for `SyntaxContext` instead of `ExpnId`. However, I don't think there's a straightforward way to map a `SyntaxContext` to a `DefPath`, so we'll need a different approach.
If we assume that 'crate-local' spans are accessed much more frequently than 'cross-crate' spans, then I think we can come up with a better solution. We can turn `SyntaxContextData` into a two-variant enum:
When we create a fresh `SyntaxContext` (which is by definition crate-local), we will use `SyntaxContextData::Local` with the same fields as before.
When we deserialize a `Span` from another crate's metadata, we will create at most one `SyntaxContextData::Remote` per crate. It will be of the form `SyntaxContextData::Remote(cnum, remote_id)`, where `remote_id` stores the `SyntaxContext` used to index into the `HygieneData` of the remote crate.
To retrieve information about a `SyntaxContextData::Remote`, we can either create a `hygiene_data(cnum)` query that deserializes the `HygieneData` from crate `cnum`, or keep a `HashMap<CrateNum, HygieneData>` in memory to speed up lookups.
This will make access to `SyntaxContextData` for cross-crate spans slightly slower than access for crate-local spans. However, we will save almost all of the `Span.ctxt_or_zero` slots, allowing us to continue to store `Span`s in the inline format.
With this approach, we can also heuristically choose to deserialize some cross-crate `SyntaxContext`s as `SyntaxContextData::Local` rather than `SyntaxContextData::Remote` (e.g. if we have reason to believe that they will be accessed frequently).
Auto merge of #69432 - petrochenkov:alldeps, r=<try>
Auto merge of #69432 - petrochenkov:alldeps, r=eddyb
`DefPathData` into crate metadata #70077
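The two-variant `SyntaxContextData` design proposed in the Feb 3 comment can be sketched as follows. Every name here other than `SyntaxContextData::Local`/`Remote`, `CrateNum`, and `syntax_context_data` is hypothetical (`LocalData` and its fields, `decode_remote`, `resolve`, the `foreign` map), and the sketch uses the in-memory `HashMap<CrateNum, ...>` option rather than a `hygiene_data(cnum)` query. It also reads "at most one `SyntaxContextData::Remote` per crate" as at most one local slot per distinct remote context of a crate, which is an interpretation:

```rust
use std::collections::HashMap;

/// Hypothetical crate number within the current session.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct CrateNum(u32);

/// Index into the *current* crate's `syntax_context_data` table.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct SyntaxContext(u32);

/// Placeholder for the fields `SyntaxContextData` carries today.
#[derive(Clone, Debug, PartialEq)]
struct LocalData {
    parent: SyntaxContext,
    desc: String,
}

#[derive(Clone, Debug, PartialEq)]
enum SyntaxContextData {
    /// A context created while compiling this crate.
    Local(LocalData),
    /// A context decoded from crate `CrateNum`; the inner `SyntaxContext`
    /// indexes that crate's table, not ours.
    Remote(CrateNum, SyntaxContext),
}

struct HygieneData {
    syntax_context_data: Vec<SyntaxContextData>,
    /// Maps (crate, remote id) to the single local slot allocated for it.
    remote_ids: HashMap<(CrateNum, SyntaxContext), SyntaxContext>,
    /// In-memory copies of other crates' tables.
    foreign: HashMap<CrateNum, Vec<LocalData>>,
}

impl HygieneData {
    fn new() -> Self {
        let root = LocalData { parent: SyntaxContext(0), desc: "root".to_string() };
        HygieneData {
            syntax_context_data: vec![SyntaxContextData::Local(root)],
            remote_ids: HashMap::new(),
            foreign: HashMap::new(),
        }
    }

    fn push(&mut self, data: SyntaxContextData) -> SyntaxContext {
        self.syntax_context_data.push(data);
        SyntaxContext((self.syntax_context_data.len() - 1) as u32)
    }

    /// Fresh contexts are crate-local by definition.
    fn fresh_local(&mut self, parent: SyntaxContext, desc: &str) -> SyntaxContext {
        self.push(SyntaxContextData::Local(LocalData { parent, desc: desc.to_string() }))
    }

    /// Called when decoding a span from `cnum`'s metadata: allocates one
    /// `Remote` slot per distinct remote context without copying its data.
    fn decode_remote(&mut self, cnum: CrateNum, remote_id: SyntaxContext) -> SyntaxContext {
        if let Some(&id) = self.remote_ids.get(&(cnum, remote_id)) {
            return id;
        }
        let id = self.push(SyntaxContextData::Remote(cnum, remote_id));
        self.remote_ids.insert((cnum, remote_id), id);
        id
    }

    /// Local entries are read directly; remote ones need an extra lookup in the
    /// owning crate's table (whose `parent` fields index that remote table).
    fn resolve(&self, ctxt: SyntaxContext) -> Option<&LocalData> {
        match self.syntax_context_data.get(ctxt.0 as usize)? {
            SyntaxContextData::Local(data) => Some(data),
            SyntaxContextData::Remote(cnum, remote_id) => {
                self.foreign.get(cnum)?.get(remote_id.0 as usize)
            }
        }
    }
}

fn main() {
    let mut hygiene = HygieneData::new();
    // Pretend crate 1's table has already been loaded from its metadata.
    let root = SyntaxContext(0);
    hygiene.foreign.insert(
        CrateNum(1),
        vec![
            LocalData { parent: root, desc: "root of crate 1".to_string() },
            LocalData { parent: root, desc: "expansion in crate 1".to_string() },
        ],
    );

    let local = hygiene.fresh_local(root, "local expansion");
    let a = hygiene.decode_remote(CrateNum(1), SyntaxContext(1));
    let b = hygiene.decode_remote(CrateNum(1), SyntaxContext(1));
    assert_eq!(a, b); // the same remote context reuses its slot
    assert_eq!(hygiene.syntax_context_data.len(), 3); // root, local, one Remote
    assert_eq!(hygiene.resolve(local).unwrap().parent, root);
    assert_eq!(hygiene.resolve(a).unwrap().desc, "expansion in crate 1");
}
```

The trade-off this illustrates: a `Remote` entry costs one local slot, while the remote crate's parent chain and other context data stay in that crate's own table until `resolve` actually needs them, at the price of an extra lookup on every cross-crate access.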