Skip to content

Span hygiene data should be serialized to crate metadata #68686

Closed
@Aaron1011

Description

@Aaron1011
Member

When we serialize a Span to crate metadata, we currently throw away the SyntaxContext:

impl rustc_serialize::UseSpecializedEncodable for Span {
fn default_encode<S: Encoder>(&self, s: &mut S) -> Result<(), S::Error> {
let span = self.data();
s.emit_struct("Span", 2, |s| {
s.emit_struct_field("lo", 0, |s| span.lo.encode(s))?;
s.emit_struct_field("hi", 1, |s| span.hi.encode(s))
})
}
}
impl rustc_serialize::UseSpecializedDecodable for Span {
fn default_decode<D: Decoder>(d: &mut D) -> Result<Span, D::Error> {
d.read_struct("Span", 2, |d| {
let lo = d.read_struct_field("lo", 0, Decodable::decode)?;
let hi = d.read_struct_field("hi", 1, Decodable::decode)?;
Ok(Span::with_root_ctxt(lo, hi))
})
}
}

This is because the backing HygieneData is stored in a thread-local in rustc_span, and not serialized into crate metadata.

The result is that spans deserialized from crate metadata may have less information available than spans from the current crate. If the MIR inlining pass decides to inline a function from another crate, we may end up with suboptimal messages when we invoke span.ctxt() (e.g. when emitting debuginfo, and when evaluating the caller_location intrinsic).

It would be useful if we were to serialize HygieneData into crate metadata, and deserialize spans with the proper SyntaxContext. This will also ensure that parallel compilation works properly, since storing HygieneData in a thread local will cause problems if a Span is used on multiple threads.

I'm not really sure how best to go about doing this. ExpnIds are currently unique per-crate, since they are never serialized. We need some way of making ExpnIds globally unique.

Activity

petrochenkov

petrochenkov commented on Jan 31, 2020

@petrochenkov
Contributor

This is a big issue and one of the primary blockers for stabilizing Span::def_site (#54724, #54727) and declarative macros 2.0 (#39412).
There's a bunch of FIXMEs in the compiler about this, and I'm kind of surprised that there was no existing GitHub issue.

#49300 (comment) suggests a cross-crate stable representation for ExpnIds based on def-paths, which can be used for serializing them into metadata.

Aaron1011

Aaron1011 commented on Jan 31, 2020

@Aaron1011
Author
added
A-macrosArea: All kinds of macros (custom derive, macro_rules!, proc macros, ..)
on Jan 31, 2020
added
A-metadataArea: Crate metadata
C-enhancementCategory: An issue proposing an enhancement or a PR with one.
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on Jan 31, 2020
Aaron1011

Aaron1011 commented on Feb 3, 2020

@Aaron1011
MemberAuthor

Here's an idea for how to serialize SyntaxContext itself:

We could just serialize the entire HygieneData, and serialize SyntaxContext as just the underlying u32. When we deserialize, we would lookup the corresponding SyntaxContextData in the serialized HygieneData. If we've already interned that SyntaxContextData, we would map the serialized SyntaxContext id to the id in the thread-local interner. If we haven't yet interned the SyntaxContextData, we would store it in HygieneData.syntax_context_data, and return the new id.

However, this approach interacts very badly with cross-crate serialization. Whenever we need to deserialize a Span, we will need to insert its SyntaxContextData into our local interner. This can happen a result of seemingly unrelated queries (e.g tcx.optimized_mir) which require deserializing Spans. This has the potential to use up a large number of non-interned ctxt_or_zero values in Span, causing us to intern spans that we could otherwise store 'inline'.

The situation gets worse if we need to re-serialize cross-crate spans into our own metadata. This will propagate the extra SyntaxContext ids to all downstream crates, effectively requiring them to intern SyntaxContextData instances for transitive dependencies. If we turn on MIR inlining by default, this could become very common, since libcore and libstd have many small #[inline] functions.

This is essentially the same problem I described in #68718 (comment), but for SyntaxContext instead of ExpnId. However, I don't think there's a straightforward way to map a SyntaxContext to a DefPath, so we'll need a different approach.

If we assume that 'crate-local' spans are accessed much more frequently than 'cross-crate' spans, then I think we can come up with a better solution. We can turn SyntaxContextData into a two-variant enum:

enum SyntaxContextData {
    Local(/* the current SyntaxContextData fields */),
    Remote(CrateNum, SyntaxContext)
}

When we create a fresh SyntaxContext (which is by definition crate-local), we will use SyntaxContextData::Local with the same fields as before.

When we deserialize Span from another crate's metadata, we will create a most one SyntaxContextData::Remote per crate. It will be of the form SyntaxContextData::Remote(cnum, remote_id), where remote_id stores the SyntaxContext used to index into the HygieneData of the remote crate.

To retrieve information about a SyntaxContextData:Remote, we can either create a hygiene_data(cnum) query that deserializes the HygieneData from crate cnum, or keep a HashMap<CrateNum, HygieneData> in memory to speed up lookups.

This will make access to SyntaxContextData for cross-crate spans slightly slower than accesses for crate-local spans. However, we will save almost all of the Span.ctxt_or_zero slots, allowing us to continue to store Spans in the inline format.

With this approach, we can also heuristically choose to deserialize some cross-crate SyntaxContexts as SyntaxContextData::Local rather than SyntaxContextData::Remote (e.g. if we have reason to believe that they will be accessed frequently).

added 2 commits that reference this issue on Feb 24, 2020

Auto merge of #69432 - petrochenkov:alldeps, r=<try>

Auto merge of #69432 - petrochenkov:alldeps, r=eddyb

47 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-macrosArea: All kinds of macros (custom derive, macro_rules!, proc macros, ..)A-metadataArea: Crate metadataC-enhancementCategory: An issue proposing an enhancement or a PR with one.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @Aaron1011@jonas-schievink@petrochenkov@rustbot

      Issue actions

        Span hygiene data should be serialized to crate metadata · Issue #68686 · rust-lang/rust