- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.6k
Add vendor-specific suffixes to v0 mangling RFC 2603 #3224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vendor-specific suffixes to v0 mangling RFC 2603 #3224
Conversation
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
| Fixed | 
| So I was talking to @lqd about my confusion around the Itanium ABI mangling not having such a general mechanism, and @lqd found that it was added in itanium-cxx-abi/cxx-abi#36, relatively recently! Especially if someone doesn't find that repo, they're likely reading outdated versions without vendor suffixes. 
 Oh, wow, I was wrong,  $ c++filt _ZN3foo3barE _ZN3foo3barEv _ZN3foo3barE.llvm.123 _ZN3foo3barEv.llvm.123
foo::bar
foo::bar()
_ZN3foo3barE.llvm.123
foo::bar() [clone .llvm.123]I think it was a bad idea for this to be "nested" in a C++ symbol, because you then get weird stuff like that, and ideally it could be handled uniformly for every symbol name (after all, LLVM adds its suffixes without knowing about the mangling of the original symbol name, be it C++, Rust, unmangled, some other thing, etc.). But specifying it in the RFC won't hurt, I suppose. | 
| Alright, let's try to move this forward: @rfcbot fcp merge | 
| @rfcbot merge | 
| Team member @michaelwoerister has proposed to merge this. The next step is review by the rest of the tagged team members: 
 No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. | 
| // There are no restrictions on the characters that may be used | ||
| // in the suffix following the `.`. | ||
| <vendor-specific-suffix> = "." <suffix> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the suffix contain any characters at all? Or do these characters need to be in A-Z, a-z, 0-9, and _ as in RFC 2603?
As written this sounds like, for instance, arbitrary Unicode would be allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // There are no restrictions on the characters that may be used | |
| // in the suffix following the `.`. | |
| <vendor-specific-suffix> = "." <suffix> | |
| // The suffix following `.` can contain any character in the allowed alphanumeric set. | |
| <vendor-specific-suffix> = "." <suffix> | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have no control over what external parties like LLVM stick into that suffix, we should indeed strive for allowing anything at all. LLVM for example will add suffixes that contain more . characters, like .llvm.3285396211802591752 or .cold.1.
| <instantiating-crate> = <path> | ||
| // There are no restrictions on the characters that may be used | ||
| // in the suffix following the `.`. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this also allow $ in addition to .? I think tools out there are supposed to expect this as well, because of Itanium:
Mangled names containing $ or . are reserved for private implementation use. Names produced using such extensions are inherently non-portable and should be given internal linkage where possible.
Here's an example of someone finding a Rust symbol with a $ suffix rather than a . suffix: https://www.reddit.com/r/rust/comments/t200fl/where_does_in_mangled_name_came_from/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, yes. Seems likely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that this should be added.
| Is the v0 mangling scheme documented somewhere other than this RFC? Or, why is this an amendment of an already accepted RFC instead of a brand new one? | 
| @Havvy The original v0 mangling RFC was this one: #2603 . I assume this is an amendment in order to reduce confusion for anyone looking for the canonical grammar who will most likely consult the page at https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html . | 
| @bstrie Yes, and this goes against historical precedent of generally not updating RFCs with substantial content after they are approved. The mangling scheme, as it is is implemented, should be documented somewhere not in the RFCs repo and we could probably update the RFC to link to that. But I don't know where that is, if anywhere. And if it's not anywhere, then we should figure out where it should go and put it there. | 
| @rfcbot reviewed | 
| 
 In that case, I'm not sure we've been consistent across all RFCs. When I asked about this, I was directed to make amendments (there's been a few: https://github.com/rust-lang/rfcs/pulls?q=2603+OR+RFC2603 - though I regret not trying to make them more consistent name-wise). | 
| 🔔 This is now entering its final comment period, as per the review above. 🔔 | 
| @Havvy Oh, another thing to keep in mind is, #2705, #2879 and this PR all fix the wording to "reflect reality" when this mangling scheme was first implemented. In the case of this PR, "vendor-specific suffixes" were being added by LLVM and stripped by  The only amendment that wouldn't fit that description would be #3161 - about half of it could be applied to the RFC itself in order to clarify the original grammar in more detail (we were too handwavey initially), with the rest being a proper extension. That's where I'm not sure what to do about. I'll leave a comment on #3161. | 
e9ac2c0    to
    9b2672d      
    Compare
  
    | Thanks to @bstrie for bringing up https://www.reddit.com/r/rust/comments/t200fl/where_does_in_mangled_name_came_from/ which contains another vendor-specific mangling use case: thread local data on mach-o uses  @michaelwoerister and I feel it's the same de facto situation as the period separator happening in the wild, and that we should add both vendor-specific separators. I've therefore updated the PR to mention this, as a separate commit for easier review and possible comments. (I will squash everything before the PR gets merged, as there's also another cosmetic commit which correctly wraps at 80 chars like the rest of the text) | 
| 
 The mangling scheme is not documented anywhere else, as far as I know. As @bstrie says, I think it's good to have a single source of truth for the definition that does not require people to chase down all the updates that might be found somewhere else. If that source of truth would move to some other place (like e.g. the Rust reference), then making additional RFCs instead of amending the original one would be fine too, in my opinion, but as long as we don't have that, I personally prefer to update the RFC in-place for minor things like this. My intuition would be different for larger, breaking changes. | 
| Regarding "where is the definitive reference for this", it does seem like this is begging for a compiler-level reference to complement the existing language-level reference. I guess the rustc book ( https://doc.rust-lang.org/rustc/ ) might fill this role? | 
| The rustc book still seems to be aimed at end users. I'm not sure if that would be a good fit. But I don't have a strong opinion one way or other. It would be good to have some canonical place for having information like this (for example, documentation about the shape of our debuginfo could go there too). | 
| The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. | 
9b2672d    to
    fef292b      
    Compare
  
    | Note: we didn't squash and merge as soon as the FCP finished just to allow a bit more time for comments, especially about the similar case of  Now that this additional time has passed, I've squashed everything, and this should be ready to go. | 
This PR updates v0 mangling to add an Itanium ABI-like vendor-specific suffix to symbols.
This already happens in practice (e.g. with LLVM LTO to ensure a name is globally unique) and the demanglers following the spec strictly can't handle these symbols. This PR makes them officially part of the spec.
The vendor-specific suffix is optional and doesn't change semantics: it is IMO safe to ignore in the vast majority of cases.
Improvements to the wording and description would be welcome. Opening the PR for possible discussion at today's t-compiler meeting, and possible FCP later.
cc @michaelwoerister