You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For messages that contain many string fields (e.g.repeated string with many elements coming in), UnmarshalVT can spend a lot of CPU time in runtime.slicetobytestring. Indeed, when decoding the []byte data, it does a string(bytes) cast (e.g.m.Foo1 = string(dAtA[iNdEx:postIndex])). Since []byte is mutable and string is not, this cast requires an allocation for safety. This allocation, repeated many times, sometimes turns out to be expensive.
Feature request
We could avoid this by using the unsafe package that allows us to perform this cast without an allocation:
Of course, the user has to be careful because if they overwrite the []byte data they received from the wire, then the string is mutated. So, this feature should be opt-in in my opinion.
This feature is mentioned in another issue: Per Msg/Field Features. I think having per-message / per-field features is not mandatory to implement unsafe casting, though.
Note that this feature applies to bytes fields as well where we could reference data instead of copying it.
Proposal
I think this feature is worth implementing and I'm open to ideas. In my opinion, a simple and pragmatic approach to implement it would be to add unsafe functions, such as UnmarshalVTUnsafe, which perform such operations for all applicable fields. I've actually started such an implementation on my personal fork and it seems to work well (diff for anyone curious, although it doesn't work yet for bytes fields). Now, before spending more time on it, I would love to gauge interest and hear considerations from others!
A few considerations I had on my side:
I think having a different function from UnmarshalVT is mandatory because several applications can use the same generated code for a message and we cannot ask them to all be careful not overwriting data received from the wire.
I considered adding a feature but it sounds weird because it would be transversal to other features.
Either we could just always generate both the safe and unsafe versions of the function;
or I think we could add an --unsafe flag (different from features) to trigger the generation of unsafe functions.
Let me know what you folks think about all this!
The text was updated successfully, but these errors were encountered:
This is a perfectly sensible feature, and in fact we already have an open PR for it: #79
It's just missing some cleanup and some better naming. I'm definitely in support of it, by providing a separate API (UnmarshalUnsafe) instead of an --unsafe flag (which would be dangerous IMO).
Ah, I didn't realized there was already a PR. 🤦♂️ Thanks for pointing it out, I'll take a look into it!
Regarding the --unsafe flag, I was actually thinking about it this way:
without --unsafe (marshal+unmarshal features): generate UnmarshalVT and MarshalVT only
with --unsafe (marshal+unmarshal features): generate UnmarshalVT, UnmarshalUnsafeVT, MarshalVT, and MarshalUnsafeVT
I actually like the PR's approach more. It would consist in adding new features like unmarshal_unsafe and marshal_unsafe. I'm going to iterate on that. 👍
Context
For messages that contain many
string
fields (e.g.repeated string
with many elements coming in),UnmarshalVT
can spend a lot of CPU time inruntime.slicetobytestring
. Indeed, when decoding the[]byte
data, it does astring(bytes)
cast (e.g.m.Foo1 = string(dAtA[iNdEx:postIndex])
). Since[]byte
is mutable andstring
is not, this cast requires an allocation for safety. This allocation, repeated many times, sometimes turns out to be expensive.Feature request
We could avoid this by using the
unsafe
package that allows us to perform this cast without an allocation:Of course, the user has to be careful because if they overwrite the
[]byte
data they received from the wire, then thestring
is mutated. So, this feature should be opt-in in my opinion.This feature is mentioned in another issue: Per Msg/Field Features. I think having per-message / per-field features is not mandatory to implement unsafe casting, though.
Note that this feature applies to
bytes
fields as well where we could reference data instead of copying it.Proposal
I think this feature is worth implementing and I'm open to ideas. In my opinion, a simple and pragmatic approach to implement it would be to add unsafe functions, such as
UnmarshalVTUnsafe
, which perform such operations for all applicable fields. I've actually started such an implementation on my personal fork and it seems to work well (diff for anyone curious, although it doesn't work yet forbytes
fields). Now, before spending more time on it, I would love to gauge interest and hear considerations from others!A few considerations I had on my side:
UnmarshalVT
is mandatory because several applications can use the same generated code for a message and we cannot ask them to all be careful not overwriting data received from the wire.--unsafe
flag (different from features) to trigger the generation of unsafe functions.Let me know what you folks think about all this!
The text was updated successfully, but these errors were encountered: