Add optional unsafe operations #97

nockty · 2023-09-22T08:52:58Z

Context

For messages that contain many string fields (e.g. repeated string with many elements coming in), UnmarshalVT can spend a lot of CPU time in runtime.slicetobytestring. Indeed, when decoding the []byte data, it does a string(bytes) cast (e.g. m.Foo1 = string(dAtA[iNdEx:postIndex])). Since []byte is mutable and string is not, this cast requires an allocation for safety. This allocation, repeated many times, sometimes turns out to be expensive.

Feature request

We could avoid this by using the unsafe package that allows us to perform this cast without an allocation:

func unsafeBytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

Of course, the user has to be careful because if they overwrite the []byte data they received from the wire, then the string is mutated. So, this feature should be opt-in in my opinion.

This feature is mentioned in another issue: Per Msg/Field Features. I think having per-message / per-field features is not mandatory to implement unsafe casting, though.

Note that this feature applies to bytes fields as well where we could reference data instead of copying it.

Proposal

I think this feature is worth implementing and I'm open to ideas. In my opinion, a simple and pragmatic approach to implement it would be to add unsafe functions, such as UnmarshalVTUnsafe, which perform such operations for all applicable fields. I've actually started such an implementation on my personal fork and it seems to work well (diff for anyone curious, although it doesn't work yet for bytes fields). Now, before spending more time on it, I would love to gauge interest and hear considerations from others!

A few considerations I had on my side:

I think having a different function from UnmarshalVT is mandatory because several applications can use the same generated code for a message and we cannot ask them to all be careful not overwriting data received from the wire.
I considered adding a feature but it sounds weird because it would be transversal to other features.
- Either we could just always generate both the safe and unsafe versions of the function;
- or I think we could add an --unsafe flag (different from features) to trigger the generation of unsafe functions.

Let me know what you folks think about all this!

The text was updated successfully, but these errors were encountered:

vmg · 2023-09-22T14:19:06Z

This is a perfectly sensible feature, and in fact we already have an open PR for it: #79

It's just missing some cleanup and some better naming. I'm definitely in support of it, by providing a separate API (UnmarshalUnsafe) instead of an --unsafe flag (which would be dangerous IMO).

nockty · 2023-09-22T15:02:15Z

Ah, I didn't realized there was already a PR. 🤦‍♂️ Thanks for pointing it out, I'll take a look into it!

Regarding the --unsafe flag, I was actually thinking about it this way:

without --unsafe (marshal+unmarshal features): generate UnmarshalVT and MarshalVT only
with --unsafe (marshal+unmarshal features): generate UnmarshalVT, UnmarshalUnsafeVT, MarshalVT, and MarshalUnsafeVT

I actually like the PR's approach more. It would consist in adding new features like unmarshal_unsafe and marshal_unsafe. I'm going to iterate on that. 👍

nockty added the enhancement New feature or request label Sep 22, 2023

nockty mentioned this issue Oct 4, 2023

Add unmarshal_unsafe feature #103

Merged

vmg closed this as completed Jan 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add optional unsafe operations #97

Add optional unsafe operations #97

nockty commented Sep 22, 2023

vmg commented Sep 22, 2023

nockty commented Sep 22, 2023

Add optional unsafe operations #97

Add optional unsafe operations #97

Comments

nockty commented Sep 22, 2023

Context

Feature request

Proposal

vmg commented Sep 22, 2023

nockty commented Sep 22, 2023