Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Option<T> from base64 serde helpers #2291

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

jhendrixMSFT
Copy link
Member

Empty Vec is enough to indicate absense of data.

Empty Vec<u8> is enough to indicate absense of data.
@jhendrixMSFT
Copy link
Member Author

Need to fix impacted SDKs.

@jhendrixMSFT jhendrixMSFT marked this pull request as draft March 6, 2025 17:25
@heaths heaths marked this pull request as ready for review March 6, 2025 18:50
Copy link
Member

@heaths heaths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LarryOsterman since I made significant changes to this, I don't feel right merging this with only my approval. Could you take a look, please?

@LarryOsterman
Copy link
Member

Are you sure that an empty vector is always illegal?

In general, these are not the same:

{
   "my_vec": [],
   "my_number": 3
}
{
  "my_number":3
}

The same discussion applies to strings. { "string":""} is not the same as {}.

deserialize_with = "base64::deserialize_url_safe",
serialize_with = "base64::serialize_url_safe",
deserialize_with = "base64::deserialize_url_safe_opt",
serialize_with = "base64::serialize_url_safe_opt",
skip_serializing_if = "Option::is_none"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, @jhendrixMSFT for non_opt variants, this will need to be Vec::is_empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, and the emitter handles this already. I was in the process of cleaning all of this up but wanted to wait until we're content with omitting Option<T> for base64 encoded bytes.

/// pub value: Option<Vec<u8>>,
/// }
/// ```
pub fn deserialize<'de, D>(deserializer: D) -> Result<Option<Vec<u8>>, D::Error>
pub fn deserialize_opt<'de, D>(deserializer: D) -> Result<Option<Vec<u8>>, D::Error>
Copy link
Member Author

@jhendrixMSFT jhendrixMSFT Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the date module the pattern used there is to place the Option<T> variants into a pub mod option like this. Do we want to do the same here? Or, if this is favored, then we should update date to follow this same pattern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. I forgot about that. Yeah, let's do it that way for consistency. It also "hides" them a little better since, generally, we probably don't want them...barring what @LarryOsterman brought up but I thought @JeffreyRichter already signed off on this change anyway. Seems - though we could be wrong, certainly - this is really only a problem for JSON merge+patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved Option<T> variants to the option sub-module.

@LarryOsterman
Copy link
Member

FWIW, I've been doing a lot of thinking in this space in the context of string model fields. Eventually I landed on:

  • Types have two states: Default and Not-Default.
  • Model fields have two states: Present or Not-Present.

The core question here is "Can a client use "type is default" as a proxy for "field is not-present"".

A lot of that depends on the semantics of the service. And I don't know if we have enough information to authoritatively determine if the default value of a type is invalid or not (simple example: All EventHubs instances must have at least one partition. That means that the EventHubs service client doesn't need to use Optional to represent the number of partitions, because the default value can be safely used as a proxy for "no partitions" (and yeah, this isn't the best example but its' the first one I came up with).

@LarryOsterman
Copy link
Member

FWIW, I've been doing a lot of thinking in this space in the context of string model fields. Eventually I landed on:

  • Types have two states: Default and Not-Default.
  • Model fields have two states: Present or Not-Present.

The core question here is "Can a client use "type is default" as a proxy for "field is not-present"".

A lot of that depends on the semantics of the service. And I don't know if we have enough information to authoritatively determine if the default value of a type is invalid or not (simple example: All EventHubs instances must have at least one partition. That means that the EventHubs service client doesn't need to use Optional to represent the number of partitions, because the default value can be safely used as a proxy for "no partitions" (and yeah, this isn't the best example but its' the first one I came up with).

Btw, I spent a bunch of time thinking about this - this change is righteous because it only affects the deserialization of the Base64 encoded value, it does not affect the Optional generation for the field containing the base64 decoded string - in other words, this should be changing Option<Option<Vec<u8>> into Option<Vec<u8>> which should be just fine (because we don't care about the difference between { "base64": "" } and {"base64":null}).

@heaths
Copy link
Member

heaths commented Mar 7, 2025

this should be changing Option<Option<Vec> into Option<Vec>

Not sure what you mean. Fields would either be Option<Vec<u8>> or Vec<u8>. If we ever have code that produces an Option<Option<T>> that's a bug, most likely.

@heaths
Copy link
Member

heaths commented Mar 7, 2025

Are you sure that an empty vector is always illegal?

Huh?

In general, these are not the same:

{
   "my_vec": [],
   "my_number": 3
}
{
  "my_number":3
}

The same discussion applies to strings. { "string":""} is not the same as {}.

No, but in deserialization scenarios, when are they practically different?

@LarryOsterman
Copy link
Member

Are you sure that an empty vector is always illegal?

Huh?

The only time when it is ok to treat field.is_empty() as the same as field.is_none() is if you know that the service will never return an empty vector.

No, but in deserialization scenarios, when are they practically different?

When we are doing codegen, we don't know the answer to that question. That's the crux of the issue here. It's possible that it is important to distinguish between "the field value was an empty vector" and "the field value was not provided".

But I do not believe that we have sufficient information to know if it is necessary to distinguish between those two cases. That's why the type of all model fields needs to be Option<field type>.

However, more to the point of this PR, the question being asked is "Do we need to distinguish between a base64 encoded "" and a base64 encoded value whose value is null. I assert that we almost certainly don't need to distinguish between these two cases.

It's important to distinguish between the type of a deserialized Base64 encoded value and the type of all model types.

The type of a deserialized Base64 encoded value is Vec<u8>.
The type of a model field whose type is a base64 encoded string is Option<Vec<u8>>.

That's because the type of ALL model fields is Option<T> where T is the underlying type of the field.

@heaths
Copy link
Member

heaths commented Mar 7, 2025

That's because the type of ALL model fields is Option where T is the underlying type of the field.

And this is what I'm starting to question more and more and think we need to revisit with @JeffreyRichter. At least for Rust, we've already decided that a crate version will support the service api-version against which it was built, but there's no guarantee the same is true for other api-versions before or after the declared version. It might - probably will - but we can't guarantee that. No language can. I appreciate that making every field Option<T> or, in Go, *T mitigates some of that risk - probably the vast majority of breaking changes we see in the REST API board - but at the expensive of DX (developer experience) for every developer just because, occasionally, it might be a problem.

Having to do this all the time really sucks for devs:

buf := m.buf
if buf == nil {
  // what? panic?
  panic("spec says it's required, but here we are")
}
let Some(buf) = m.buf else {
  return Err(what::ErrorShouldIReturn::possibly_new_method("can I even construct an error from a string?"));
}

Sure, in Go they could ignore the nil check but go vet will rightly complain and they'll have to ignore it. In Rust they could just unwrap() it but clippy will likely complain and they'll have to ignore it. Why are we putting this pain on our devs, especially if, in rust, we've already stated a 1:1 relationship as stated above?

We're sacrificing the DX of all developers for the sake of an occasional mistake by service teams for which we can release a new crate version? Of course, that's where things get messy (I'm partly working through this real time)...

Most (any?) Azure SDK languages don't do semver correctly. I've had to make breaking changes in Key Vault, for example, a few times - as I know other libraries have - and have never changed the major version. For NuGet it doesn't really matter because it'll accept any version upgrade without regard for semver rules. I know that's not true of npm and cargo, though.

So if we do make a breaking change, we should change the major version because it does matter for cargo (et. al. e.g., npm). Still, I question whether sacrificing the DX with all Option<T> fields is worth it. Rust rightly forces you deal with all variants ot blinding unwrap() (or expect()) fields that should otherwise have a value. But that means a panic which will likely kill the process.

/cc @RickWinter @JeffreyRichter

@heaths heaths marked this pull request as draft March 7, 2025 19:24
@jhendrixMSFT
Copy link
Member Author

jhendrixMSFT commented Mar 7, 2025

I agree that the ergonomics here are pretty bad. When designing Go track 2, we explored various ways to improve this but concluded that the other options weren't any better.

Most of the problems arise from models being overly DRY as they're used in CRUD operations. A common pattern, at least in ARM, is to GET some resource, make changes to the resultant model, then PUT said model. In addition, these same models are also used in PATCH operation so you must be able to specify only fields you want to modify.

@heaths
Copy link
Member

heaths commented Mar 8, 2025

FWIW, looking at another cloud SDK, they have opted for Option<T> fields for all models it seems as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants