
Bug: Pydantic objects always serialized using alias #6728

Open
dehanjl opened this issue May 27, 2025 · 4 comments
Assignees
Labels
bug Something isn't working need-customer-feedback Requires more customers feedback before making or revisiting a decision

Comments

@dehanjl

dehanjl commented May 27, 2025

Expected Behaviour

The default behavior of Pydantic is to model dump using by_alias=False. Powertools for AWS Lambda seems to override this somehow.

Expected return from code snippet below:

{
  "fieldOne": "value1",
  "field_two": "value2"
}

Note: When I remove the alias generator and set the fields manually, the issue does not occur, which is why I suspect this has something to do with aliases.

Current Behaviour

Currently, when I create a Pydantic object that has an alias generator and dump it explicitly with by_alias=False before returning it from an API Gateway REST resolver endpoint, the response is always serialized by alias, even when it should not be.

Actually returned from code snippet below:

{
  "field_one": "value1",
  "field_two": "value2"
}

Code snippet

### handler.py
import json

from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.utilities.typing import LambdaContext

from models import LegacyInconsistentModel, ShinyNewConsistentModel

app = APIGatewayRestResolver(enable_validation=True)


@app.get("/hello")
def hello():
    return {"message": "Hello, World!"}


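@app.get("/info")
def info() -> LegacyInconsistentModel:
    obj = ShinyNewConsistentModel(field_one="value1", field_two="value2")
    # populate_by_name=True plus the to_snake alias lets the legacy model accept snake_case keys.
    ret_obj = LegacyInconsistentModel(**obj.model_dump())
    # Dump by field name; the expected keys are "fieldOne" and "field_two".
    ret_dict = ret_obj.model_dump(by_alias=False)
    return ret_dict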


def lambda_handler(event: dict, context: LambdaContext):
    return app.resolve(event, context)

### models.py
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_snake


class LegacyInconsistentModel(BaseModel):
    model_config = ConfigDict(alias_generator=to_snake, populate_by_name=True)

    fieldOne: str
    field_two: str


class ShinyNewConsistentModel(BaseModel):
    field_one: str
    field_two: str
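The difference between the expected and the actual output can be reproduced with Pydantic alone, without Powertools involved. The following is a minimal sketch that reuses the LegacyInconsistentModel from the snippet above; the variable names by_name and by_alias are illustrative only.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_snake


class LegacyInconsistentModel(BaseModel):
    # to_snake gives "fieldOne" the alias "field_one"; "field_two" keeps its name.
    model_config = ConfigDict(alias_generator=to_snake, populate_by_name=True)

    fieldOne: str
    field_two: str


obj = LegacyInconsistentModel(fieldOne="value1", field_two="value2")

# Dump by field name (what the issue expects in the response).
by_name = obj.model_dump(by_alias=False)
# Dump by alias (what the endpoint actually returns).
by_alias = obj.model_dump(by_alias=True)

print(by_name)
print(by_alias)
```

The first dictionary keeps the legacy "fieldOne" key, while the second replaces it with the generated "field_one" alias, which matches the output reported under Current Behaviour.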

Possible Solution

No response

Steps to Reproduce

I have created a sample repository where I isolated the issue: https://github.com/dehanjl/lambda-powertools-serialization-test.

I've investigated the serializer and the layers it passes through, but I can't see by_alias=True being passed anywhere.

self._serializer = serializer or partial(json.dumps, separators=(",", ":"), cls=Encoder)

https://github.com/aws-powertools/powertools-lambda-python/blob/f106e368cf760b3585e36ebf7dc0f65a62c632b8/aws_lambda_powertools/shared/json_encoder.py
from aws_lambda_powertools.event_handler.openapi.compat import _model_dump

def _model_dump(model: BaseModel, mode: Literal["json", "python"] = "json", **kwargs: Any) -> Any:

Powertools for AWS Lambda (Python) version

latest

AWS Lambda function runtime

3.12

Packaging format used

Lambda Layers

Debugging logs

@dehanjl dehanjl added bug Something isn't working triage Pending triage from maintainers labels May 27, 2025

boring-cyborg bot commented May 27, 2025

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@dehanjl
Author

dehanjl commented May 28, 2025

I think I may have found the root cause:

When enable_validation=True, the OpenAPI validation middleware is added, and it serializes with by_alias=True. As far as I can see, there is no way to customize its behavior.

@leandrodamascena
Contributor

Hi @dehanjl, thanks for opening this issue. Yes, the default (and so far unchangeable) behavior is to serialize by alias, because when working with HTTP requests, fields can contain characters like -, which can break the whole mechanism. But I'm curious about your use case. Can you explain a bit more about how serializing by alias breaks your system?

While I don't know the whole impact of allowing the customer to serialize by field instead of alias, I would consider thinking about a way for the customer to change this behavior, if it makes sense.

Thanks

@leandrodamascena leandrodamascena added need-customer-feedback Requires more customers feedback before making or revisiting a decision and removed triage Pending triage from maintainers labels May 28, 2025
@leandrodamascena leandrodamascena self-assigned this May 28, 2025
@dehanjl
Author

dehanjl commented May 29, 2025

Hi @leandrodamascena, and thanks for getting back to me. That perspective makes sense.

So my exact use case is that we have some Pydantic models that define our existing API contract (LegacyInconsistentModel), which contains a mix of camelCase and snake_case fields. We're in the process of migrating/upgrading our datastore, for which we have created some new models (ShinyNewConsistentModel) that consistently use snake_case.

If I can add model_config = ConfigDict(alias_generator=to_snake, populate_by_name=True) to the legacy model, then I can just do return LegacyInconsistentModel(**shiny_new_consistent_model_instance.model_dump()), assuming they have the same fields and it's just a matter of case conversion.

Unfortunately, I can't do this, because the legacy model would then be serialized by its snake_case aliases, breaking the API contract.

So I guess my main concerns are as follows:

  1. This default serialization by alias does not seem to be documented, and we had to dig quite deep into the source code to find it.
  2. This is different from the Pydantic default (at least for V2), which is to serialize/model dump using by_alias=False. This difference isn't made clear, so it's surprising.
  3. Even if you manually and explicitly serialize to a dictionary with by_alias=False, this dictionary is intercepted by the validator which re-serializes the dictionary using the aliases.
  4. As far as I can tell, there is currently no way to customize this behavior, aside from disabling the validation and then creating my own middleware that does the validation with some custom config.

However, points 1 and 4 may just be skill issues on my part, and I may have missed something.
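Point 3 can be illustrated with a rough simulation of the validate-then-serialize round trip. This is only a sketch of the behavior described in this thread, not the middleware's actual code; the function name reserialize_by_alias is hypothetical.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_snake


class LegacyInconsistentModel(BaseModel):
    model_config = ConfigDict(alias_generator=to_snake, populate_by_name=True)

    fieldOne: str
    field_two: str


def reserialize_by_alias(model_cls: type[BaseModel], payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for response validation: validate, then dump by alias."""
    validated = model_cls.model_validate(payload)
    return validated.model_dump(mode="json", by_alias=True)


# The handler already dumped by field name...
handler_result = {"fieldOne": "value1", "field_two": "value2"}
# ...but the round trip converts the keys back to their aliases.
response_body = reserialize_by_alias(LegacyInconsistentModel, handler_result)
print(response_body)
```

Because populate_by_name=True lets the model accept "fieldOne" on input, validation succeeds silently, and the explicit by_alias=False choice made in the handler is lost on the way out.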

So my guesses (options) for a solution/way forward would be:

  • Add a flag to control how the validation does the final serialization.
  • When instantiating the APIGatewayRestResolver instance, perhaps allow the user to pass a custom validator class explicitly. This way I could subclass/extend the existing validator provided by the library and add the custom logic I need. I would honestly prefer this approach as it makes it more obvious that I am doing something a bit weird and loading a potential footgun, and that way it's my responsibility and not the library's.
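Both options above could look roughly like the sketch below. Every name in it (ResponseValidator, FieldNameResponseValidator, the by_alias attribute) is hypothetical and does not exist in Powertools; it only shows how a flag and a subclass hook might fit together, using Pydantic calls that do exist.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_snake


class LegacyInconsistentModel(BaseModel):
    model_config = ConfigDict(alias_generator=to_snake, populate_by_name=True)

    fieldOne: str
    field_two: str


class ResponseValidator:
    """Hypothetical validator; by_alias=True mirrors the behavior described in this issue."""

    by_alias: bool = True

    def serialize(self, model_cls: type[BaseModel], payload: dict[str, Any]) -> dict[str, Any]:
        validated = model_cls.model_validate(payload)
        return validated.model_dump(mode="json", by_alias=self.by_alias)


class FieldNameResponseValidator(ResponseValidator):
    """Hypothetical user subclass that opts in to serializing by field name."""

    by_alias = False


payload = {"fieldOne": "value1", "field_two": "value2"}
default_body = ResponseValidator().serialize(LegacyInconsistentModel, payload)
custom_body = FieldNameResponseValidator().serialize(LegacyInconsistentModel, payload)
print(default_body)
print(custom_body)
```

Keeping the default at by_alias=True would preserve today's behavior, while the subclass makes the opt-out explicit in user code, which is the trade-off the second bullet argues for.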

I would also be happy with a documentation update that makes it explicit that when using a validator, serialization is done using an alias.

Projects
Status: Backlog
Development

No branches or pull requests

2 participants