Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null values in fields of type JSONB (Postgresql) #40

Open
fdpic opened this issue Feb 6, 2025 · 7 comments
Open

Null values in fields of type JSONB (Postgresql) #40

fdpic opened this issue Feb 6, 2025 · 7 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@fdpic
Copy link

fdpic commented Feb 6, 2025

Hello,

Given this Postgresql table:

create table my_table
(
    id   bigserial primary key,
    data_value_json       jsonb
)

and the following datamimic descriptor:

<setup>

    <database id="source_db" dbms="postgresql"/>
    <database id="target_db" dbms="postgresql"/>

    <generate source="source_db" target="target_db" name="my_table"/>
</setup>

When data_value_json is not None, the value is correctly inserted.
However, it seems that when data_value_json is None in source_db, then the string 'null' (and not SQL NULL) is inserted in target_db.
Kind of similar to psycopg/psycopg2#1433 (although I don't know if Datamimic/sqlalchemy uses a json wrapper)

Not sure if its a bug. As pointed out in that discussion, there might be cases where 'null' is the expected behavior. But then is there any way to (conditionally) flag a key as "ignored"?

Version: last week's development branch.

@fdpic fdpic changed the title Null Postgresql fields of type JSONB Null fields of type JSONB (Postgresql) Feb 6, 2025
@fdpic fdpic changed the title Null fields of type JSONB (Postgresql) Null values in fields of type JSONB (Postgresql) Feb 6, 2025
@ake2l ake2l added enhancement New feature or request help wanted Extra attention is needed and removed enhancement New feature or request labels Feb 7, 2025
@ake2l ake2l assigned ake2l and unassigned tunglxfast Feb 7, 2025
@ake2l
Copy link
Member

ake2l commented Feb 7, 2025

Hi fdpic,

Thanks for raising this issue. We reviewed your case and attempted to reproduce the behavior. Based on our findings, DATAMIMIC is working correctly, and the observed behavior aligns with the expected handling of NULL and "null" values.

Image
Image
Image

Key Observations

  • Row 1: A valid JSON object is stored as expected.
  • Row 2: The string "null" is explicitly stored because it was defined as a string constant in the setup.
  • Rows 3-10: The None script correctly results in SQL NULL values in the target table.

Recommendations

If your goal is to ensure that NULL values in the source remain NULL in the target:

  • Verify your model definition to ensure downstream processes handle NULL correctly.
  • Use default values if necessary, e.g., defaultValue="{}" to store empty JSON objects instead of NULL.

Let us know if you need further clarification or adjustments!
In case these information help you, please support us and leave us a star.

Thank you and Best,
DATAMIMIC Team

@fdpic
Copy link
Author

fdpic commented Feb 7, 2025

Many thanks for the quick answer.

Using an example which as far as I can tell is similar to what you did, I am still getting 'null's (if the column type is changed to Text from jsonb, NULLs are stored as expected).

Maybe I missed something, will take a closer look next week.

@ake2l
Copy link
Member

ake2l commented Feb 8, 2025

We are happy to help. In case the issues persist, please provide us more information and we will try to dive deeper.

Best
DATAMIMIC team

@fdpic
Copy link
Author

fdpic commented Feb 10, 2025

OK, so my mistake, what gets inserted is not the 'null' string, it is actually json null.

For example, given

    <generate source="source_db" target="db" selector="select data_value_json from my_table" count="5">
        <key name="data_value_json"  script="data_value_json"/>
        <key name="data_value_text"  script="data_value_json"/>
    </generate>

and

create table my_table
(
    id  bigserial primary key,
    data_value_json jsonb,
    data_value_text text);

, the table looks like:
Image

select * from my_table where data_value_json::jsonb = 'null' -- returns all rows
select * from my_table where data_value_json is null -- returns no rows

Looking at the sqlalchemy doc, there is a none_as_null flag which controls how None is handled.
And looking at the code, my understanding is that the flag defaults to False which causes None to be inserted as json null.

Maybe I have misunderstood something. Or maybe you guys have a different setting somewhere that causes nulls to be inserted.
Ideally, none_as_null = True would be passed to DataMimic and then DataMimic would pass it on to sqlalchemy, but I'm not even sure that is possible.
Any (other) ideas?

Many thanks in advance.

@ake2l
Copy link
Member

ake2l commented Feb 13, 2025

Thanks for the clarification about the JSON null vs SQL NULL behavior. I think we can extend the existing configuration system to handle SQLAlchemy settings elegantly.

Since you're already using a properties file structure for database configuration, we could extend this to include SQLAlchemy configurations in a similar pattern. Here are a few possible approaches:

  1. Direct SQLAlchemy properties:
postgresql.db.sqlalchemy.none_as_null=true
postgresql.db.sqlalchemy.pool_size=5
postgresql.db.sqlalchemy.pool_timeout=30

This would allow users to configure SQLAlchemy behavior per database system while maintaining the current configuration pattern.

Would any of these approaches work for your use case? We can discuss the implementation details once we agree on the configuration structure.

@ake2l ake2l assigned dangsg and unassigned ake2l Feb 13, 2025
@ake2l ake2l added documentation Improvements or additions to documentation enhancement New feature or request and removed help wanted Extra attention is needed labels Feb 13, 2025
@fdpic
Copy link
Author

fdpic commented Feb 14, 2025

Yes, something like that would be fantastic!

@ake2l
Copy link
Member

ake2l commented Feb 19, 2025

Hi @fdpic ,
We just implemented as solution for this topic ... you can check here #52
Please check if it is working for you and close the issue , once you feel okay with.

Best
DATAMIMIC team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants