Fix: expand dict transformation #9561

Open
Shamik-07 wants to merge 36 commits into marimo-team:main from Shamik-07:fix/expand_dict_transformation

Conversation

@Shamik-07

@Shamik-07 Shamik-07 commented May 15, 2026

Contributor

📝 Summary

Uses Narwhals to convert every backend to Polars, expands the dict column with Polars' unnest function, and then converts the result back to the original backend.
Closes #4583

Screen.Recording.2026-05-15.at.18.07.461.mov

📋 Pre-Review Checklist

  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • Video or media evidence is provided for any visual changes (optional).

✅ Merge Checklist

  • I have read the contributor guidelines.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Tests have been added for the changes made.

@vercel

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| marimo-docs | Ready | Preview, Comment | Jul 3, 2026 3:32pm |


@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 3 files

Architecture diagram
sequenceDiagram
    participant User as User (Marimo UI)
    participant DFPlugin as Dataframe Plugin
    participant Handler as ExpandDict Handler (handlers.py)
    participant Narwhals as Narwhals Layer
    participant Polars as Polars Engine
    participant Backend as Original Backend (pandas/Ibis/other)

    Note over User,Backend: Expand Dict Transformation Flow

    User->>DFPlugin: Trigger expand dict on column
    DFPlugin->>Handler: handle_expand_dict(df, transform)

    Handler->>Narwhals: collect_and_preserve_type(df)
    Narwhals->>Backend: Collect actual data from original backend
    Backend-->>Narwhals: Data as native type
    Narwhals-->>Handler: Collected DataFrame + undo function

    Handler->>Polars: collected_df.to_polars()
    Note over Handler,Polars: Convert to Polars for unnest support

    Polars->>Polars: polars_df.unnest(column_id)
    Note over Polars: Handles null dict values correctly

    Polars-->>Handler: Unnested Polars DataFrame

    Handler->>Narwhals: nw.from_native(unnested)
    Narwhals->>Handler: Narwhals wrapper

    Handler->>Handler: undo(narwhals_df)
    Note over Handler: Convert back to original backend type

    Handler-->>DFPlugin: Transformed DataFrame
    DFPlugin-->>User: Updated table with expanded columns

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py Outdated
Comment thread tests/_plugins/ui/_impl/dataframes/test_handlers.py

@mscolnick mscolnick left a comment

Contributor

need to take an optional dep on polars
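
A minimal sketch of what an optional-dependency guard can look like, assuming the import is deferred to the code path that actually needs Polars. `require_optional` is a hypothetical helper written for illustration; the PR itself goes through `DependencyManager.polars.require(why=...)`, whose internals are not shown in this thread.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(name: str, why: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper for illustration; not marimo's DependencyManager.
    """
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(name) is None:
        raise ModuleNotFoundError(
            f"{name} is required {why}. Install it with `pip install {name}`."
        )
    return importlib.import_module(name)


# The import happens only on the code path that needs the package.
# "json" stands in for an installed optional package in this demo.
json_module = require_optional("json", why="to demonstrate the guard")
print(json_module.dumps({"ok": True}))
```

With a guard of this shape, environments that only use pandas never need Polars installed unless the Polars code path is reached.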

Shamik-07 added 3 commits May 19, 2026 17:45
… rows with the create test dataframes instead.
refactor: adding None == NaN in assert frame equal with nans method to use it in expand_dict test.
@Shamik-07

Contributor Author

> need to take an optional dep on polars

Done.
Added an allow_none_equals_nan option to assert_frame_equal_with_nans. Because None != NaN, the helper was failing when used in test_expand_dict.
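
The distinction can be sketched in plain Python. `values_equal` below is a hypothetical cell-level comparison, not the actual `assert_frame_equal_with_nans` helper; it only illustrates why an opt-in flag is needed when one backend reports a missing value as `None` and another as `NaN`.

```python
import math
from typing import Any


def _is_nan(value: Any) -> bool:
    """True only for float NaN; None and strings are not NaN."""
    return isinstance(value, float) and math.isnan(value)


def values_equal(a: Any, b: Any, allow_none_equals_nan: bool = False) -> bool:
    """Compare two cell values, always treating NaN as equal to NaN."""
    if _is_nan(a) and _is_nan(b):
        return True
    if allow_none_equals_nan:
        # Treat None and NaN as the same "missing" marker.
        a_missing = a is None or _is_nan(a)
        b_missing = b is None or _is_nan(b)
        if a_missing or b_missing:
            return a_missing and b_missing
    return a == b


print(values_equal(None, float("nan")))
print(values_equal(None, float("nan"), allow_none_equals_nan=True))
```

Keeping the flag off by default preserves the stricter comparison for tests that should distinguish the two markers.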

@Shamik-07 Shamik-07 requested a review from mscolnick May 19, 2026 22:30
@Shamik-07

Contributor Author

There are some pandas CI errors that I am looking into.

@Shamik-07

Contributor Author

> There are some pandas CI errors that I am looking into.

This is happening because of a data-conversion mismatch between pandas and Narwhals on columns with mixed data types:

"Could not convert '3' with type str: tried to convert to double"

for

test_print_code_result_matches_actual_transform_pandas(
    transform=ExpandDictTransform(
        type=TransformType.EXPAND_DICT,
        column_id='strings',
    ),
)

So my only option is to fall back to a separate pandas code path for the unnest.
This should fix it.
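
For illustration, the behavior a pandas-specific fallback aims for can be sketched on plain Python rows. `expand_dict_column` is a hypothetical stand-in, not the handler's code: it expands only the top level of each dict, treats `None` as an empty dict, and fills missing keys with `None` (pandas would typically use NaN instead).

```python
from typing import Any


def expand_dict_column(
    rows: list[dict[str, Any]], column: str
) -> list[dict[str, Any]]:
    """Replace `column` with one column per top-level key of its dict values."""
    # Collect keys in first-seen order; None rows contribute no keys.
    keys: list[str] = []
    for row in rows:
        for key in row[column] or {}:
            if key not in keys:
                keys.append(key)

    expanded = []
    for row in rows:
        value = row[column] or {}
        new_row = {k: v for k, v in row.items() if k != column}
        # Only the top level is expanded; nested dicts stay as values.
        new_row.update({key: value.get(key) for key in keys})
        expanded.append(new_row)
    return expanded


rows = [
    {"id": 1, "meta": {"a": 1, "b": {"c": 2}}},
    {"id": 2, "meta": None},
]
for row in expand_dict_column(rows, "meta"):
    print(row)
```

Because the values are never coerced to a single column type, mixed-type columns elsewhere in the frame are left untouched by this path.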

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 1 file (changes from recent commits).

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py
@codecov

codecov Bot commented May 20, 2026

Bundle Report

Bundle size has no change ✅

Affected Assets, Files, and Routes:

view changes for bundle: marimo-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
|---|---|---|---|
| assets/dist-*.js | -107 bytes | 169 bytes | -38.77% |
| assets/dist-*.js | 72 bytes | 176 bytes | 69.23% ⚠️ |
| assets/dist-*.js | -76 bytes | 183 bytes | -29.34% |
| assets/dist-*.js | -52 bytes | 335 bytes | -13.44% |
| assets/dist-*.js | -152 bytes | 104 bytes | -59.38% |
| assets/dist-*.js | 35 bytes | 137 bytes | 34.31% ⚠️ |
| assets/dist-*.js | -60 bytes | 104 bytes | -36.59% |
| assets/dist-*.js | 107 bytes | 276 bytes | 63.31% ⚠️ |
| assets/dist-*.js | 60 bytes | 164 bytes | 57.69% ⚠️ |
| assets/dist-*.js | -144 bytes | 259 bytes | -35.73% |
| assets/dist-*.js | 87 bytes | 256 bytes | 51.48% ⚠️ |
| assets/dist-*.js | 227 bytes | 403 bytes | 128.98% ⚠️ |
| assets/dist-*.js | 23 bytes | 160 bytes | 16.79% ⚠️ |
| assets/dist-*.js | 40 bytes | 177 bytes | 29.2% ⚠️ |
| assets/dist-*.js | -233 bytes | 102 bytes | -69.55% |
| assets/dist-*.js | 210 bytes | 387 bytes | 118.64% ⚠️ |
| assets/dist-*.js | -14 bytes | 169 bytes | -7.65% |
| assets/dist-*.js | -23 bytes | 137 bytes | -14.37% |

@cubic-dev-ai

cubic-dev-ai Bot commented May 30, 2026

Contributor

@cubic-dev-ai

@kirangadhave I have started the AI code review. It will take a few minutes to complete.

Copilot AI left a comment

Contributor

Pull request overview

This PR makes the dataframe Expand Dict transform robust to nulls by routing expansion through backend-native implementations (Polars unnest and Pandas json_normalize) and adds/updates tests to validate the behavior, including nested dict values.

Changes:

  • Update runtime transform handling to expand dict/struct columns using Pandas-native logic for pandas inputs and Polars unnest otherwise.
  • Update generated “print code” for Expand Dict in pandas and polars to match the new implementations.
  • Expand test coverage for Expand Dict with nulls and nested dicts; adjust equality helper to optionally treat None and NaN as equivalent.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py | Implements Expand Dict via pandas json_normalize or Polars unnest after collection. |
| marimo/_plugins/ui/_impl/dataframes/transforms/print_code.py | Updates printed code generation for Expand Dict for pandas and polars. |
| tests/_plugins/ui/_impl/dataframes/test_handlers.py | Unskips/extends Expand Dict tests (nulls + nested dicts) and tweaks dataframe comparison helper. |
| tests/_plugins/ui/_impl/dataframes/test_print_code.py | Adds print-code parity tests for Expand Dict with nested dicts for pandas and polars. |

Comment thread marimo/_plugins/ui/_impl/dataframes/transforms/handlers.py Outdated
Comment on lines 2407 to 2411
        result = apply(df, in_transform)
        assert_frame_equal_with_nans(result, expected)

    @staticmethod
    @pytest.mark.parametrize(
        ("df", "expected"),
        list(
            zip(
                create_test_dataframes(
                    {"nulls": [1, 2, 3, None, "hello"]}, include=["pandas"]
                ),
                create_test_dataframes({"nulls": [None]}, include=["pandas"]),
                strict=False,
            )
        ),
    )
    def test_filter_rows_null_pandas_object(
        df: DataFrameType, expected: DataFrameType
    ) -> None:
        in_transform = FilterRowsTransform(
            type=TransformType.FILTER_ROWS,
            operation="keep_rows",
            where=FilterGroup(
                type="group",
                operator="and",
                children=[
                    FilterCondition(
                        type="condition",
                        column_id="nulls",
                        operator="in",
                        value=[None],
                    )
                ],
            ),
        )
        result = apply(df, in_transform)
        assert_frame_equal_with_nans(result, expected)

    @staticmethod
    @pytest.mark.parametrize(

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

No issues found across 4 files

Architecture diagram
sequenceDiagram
    participant UI as DataFrame UI
    participant Handler as NarwhalsTransformHandler
    participant Narwhals as Narwhals Layer
    participant Backend as DataFrame Backend
    participant PrintCode as Print Code Generator

    Note over UI,PrintCode: Expand Dict Transform Flow - Current State

    UI->>Handler: handle_expand_dict(DataFrame, ExpandDictTransform)
    Handler->>Narwhals: collect_and_preserve_type(df)
    Narwhals-->>Handler: (collected_df, undo)
    Handler->>Narwhals: collected_df.to_native()
    Narwhals-->>Handler: native_df

    alt Pandas Backend
        Handler->>Handler: Check if pandas dataframe
        Handler->>Backend: result_df = native_df.copy()
        Handler->>Backend: expanded = pd.json_normalize(result_df.pop(column_id).map(...), max_level=0)
        Backend-->>Handler: expanded DataFrame
        Handler->>Backend: expanded.index = result_df.index
        Handler->>Backend: result_df.join(expanded)
        Backend-->>Handler: joined DataFrame
        Handler->>Narwhals: undo(nw.from_native(joined))
        Narwhals-->>Handler: original backend type
    else Polars Backend
        Handler->>Narwhals: collected_df.to_polars()
        Narwhals-->>Handler: polars_df
        Handler->>Backend: polars_df.unnest(column_id)
        Backend-->>Handler: unnested DataFrame
        Handler->>Narwhals: undo(nw.from_native(unnested))
        Narwhals-->>Handler: original backend type
    end
    Handler-->>UI: Transformed DataFrame

    Note over UI,PrintCode: Print Code Generation

    UI->>PrintCode: Generate Python code for transform
    PrintCode->>PrintCode: Check backend type

    alt Pandas Backend
        PrintCode->>PrintCode: Generate: df.join(pd.json_normalize(df.pop(col).map(...), max_level=0).set_axis(...))
    else Polars Backend
        PrintCode->>PrintCode: Generate: df.unnest(column_id)
    end
    PrintCode-->>UI: Generated code string

@kirangadhave

Member

@Shamik-07 can you please update the video in the PR to a higher res version? I'm having difficulty reading text in it

@Light2Dark Light2Dark requested a review from kirangadhave June 1, 2026 08:03
@Shamik-07

Contributor Author

> @Shamik-07 can you please update the video in the PR to a higher res version? I'm having difficulty reading text in it

Unfortunately, due to GitHub upload limits, I can't upload a higher-resolution video.
The notebook is attached instead.
issue_4583_expand_dict_null_values.py

@kirangadhave kirangadhave left a comment

Member

please address copilot review. We can do a manual round of reviews after that.

@Shamik-07

Contributor Author

> please address copilot review. We can do a manual round of reviews after that.

Thanks, fixed.

@kirangadhave kirangadhave left a comment

Member

Requesting a major change that removes the hard dependency on Polars for other backends that support structs.

Please also address the other nits.

expanded.index = result_df.index
return undo(nw.from_native(result_df.join(expanded)))

DependencyManager.polars.require(

Member

Non-pandas dataframes are forced through a Polars conversion here. For DuckDB or Ibis, that forces a Polars installation, which we should not do.

Narwhals has struct.field, so we can do:

schema = df.collect_schema()
fields = [f.name for f in schema[col].fields]
df.with_columns(
    [nw.col(col).struct.field(f).alias(f) for f in fields]
).drop(col)

This approach also stays lazy. The pandas approach with json_normalize is correct.

Contributor Author

Are you sure that every backend that supports a struct schema necessarily has struct.field?

# older versions of pandas running on py310 otherwise CI will fail
expanded = pd.json_normalize(
    result_df.pop(transform.column_id).map(
        lambda value: {} if value is None else value

Member

Also check for NaN here
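
A small sketch of what the extra check could look like, assuming missing struct values may arrive either as `None` or as a float NaN (pandas object columns commonly hold NaN for missing entries). `normalize_struct_value` is a hypothetical name used only for this example.

```python
import math
from typing import Any


def normalize_struct_value(value: Any) -> Any:
    """Map missing values (None or float NaN) to an empty dict."""
    if value is None:
        return {}
    # math.isnan raises TypeError on dicts, so guard on the type first.
    if isinstance(value, float) and math.isnan(value):
        return {}
    return value


values = [{"a": 1}, None, float("nan")]
print([normalize_struct_value(v) for v in values])
```

A function like this could replace the inline lambda passed to `.map(...)`, keeping the missing-value handling in one place.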

import pandas as pd

result_df = native_df.copy()
# max_level=0 was used so that pandas doesn't recursively unnest dicts

Member

The comment narrates the code; simplify it to explain the why instead.

    max_level=0,
)
expanded.index = result_df.index
return undo(nw.from_native(result_df.join(expanded)))

Member

Duplicate column names after unnest will throw an error here. Handle it gracefully.
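
One way to handle this, sketched under assumptions: compare the expanded keys against the remaining columns before joining and raise a descriptive error. Both function names and the choice of `ValueError` are hypothetical; the thread does not show which exception the PR ends up raising.

```python
def find_column_collisions(
    existing: list[str], expanded: list[str], column: str
) -> list[str]:
    """Return expanded names that clash with columns kept after expansion."""
    # The expanded column itself is dropped, so it cannot clash.
    remaining = {name for name in existing if name != column}
    return sorted(remaining.intersection(expanded))


def check_expand_is_safe(
    existing: list[str], expanded: list[str], column: str
) -> None:
    """Raise a readable error instead of letting the join fail."""
    clashes = find_column_collisions(existing, expanded, column)
    if clashes:
        raise ValueError(
            f"Cannot expand {column!r}: keys {clashes} would duplicate "
            "existing columns."
        )


check_expand_is_safe(["id", "meta"], ["a", "b"], "meta")  # no clash
try:
    check_expand_is_safe(["id", "meta"], ["id", "b"], "meta")
except ValueError as exc:
    print(exc)
```

Running the check before the join lets the error name the offending keys, which is more useful than a backend-specific join or unnest failure.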

    why="to expand dict/struct columns for non-pandas backends"
)
polars_df = collected_df.to_polars()
unnested = polars_df.unnest(transform.column_id)

Member

Duplicate column names after unnest will throw an error here. Handle it gracefully.

pd = pytest.importorskip("pandas")
pytest.importorskip("polars")
pytest.importorskip("pyarrow")
pytest.importorskip("polars")

Member

import order change is unnecessary

Shamik-07 added 2 commits July 3, 2026 11:17
feat: raising duplicate columns error
test: adding necessary tests for duplicate columns
@Shamik-07

Contributor Author

@kirangadhave

Thanks for the review comments; I have addressed them.
I still have one question: are you sure that every backend that supports a struct schema necessarily has struct.field?

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

New dataframe transform: Polars Native Expand Dict Transformation

4 participants