Skip to content

GH-36845: [C++][Python] Allow type promotion on pa.concat_tables#36846

Merged
jorisvandenbossche merged 93 commits intoapache:mainfrom
Fokko:arrow-14705
Oct 10, 2023
Merged

GH-36845: [C++][Python] Allow type promotion on pa.concat_tables#36846
jorisvandenbossche merged 93 commits intoapache:mainfrom
Fokko:arrow-14705

Conversation

@Fokko
Copy link
Contributor

@Fokko Fokko commented Jul 24, 2023

Revival of #12000

Rationale for this change

It would be great to be able to do promotions when concat'ing a table, such as:

def test_concat_tables_with_promotion_int():
    import pyarrow as pa
    t1 = pa.Table.from_arrays(
        [pa.array([1, 2], type=pa.int64())], ["int"])
    t2 = pa.Table.from_arrays(
        [pa.array([3, 4], type=pa.int32())], ["int"])

    result = pa.concat_tables([t1, t2], promote=True)

    assert result.equals(pa.Table.from_arrays([
        pa.array([1, 2, 3, 4], type=pa.int64())
    ], ["int"]))

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Fokko Fokko requested a review from westonpace as a code owner July 24, 2023 14:19
@github-actions
Copy link

⚠️ GitHub issue #36845 has been automatically assigned in GitHub to PR creator.

@Fokko Fokko changed the title GH-36845: [ GH-36845: [C++][Python] Allow type promotion on pa.concat_tables Jul 24, 2023
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the updates!

Added a few small comments (I can also push if needed to ensure we get this in).
It might be good to add a small test to ensure promote keyword still works.

Concatenate pyarrow.Table objects.

If promote==False, a zero-copy concatenation will be performed. The schemas
If promote_options=="none", a zero-copy concatenation will be performed. The schemas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If promote_options=="none", a zero-copy concatenation will be performed. The schemas
If promote_options="none", a zero-copy concatenation will be performed. The schemas

first table.

If promote==True, any null type arrays will be casted to the type of other
If promote_options=="default", any null type arrays will be casted to the type of other
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If promote_options=="default", any null type arrays will be casted to the type of other
If promote_options="default", any null type arrays will be casted to the type of other


if "promote" in kwargs:
warnings.warn(
"promote has been superseded by mode='default'.", FutureWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"promote has been superseded by mode='default'.", FutureWarning)
"promote has been superseded by mode='default'.", FutureWarning, stacklevel=2)

Comment on lines +3179 to +3180
Default; null and only null can be unified with another type.
Permissive; promotes types to the greater common denominator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Default; null and only null can be unified with another type.
Permissive; promotes types to the greater common denominator.
Default: null and only null can be unified with another type.
Permissive: types are promoted to the greater common denominator.

warnings.warn(
"promote has been superseded by mode='default'.", FutureWarning)
if kwargs['promote'] is True:
promote_options = "permissive"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
promote_options = "permissive"
promote_options = "default"

Isn't that needed to preserve the current default behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we agreed on promoting when promote=True, but this also works for me.

Copy link
Member

@jorisvandenbossche jorisvandenbossche Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mentioned that I would find a promote=True a better default, but now that we deprecated it in favor of promote_options, we don't need to change the current behaviour, as it will be removed anyway. And users can now replace it with promote_options="permissive" if they prefer that

@jorisvandenbossche jorisvandenbossche added this to the 14.0.0 milestone Oct 6, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 6, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 6, 2023
t2 = pa.Table.from_arrays(
[pa.array([1.0, 2.0], type=pa.float32())], ["float_field"])

result = pa.concat_tables([t1, t2], promote=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = pa.concat_tables([t1, t2], promote=True)
with pytest.warns(FutureWarning):
result = pa.concat_tables([t1, t2], promote=True)

This asserts the warning is raised and at the same time also ensures we don't unnecessarily see the warning in the pytest logs

@github-actions github-actions bot removed the awaiting change review Awaiting change review label Oct 9, 2023
@jorisvandenbossche
Copy link
Member

Thanks @Fokko! (and @lidavidm for the initial PR)

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 5f57219.

There were 2 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them.

@jorisvandenbossche
Copy link
Member

FYI, those reported performance regressions were just flakes. The timings are still stable at the same level for later commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Allow promotion from int32 to int64

5 participants