Add a scalar check for fill_value in shift()? #21280

Open
2 tasks done
etiennebacher opened this issue Feb 15, 2025 · 1 comment · May be fixed by #21292
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@etiennebacher
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
dat = pl.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

dat.shift(1, fill_value=pl.col("y"))
# shape: (3, 2)
# ┌─────┬─────┐
# │ x   ┆ y   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 4   ┆ 4   │
# │ 1   ┆ 4   │
# │ 2   ┆ 5   │
# └─────┴─────┘

dat.shift(-1, fill_value=pl.col("y"))
# shape: (3, 2)
# ┌─────┬─────┐
# │ x   ┆ y   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 2   ┆ 5   │
# │ 3   ┆ 6   │
# │ 4   ┆ 4   │
# └─────┴─────┘

Log output

/

Issue description

The behavior of the fill_value argument of shift() is hard to understand when fill_value refers to a column. In the example above, the nulls created by the shift are filled with the first value of pl.col("y") as it was before shifting, regardless of the shift direction.

Expected behavior

It seems odd to be able to pass a whole column such as pl.col("y") as fill_value. Shouldn't there be a check that fill_value must be a scalar, so that pl.col("y").first() works but pl.col("y") doesn't?
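To make the proposed semantics concrete, here is a minimal pure-Python sketch of a shift that only accepts a scalar fill. It is a toy model over plain lists, not Polars' implementation; the function name `shift_with_scalar_fill` and the choice of `TypeError` are assumptions made for illustration only.

```python
from collections.abc import Sequence
from typing import Any


def shift_with_scalar_fill(values: list[Any], n: int, fill_value: Any = None) -> list[Any]:
    """Toy model of a column shift over a plain list (hypothetical helper)."""
    # Hypothetical check: reject multi-value (column-like) fill values.
    # Strings and bytes are sequences too, but count as scalars here.
    if isinstance(fill_value, Sequence) and not isinstance(fill_value, (str, bytes)):
        raise TypeError("fill_value must be a scalar, got a sequence")

    size = len(values)
    if n == 0:
        return list(values)

    # Number of vacated slots, capped at the column length.
    k = min(abs(n), size)
    pad = [fill_value] * k

    if n > 0:
        # Shift forward: vacated slots appear at the start.
        return pad + values[: size - k]
    # Shift backward: vacated slots appear at the end.
    return values[k:] + pad


x = [1, 2, 3]
y = [4, 5, 6]

# An explicit scalar (the first value of y) is accepted.
forward = shift_with_scalar_fill(x, 1, fill_value=y[0])
backward = shift_with_scalar_fill(x, -1, fill_value=y[0])

# Passing the whole column is rejected instead of silently using y[0].
try:
    shift_with_scalar_fill(x, 1, fill_value=y)
    rejected = False
except TypeError:
    rejected = True
```

In this model, the explicit scalar gives the same x values as the reproducible example above, while passing the full column fails loudly rather than silently falling back to its first element.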

Installed versions

--------Version info---------
Polars:              1.22.0
Index type:          UInt32
Platform:            Linux-6.8.0-52-generic-x86_64-with-glibc2.39
Python:              3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                2.1.2
openpyxl             <not installed>
pandas               2.2.3
pyarrow              18.1.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
etiennebacher added the bug, needs triage, and python labels on Feb 15, 2025
@mcrumiller
Contributor

Yes, this is definitely odd: if you shift by a negative amount, for example -2, it still uses the first value of y.

import polars as pl

df = pl.DataFrame({
    "x": [1, 2, None, 4, 5],
    "y": [4, 5, 6, 7, 8],
})
df.shift(-2, fill_value=pl.col("y"))
# shape: (5, 2)
# ┌──────┬─────┐
# │ x    ┆ y   │
# │ ---  ┆ --- │
# │ i64  ┆ i64 │
# ╞══════╪═════╡
# │ null ┆ 6   │
# │ 4    ┆ 7   │
# │ 5    ┆ 8   │
# │ 4    ┆ 4   │ <-- filled with y[0]
# │ 4    ┆ 4   │ <-- filled with y[0]
# └──────┴─────┘
