
ENH: Introduce pandas.col #62103


Merged
merged 33 commits into from
Aug 22, 2025

Conversation

MarcoGorelli
Member

@MarcoGorelli MarcoGorelli commented Aug 13, 2025

xref @jbrockmendel 's comment #56499 (comment)

I'd also discussed this with @phofl , @WillAyd , and @jorisvandenbossche (who originally showed us something like this in Basel at euroscipy 2023)

Demo:

import pandas as pd
from datetime import datetime

df = pd.DataFrame(
    {
        "a": [1, -2, 3],
        "b": [4, 5, 6],
        "c": [datetime(2020, 1, 1), datetime(2025, 4, 2), datetime(2026, 12, 3)],
        "d": ["fox", "beluga", "narwhal"],
    }
)

result = df.assign(
    # The usual Series methods are supported
    a_abs=pd.col("a").abs(),
    # And can be combined
    a_centered=pd.col("a") - pd.col("a").mean(),
    a_plus_b=pd.col("a") + pd.col("b"),
    # Namespaces are supported too
    c_year=pd.col("c").dt.year,
    c_month_name=pd.col("c").dt.strftime("%B"),
    d_upper=pd.col("d").str.upper(),
).loc[pd.col("a_abs") > 1]  # This works in `loc` too

print(result)

Output:

   a  b          c        d  a_abs  a_centered  a_plus_b  c_year c_month_name  d_upper
1 -2  5 2025-04-02   beluga      2   -2.666667         3    2025        April   BELUGA
2  3  6 2026-12-03  narwhal      3    2.333333         9    2026     December  NARWHAL

NumPy ufuncs are also supported:

In [6]: df.assign(a_log = np.log(pd.col('a')))
Out[6]: 
   a     a_log
0  1  0.000000
1  2  0.693147
2  3  1.098612

Expressions also get pretty-printed, demo:

In [4]: pd.col('value')
Out[4]: col('value')

In [5]: pd.col('value') * pd.col('weight')
Out[5]: (col('value') * col('weight'))

In [6]: (pd.col('value') - pd.col('value').mean()) / pd.col('value').std()
Out[6]: ((col('value') - col('value').mean()) / col('value').std())

In [7]: pd.col('timestamp').dt.strftime('%B')
Out[7]: col('timestamp').dt.strftime('%B')

What's here should be enough for it to be usable. For the type hints to show up correctly, extra work should be done in pandas-stubs. But, I think it should be possible to develop tooling to automate the Expr docs and types based on the Series ones (going to cc @Dr-Irv here too then)

As for the "col" name, that's what PySpark, Polars, Daft, and Datafusion use, so I think it'd make sense to follow the convention


I'm opening as a request for comments. Would people want this API to be part of pandas? This is ready for review

One of my main motivations for introducing it is that it avoids common scoping issues. For example, if you use assign to increment two columns' values by 10 and write df.assign(**{col: lambda df: df[col] + 10 for col in ('a', 'b')}), you'll be in for a surprise: each lambda looks up col when it is called, after the loop has finished, so every lambda sees the last value

In [19]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

In [20]: df.assign(**{col: lambda df: df[col] + 10 for col in ('a', 'b')})
Out[20]:
    a   b
0  14  14
1  15  15
2  16  16

whereas with pd.col, you get what you were probably expecting:

In [4]: df.assign(**{col: pd.col(col) + 10 for col in ('a', 'b')})
Out[4]: 
    a   b
0  11  14
1  12  15
2  13  16
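The difference comes from Python's late binding of closures: each lambda captures the variable col itself, not its value at definition time. A minimal sketch contrasting the surprising version with the classic default-argument workaround (no pd.col needed):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Each lambda closes over the loop variable `col`, which is looked up at call
# time; by then the comprehension has finished and `col` is "b" for both.
surprising = df.assign(**{col: lambda df: df[col] + 10 for col in ("a", "b")})

# Classic workaround: bind the current value of `col` as a default argument.
fixed = df.assign(**{col: lambda df, c=col: df[c] + 10 for col in ("a", "b")})
```

pd.col sidesteps this entirely because the column name is captured as data inside the expression object rather than as a free variable in a closure.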

Further advantages:

  • expressions are introspectable so the repr can be made to look nice, whereas an anonymous lambda always renders as something like <function __main__.<lambda>(df)>
  • the syntax looks more modern and more aligned with modern tools

Expected objections:

  • this expands the pandas API even further. Sure, I don't disagree, but I think this is a common enough and longstanding enough request that it's worth expanding it for this

TODO:

  • tests, API docs, user guide. But first, I just wanted to get a feel for people's thoughts, and to see if anyone's opposed to it

Potential follow-ups (if there's interest):

  • serialise / deserialise expressions

@MarcoGorelli MarcoGorelli changed the title ENH: Introduce pandas.col RFC: Introduce pandas.col Aug 13, 2025
@Dr-Irv
Contributor

Dr-Irv commented Aug 13, 2025

For the type hints to show up correctly, extra work should be done in pandas-stubs. But, I think it should be possible to develop tooling to automate the Expr docs and types based on the Series ones (going to cc @Dr-Irv here too then)

When this is added, and then released, pandas-stubs can be updated with proper stubs.

One comment is that I'm not sure it will support some basic arithmetic, such as:

result = df.assign(addcon=pd.col("a") + 10)

Or alignment with other series:

b = df["b"]  # or this could be from a different DF
result = df.assign(add2=pd.col("a") + b)

Also, don't you need to add some tests??

@MarcoGorelli
Member Author

Thanks for taking a look!

One comment is that I'm not sure it will support some basic arithmetic [...] Or alignment with other series:

Yup, they're both supported:

In [8]: df = pd.DataFrame({'a': [1,2,3]})

In [9]: s = pd.Series([90,100,110], index=[2,1,0])

In [10]: df.assign(
    ...:     b=pd.col('a')+10,
    ...:     c=pd.col('a')+s,
    ...: )
Out[10]: 
   a   b    c
0  1  11  111
1  2  12  102
2  3  13   93

Also, don't you need to add some tests??

😄 Definitely, I just wanted to test the waters first, as I think this would be perceived as a significant API change

@Dr-Irv
Contributor

Dr-Irv commented Aug 13, 2025

Definitely, I just wanted to test the waters first, as I think this would be perceived as a significant API change

I don't see it as a "change", more like an addition to the API that makes it easier to use. The existing way of using df.assign(foo=lambda df: df["a"] + df["b"]) would still work, but df.assign(foo=pd.col("a") + pd.col("b")) is cleaner.

@jbrockmendel
Member

Is assign the main use case?

@MarcoGorelli
Member Author

Currently it would only work in places that accept DataFrame -> Series callables, which, as far as I know, is only DataFrame.assign and filtering with DataFrame.loc

Getting it to work in GroupBy.agg is more complex, but it is possible, albeit with some restrictions
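For reference, these are the existing callable-based forms that those two entry points accept today, and which pd.col expressions would stand in for (the frame and column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Filtering with .loc via a DataFrame -> boolean Series callable:
filtered = df.loc[lambda d: d["a"] > 1]

# Column assignment via a DataFrame -> Series callable:
assigned = df.assign(b=lambda d: d["a"] + 10)
```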

@MarcoGorelli MarcoGorelli marked this pull request as ready for review August 14, 2025 10:09
@MarcoGorelli
Member Author

MarcoGorelli commented Aug 15, 2025

I haven't seen any objections, so I'll work on adding docs + user guide + tests

If anyone intends to block this then I'd appreciate it if you could speak out as soon as possible (also going to cc @mroeschke here in case you were against this)

@mroeschke
Member

I would be OK adding this API.

Contributor

@Dr-Irv Dr-Irv left a comment


Just a few questions - and I caught one typo in the docs

Contributor

@Dr-Irv Dr-Irv left a comment


caught one issue in the docs. I'm good with this. Will hope that others will review/approve/merge.

Contributor

@Dr-Irv Dr-Irv left a comment


will let someone else review/approve/merge .

@rhshadrach
Member

Haven't looked at the implementation, but big +1 from me.

@@ -31,6 +32,8 @@
(pd.col("a") < 1, [False, False], "(col('a') < 1)"),
(pd.col("a") <= 1, [True, False], "(col('a') <= 1)"),
(pd.col("a") == 1, [True, False], "(col('a') == 1)"),
(np.log(pd.col("a")), [0.0, 0.6931471805599453], "log(col('a'))"),
Contributor


log is dangerous to use in a test because the floating point value could be different on different platforms.

Maybe use np.min() instead.
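One common way to keep such a test portable is a tolerance-based comparison rather than exact float equality, e.g. with np.testing.assert_allclose. A sketch (this is an alternative, not the approach ultimately taken in the PR, which switched the test to np.power):

```python
import numpy as np

# Exact reprs like 0.6931471805599453 can differ in the last bits across
# platforms and libm versions; compare with a tolerance instead.
result = np.log(np.array([1.0, 2.0]))
np.testing.assert_allclose(result, [0.0, 0.6931471805599453], rtol=1e-12)
```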

Member Author

@MarcoGorelli MarcoGorelli Aug 19, 2025


sure, thanks! have replaced with np.power, so then we also test passing in a non-expression argument (2)

Member

@mroeschke mroeschke left a comment


Great! LGTM

Comment on lines 264 to 268
max_cols = 10
if len(columns_list) > max_cols:
    columns_hint = columns_list[:max_cols] + ["..."]
else:
    columns_hint = columns_list
Contributor


Might want to base this on the length of the column names. If the column names were all of length 15 on average, you'd have a very long message.

Member Author

@MarcoGorelli MarcoGorelli Aug 19, 2025


sure, done, thanks (using a totally arbitrary limit of 90, but i think it looks good)
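A character-budget truncation along those lines could look like this hypothetical sketch (the `_columns_hint` helper and the budget of 90 are illustrative, not the actual pandas code):

```python
def _columns_hint(columns, max_chars=90):
    # Hypothetical sketch: stop adding column names to the hint once the
    # joined names would exceed a character budget, then append "...".
    hint, used = [], 0
    for name in columns:
        used += len(str(name)) + 2  # account for the ", " separator
        if used > max_chars:
            hint.append("...")
            break
        hint.append(str(name))
    return hint


# Short names all fit; long names get truncated with a trailing "...".
short = _columns_hint(["a", "b"])
long = _columns_hint(["x" * 40, "y" * 40, "z" * 40])
```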

@mroeschke mroeschke merged commit 188b2da into pandas-dev:main Aug 22, 2025
38 checks passed
@mroeschke
Member

Thanks @MarcoGorelli

@mroeschke mroeschke added this to the 3.0 milestone Aug 22, 2025
@jbrockmendel
Member

Very nice @MarcoGorelli
