Skip to content

Conversation

@ntjohnson1
Copy link

@ntjohnson1 ntjohnson1 commented Dec 29, 2025

Which issue does this PR close?

Rationale for this change

Explanation in the issue. Motivation coming more concretely from datafusion-python apache/datafusion-python#1305 (comment)

What changes are included in this PR?

  • Adds the test from the issue to highlight expected behavior
  • Expands drop_columns to coerce things into a fully qualified column to support the range of column varieties
  • This further adds a helper to extract the table name associated with the dataframe to simplify use of qualified drop columns support
    • This is potentially the most controversial part. I could see a nicer api being df.col(<name>) to match the expr version but then we probably do repeated checks for the underlying table name unless there is some caching somewhere. Maybe that performance impact isn't significant.

Are these changes tested?

Yes some additional tests are provided.

Are there any user-facing changes?

I had to update the drop_columns(&[]) test since the type can no longer be inferred. I'm not sure if that is representative of any actual use cases though since I expect the more common is a vector that might be empty in which case the type would be specified.

It now requires specifying columns with dots in them similar to other places "\"f.col1\"" to disambiguate from "f.col1".

@github-actions github-actions bot added core Core DataFusion crate functions Changes to functions implementation labels Dec 29, 2025
@renato2099
Copy link
Contributor

now we are changing/improving the behavior of drop_columns, we should probably update the documentation (wherever the right place is?). I mean after this PR drop_columns now uses find_qualified_columns so now:

  • Qualified names are supported
  • Missing columns now error instead of silently failing

Copy link
Contributor

@renato2099 renato2099 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my only request/comment would be to find the place where we can update the documentation for the drop_column change, thanks!

.map(|name| {
schema
.qualified_field_from_column(&Column::from_name(*name))
.map_err(|_| plan_datafusion_err!("Column '{}' not found", name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better than silently ignoring the error 👍

@ntjohnson1
Copy link
Author

now we are changing/improving the behavior of drop_columns, we should probably update the documentation (wherever the right place is?). I mean after this PR drop_columns now uses find_qualified_columns so now:

  • Qualified names are supported
  • Missing columns now error instead of silently failing

So the drop_columns documentation actually doesn't specify anything about qualified or unqualified names at the moment

/// Returns a new DataFrame containing all columns except the specified columns.
which was my original issue. IMO this makes the implementation match the existing documentation.

Also just clarification that drop_columns doesn't actually call find_qualified_columns. Instead find_qualified_columns is a public helper that makes it easier to call drop_columns with qualified names if you don't remember the table name for a dataframe but have the variable for that dataframe lying around (I added a test for this use case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Drop columns only allows unqualified names and silently fails otherwise

2 participants