Consolidating/streamlining object-based and class-based pandera API #643
Hi @cosmicBboy. I haven't had the chance to think carefully about this yet; I'll try to give a proper reply in the coming days. Anyway, my first thoughts are:
We could throw an error if dtype is given in a SchemaModel, so users would quickly learn not to make that mistake.
This scenario is not very common. I think the first positional argument should be given to |
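The first suggestion above (raising an error when a dtype is passed in a class-based schema) could look roughly like the sketch below. This is plain Python, not pandera code: `Field`, `SchemaModelSketch`, and the `dtype` keyword are hypothetical stand-ins for the proposal, and the check is assumed to run at class-definition time so the mistake surfaces as soon as the model is declared.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Field:
    """Hypothetical stand-in for pa.Field; `dtype` is the proposed kwarg."""

    default: Any = None
    dtype: Optional[type] = None


class SchemaModelSketch:
    """Hypothetical stand-in for the class-based API (not pandera's SchemaModel)."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, value in vars(cls).items():
            # In the class-based API the dtype comes from the annotation,
            # so an explicit Field(dtype=...) is rejected early.
            if isinstance(value, Field) and value.dtype is not None:
                raise TypeError(
                    f"{cls.__name__}.{name}: pass the dtype as a type "
                    "annotation, not as Field(dtype=...)"
                )


class Good(SchemaModelSketch):
    price: float = Field()  # dtype comes from the annotation: accepted


try:
    class Bad(SchemaModelSketch):
        price: float = Field(dtype=float)  # dtype given twice: rejected
except TypeError as exc:
    error_message = str(exc)
```

Failing at class creation rather than at validation time means the user sees the error on import, before any data is involved.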
[Coming from #839, pretty much just throwing in a bunch of new thoughts without giving concrete feedback on the above, sorry!] IDK if either of you has ever looked at sqlalchemy, but they look to be solving a very similar problem. I think they have a very nice approach that offers both a declarative and an imperative API to create a schema, but no matter which method you use, you always end up with a class that represents the schema. They do some clever stuff where the imperative API actually returns a class, not an object, and when you use the declarative syntax, your class actually subclasses a dynamically created base class. See the docs for a sense. Installing sqlalchemy and playing around with it could be a good test to see how well you could possibly integrate with mypy, IDE autocompletion, etc. The things I really like about this are:
Things to note about this method:
In the linked issue I'm coming from, @cosmicBboy said:
I don't have the background for this, sorry. I'm assuming I'm missing something when I ask what is getting in the way of us having both benefits? Maybe the sqlalchemy approach to typing that I linked above might be a lead? |
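To make the pattern described above concrete, here is a minimal sketch in plain Python of "both APIs end in a class". It is not sqlalchemy's or pandera's implementation: `Column`, `declarative_base` (named after sqlalchemy's helper, but written from scratch here), `make_schema`, and the `__columns__` attribute are all assumptions made for illustration.

```python
from typing import Dict


class Column:
    """Hypothetical column spec; it only records a dtype."""

    def __init__(self, dtype: type):
        self.dtype = dtype


def declarative_base() -> type:
    """Create a fresh base class at runtime."""

    class Base:
        __columns__: Dict[str, type] = {}

        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            # Collect the Column attributes declared on this subclass.
            cls.__columns__ = {
                name: value.dtype
                for name, value in vars(cls).items()
                if isinstance(value, Column)
            }

    return Base


def make_schema(name: str, columns: Dict[str, type], base: type) -> type:
    """Imperative API: build and return a class, not an instance."""
    namespace = {col: Column(dtype) for col, dtype in columns.items()}
    return type(name, (base,), namespace)


Base = declarative_base()


# Declarative style: subclass the dynamically created base.
class Declared(Base):
    price = Column(float)
    volume = Column(int)


# Imperative style: same result, built from a plain dict.
Built = make_schema("Built", {"price": float, "volume": int}, Base)
```

Because `type(...)` also triggers `__init_subclass__`, both routes go through the same collection step, so downstream code only ever has to handle one kind of schema class.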
@jeffzi wanted to ping you on this question, and I wanted to discuss on here before turning it into a full-blown issue.
There are a few things I've been thinking about re: usability and wanted to list them off here under the broader theme of consolidating and streamlining the pandera API:
1. Support datatypes in SchemaModel
Make it so that users don't have to explicitly specify `pa.typing.Series` or `pa.typing.Index`. It would also support pandera datatypes.
2. Support `pa.Field` in `DataFrameSchema`
`index` kwarg
`dtype` kwarg to `Field`, only to be specified for the object-based API. This would perhaps cause some confusion both to users and contributors, and muddy the distinction between `SchemaModel` components and `DataFrameSchema` object components. I've also wanted to preserve the first positional arg of `Field` to `default`, with similar semantics to the dataclass or pydantic implementation, except it would fill nan values with the default.
These changes should be designed to be backwards compatible, and I think it would be pretty straight-forward to implement.
Thoughts?