-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to compare DataFrames #34
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Not sure what's up with the CI. I'll take a look later. But this passes for me locally. If you could just make that one change I'll get this merged.
engarde/checks.py
Outdated
__all__ = ['is_monotonic', 'is_shape', 'none_missing', 'unique_index', 'within_n_std', | ||
'within_range', 'within_set', 'has_dtypes', | ||
'verify', 'verify_all', 'verify_any'] | ||
def is_same_as(df, df_to_compare): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you accept **kwargs
here and pass them through to the tm.assert_frame_equal
? That can be useful for relaxing the equality requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I acknowledge kwargs
in the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
**kwargs : dict
keyword arguments passed through to ``assert_frame_equal``
engarde/checks.py
Outdated
|
||
""" | ||
try: | ||
tm.assert_frame_equal(df, df_to_compare) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to pass through **kwargs
here to tm.assert_frame_equal
:) This lets you do things like
df.pipe(is_same_as, other, check_names=False)
engarde/checks.py
Outdated
try: | ||
tm.assert_frame_equal(df, df_to_compare) | ||
except AssertionError as e: | ||
raise AssertionError("DataFrames are not equal") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to include the original assertion error for why they aren't equal here. I know in python3 it's just
raise AseertionError("DataFrames are not equal") from e
but I don't know about py2. I think leave it for now.
Are you using python3 or 2? I'm tempted to just make the next release python3 only.
engarde/decorators.py
Outdated
def is_same_as(df_to_compare, **kwargs): | ||
def decorate(func): | ||
@wraps(func) | ||
def wrapper(*args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this is a bit complicated. Can you rename the outer is_same_as
**kwargs
to **assert_kwargs
and then pass those to ck.is_same_as
on line 170? Then the wrapper
**kwargs
are for whatever func
is being decorated (like you have now).
I got the other errors fixed with tests around **kwargs functionality to ensure that less strict DataFrames match (float vs int). I dug into re-raising exceptions and found this post on StackOverflow. Reading thru the referenced blogs (1, 2), we can provide better tracebacks using a three-argument raise statement for python2. Personally, I'm using Python3 (#Teamfstrings). I do not have much experience in developing for both. Would it be difficult to implement two different exception re-raising workflows? How do other libraries handle this? There are definitely good reasons to keep testing libraries like this compatible with both py2 and py3. I can help out if you decide to go this route. For now, I amended my last commit to fix **kwargs usage, added better tests around that functionality, and kept the |
This looks good. I'll figure out the py2 vs. py3 stuff later. Hopefully will do a release in a couple weeks. |
I'm listening to Hynek's talk from PyCon while I'm working. He mentioned the This should fix all exception chaining issues. I'll update my code when I get some time this weekend. |
Added functionality to compare DataFrames by leveraging
pandas.util.testing
assert helpers