Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to compare DataFrames #34

Merged
merged 2 commits into from
May 27, 2017

Conversation

alysivji
Copy link
Contributor

Added functionality to compare DataFrames by leveraging pandas.util.testing assert helpers

Copy link
Collaborator

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Not sure what's up with the CI. I'll take a look later. But this passes for me locally. If you could just make that one change I'll get this merged.

__all__ = ['is_monotonic', 'is_shape', 'none_missing', 'unique_index', 'within_n_std',
'within_range', 'within_set', 'has_dtypes',
'verify', 'verify_all', 'verify_any']
def is_same_as(df, df_to_compare):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you accept **kwargs here and pass them through to the tm.assert_frame_equal? That can be useful for relaxing the equality requirements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should I acknowledge kwargs in the docstring?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

**kwargs : dict
    keyword arguments passed through to ``assert_frame_equal``


"""
try:
tm.assert_frame_equal(df, df_to_compare)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to pass through **kwargs here to tm.assert_frame_equal :) This lets you do things like

df.pipe(is_same_as, other, check_names=False)

try:
tm.assert_frame_equal(df, df_to_compare)
except AssertionError as e:
raise AssertionError("DataFrames are not equal")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to include the original assertion error for why they aren't equal here. I know in python3 it's just

raise AseertionError("DataFrames are not equal") from e

but I don't know about py2. I think leave it for now.

Are you using python3 or 2? I'm tempted to just make the next release python3 only.

def is_same_as(df_to_compare, **kwargs):
def decorate(func):
@wraps(func)
def wrapper(*args, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, this is a bit complicated. Can you rename the outer is_same_as **kwargs to **assert_kwargs and then pass those to ck.is_same_as on line 170? Then the wrapper **kwargs are for whatever func is being decorated (like you have now).

@alysivji
Copy link
Contributor Author

I got the other errors fixed with tests around **kwargs functionality to ensure that less strict DataFrames match (float vs int).

I dug into re-raising exceptions and found this post on StackOverflow. Reading thru the referenced blogs (1, 2), we can provide better tracebacks using a three-argument raise statement for python2.

Personally, I'm using Python3 (#Teamfstrings). I do not have much experience in developing for both. Would it be difficult to implement two different exception re-raising workflows? How do other libraries handle this?

There are definitely good reasons to keep testing libraries like this compatible with both py2 and py3. I can help out if you decide to go this route.

For now, I amended my last commit to fix **kwargs usage, added better tests around that functionality, and kept the raise AssertionError("DataFrames are not equal") functionality the same.

@TomAugspurger
Copy link
Collaborator

This looks good. I'll figure out the py2 vs. py3 stuff later.

Hopefully will do a release in a couple weeks.

@TomAugspurger TomAugspurger merged commit 0033f57 into engarde-dev:master May 27, 2017
@alysivji
Copy link
Contributor Author

alysivji commented May 31, 2017

I'm listening to Hynek's talk from PyCon while I'm working. He mentioned the raise_from() method in the six library.

This should fix all exception chaining issues. I'll update my code when I get some time this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants