Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Supporting pathlib's Path objects in FileDataStream #377

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pnshinde
Copy link

Fixes #269 . pathlib's Path objects can be converted to strings just by casting, and vice versa. I added a check in FileDataStream's init function to convert a Path object to a string. I also wrote a test for this, but calling FileDataStream.read_csv() on a Path object produces the following error (using tool=None/'pandas'):

Screen Shot 2019-11-28 at 7 34 09 PM

@ganik do I need to define my own schema for a file path as a Path object, even though the contents of the file should be the same? I'm a little stuck here and any help would be appreciated!

@msftclas
Copy link

msftclas commented Nov 29, 2019

CLA assistant check
All CLA requirements met.

@pieths
Copy link
Collaborator

pieths commented Dec 2, 2019

Will this work with Python 2.7? It looks like pathlib was added in version 3.4.

with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
df.to_csv(f, sep=',', index=False)

fi = FileDataStream.read_csv(Path(f.name), sep=',')
Copy link
Collaborator

@pieths pieths Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test is failing here. Looks like a similar change might also be required in internal\utils\data_schema.py (in read_schema(*data, **options)).

I haven't tested it but maybe updating the part of the code around line 844 might fix the issue:

elif hasattr(X, 'read') or isinstance(X, str) or (
                    six.PY2 and isinstance(X, (str, text_type))):

@pnshinde
Copy link
Author

pnshinde commented Dec 8, 2019

It looks like you're right, and that Path objects won't work with python 2.7. I tried several different things to try and get it to work but with no luck. Do you have any suggestions on how to proceed? What if we checked the system python version and used Path objects only when that value is greater than 3.4?

@pieths
Copy link
Collaborator

pieths commented Dec 12, 2019

It looks like you're right, and that Path objects won't work with python 2.7. I tried several different things to try and get it to work but with no luck. Do you have any suggestions on how to proceed? What if we checked the system python version and used Path objects only when that value is greater than 3.4?

Checking for a particular Python version is done in some other parts of the code (including the snippet in my previous comment). Search for six.PY2 in the code base for more examples.

@ianlini
Copy link

ianlini commented Jan 10, 2020

  1. For Python < 3.4, you can use pathlib2, but I don't think that is the best practice. BTW, I think Path.resolve() should be called by the user, but not by the library.

  2. According to PEP 519, you can implement a function like os.fspath (added in Python 3.6) by path.__fspath__() if hasattr(path, "__fspath__") else path. The definition of path-like object is actually:

    A path-like object is either a str or bytes object representing a path, or an object implementing the os.PathLike protocol.

    so we shouldn't use pathlib.Path to determine whether an object is path-like.

  3. I'm not sure what the best practice is for testing. Maybe you can go with pathlib2, or implement a class with __fspath__.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support pathlib in FileDataStream
4 participants