Collaborator

@ilumsden ilumsden commented Feb 9, 2022

Follow up to hatchet/hatchet#272

This PR adds the following new functions for checkpointing GraphFrames (i.e., saving to/reading from files):

  1. to_pickle and from_pickle (Pickle Format)
  2. to_csv and from_csv
  3. to_excel and from_excel

These functions wrap the corresponding read/write functions from Pandas. In many cases, the Pandas functions require additional dependencies. Hatchet itself will not require those dependencies. If the dependency for a particular function is not installed, Pandas will raise an ImportError.

This PR also adds new save and load functions to the GraphFrame class to simplify checkpointing. Both functions require only one argument: the filename. If the filename has a recognized extension, the matching format will be used. Otherwise, the optional fileformat parameter can be provided to specify the desired format. If the dependencies needed for that format are not installed, the ImportError raised by Pandas will be caught and all remaining formats will be attempted. If no supported format succeeds, an IOError will be raised.
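The format selection and fallback described above could look roughly like the sketch below. It is not the code from this PR: the `EXTENSION_TO_FORMAT` table, the `resolve_format` helper, and the `writers` argument are hypothetical names used only for illustration, and the writers are plain callables standing in for the Pandas-backed functions.

```python
import os

# Hypothetical extension table; the mapping used in the PR may differ.
EXTENSION_TO_FORMAT = {
    ".pkl": "pickle",
    ".pickle": "pickle",
    ".csv": "csv",
    ".xlsx": "excel",
}


def resolve_format(filename, fileformat=None):
    """Prefer a recognized file extension, then fall back to fileformat."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_FORMAT.get(ext, fileformat)


def save(filename, writers, fileformat=None, **kwargs):
    """Write with the resolved format first, then try the remaining formats.

    writers maps a format name to a callable(filename, **kwargs).
    Returns the name of the format that succeeded.
    """
    first = resolve_format(filename, fileformat)
    order = [first] if first in writers else []
    order += [fmt for fmt in writers if fmt != first]
    for fmt in order:
        try:
            writers[fmt](filename, **kwargs)
            return fmt
        except ImportError:
            # Optional dependency for this format is missing; try the next one.
            continue
    raise IOError("no supported file format could be used for " + filename)
```

A load function would follow the same pattern with reader callables instead of writers.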

All the new functions added in this PR accept keyword arguments (i.e., **kwargs). These arguments will be passed to the Pandas function that is eventually invoked to read/write the file. Documentation (i.e., docstrings) will be added that links to the documentation of the associated Pandas functions.
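As a minimal illustration of the keyword-argument forwarding, the sketch below uses the standard library's `csv.writer` as a stand-in for the Pandas function. The `to_csv_text` wrapper is a hypothetical name and is not part of this PR.

```python
import csv
import io


def to_csv_text(rows, **kwargs):
    """Hypothetical wrapper: pass every keyword argument through unchanged."""
    buf = io.StringIO()
    # The wrapper does not inspect kwargs; the underlying writer validates them.
    writer = csv.writer(buf, **kwargs)
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv_text([["main", 1.5]], delimiter=";", lineterminator="\n")
```

Because the wrapper does not interpret the arguments itself, any option accepted by the underlying function is available without changes to the wrapper.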

Other file formats (e.g., Parquet and Feather) will be added in future PRs.

@ilumsden ilumsden added the area-readers, area-writers, priority-normal, status-work-in-progress, and type-feature labels Feb 9, 2022
@ilumsden ilumsden self-assigned this Feb 9, 2022
Collaborator Author

ilumsden commented Feb 9, 2022

Originally from hatchet/hatchet on May 18, 2021

I might wait until hatchet/hatchet#377 is merged before marking this PR ready-for-review. This PR adds some global-configuration-style data that the save and load functions use to determine the file format from the file extension. If this data were placed in the global configuration system, users would be able to add "rules" telling those functions to save/load files with non-standard extensions using a certain file format.
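To make the idea of user-added "rules" concrete, here is a minimal sketch. It does not use the configuration system from hatchet/hatchet#377; the `_DEFAULT_RULES` table and the `add_extension_rule` and `format_for` functions are hypothetical names chosen for illustration.

```python
import os

# Hypothetical built-in rules; the real table may contain other entries.
_DEFAULT_RULES = {".pkl": "pickle", ".csv": "csv", ".xlsx": "excel"}
# Rules registered by the user at runtime.
_user_rules = {}


def add_extension_rule(extension, fileformat):
    """Register a user rule mapping an extension to a file format."""
    _user_rules[extension.lower()] = fileformat


def format_for(filename):
    """Look up the format for a filename; user rules override defaults."""
    ext = os.path.splitext(filename)[1].lower()
    return _user_rules.get(ext, _DEFAULT_RULES.get(ext))


# A non-standard extension is unknown until the user adds a rule for it.
before = format_for("run.gf")
add_extension_rule(".gf", "pickle")
after = format_for("run.gf")
```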

Collaborator Author

ilumsden commented Feb 9, 2022

Originally from May 22, 2021:

Implementation and testing are now complete. This PR depends on hatchet/hatchet#272, so it definitely shouldn't be reviewed or merged until hatchet/hatchet#272 is merged. I also want to integrate hatchet/hatchet#377, but I might do that in a separate PR.

@slabasan slabasan force-pushed the develop branch 12 times, most recently from 74d7f3e to 837e5e3 August 9, 2022 04:48
@slabasan slabasan force-pushed the develop branch 5 times, most recently from b461833 to 48d44ce August 9, 2022 05:03

Labels

area-readers: Issues and PRs involving Hatchet's data readers
area-writers: Issues and PRs involving Hatchet's data writers
priority-normal: Normal priority issues and PRs
status-work-in-progress: PR is currently being worked on
type-feature: Requests for new features or PRs which implement new features
