Skip to content

Comments

Make HGroup more robust to changes in h5py.Group#5284

Open
GarethCabournDavies wants to merge 7 commits intogwastro:masterfrom
GarethCabournDavies:update_HFile
Open

Make HGroup more robust to changes in h5py.Group#5284
GarethCabournDavies wants to merge 7 commits intogwastro:masterfrom
GarethCabournDavies:update_HFile

Conversation

@GarethCabournDavies
Copy link
Contributor

There were some changes to h5py's create_dataset which I didn't see for a while. This change will make HGroup more robust to changes in h5py.Group as a result

Ive extended the checking for which situations need flecther32 checksums enabled, I dont think this is necessary for the situations we have now, but it is for safety in the future.

I've also added a couple of unit testing modules (largely written by copilot to be fair) which tests many different methods of creation, whether the datasets are the same after being loaded etc., and whether fletcher32 is enabled or not. The wider hdf.py checks are for things like the bool and indexing selections.

Standard information about the request

This change touches all areas which use h5py for saving files (which is everywhere), but actually affects nothing

This change: has appropriate unit tests, follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines

Motivation

I didn't want to rely on me getting automated emails of whether a file has changed any more (by the evidence already, this doesn't work)

Links to any issues or associated PRs

extends #4831

Testing performed

unit tests added so that the testing is done automatically

  • The author of this pull request confirms they will adhere to the code of conduct

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates pycbc.io.hdf.HGroup/HFile wrappers to be more resilient to upstream h5py.Group behavior changes, particularly around create_dataset and enforcing fletcher32 checksums, and adds new unit tests covering dataset creation parity, HGroup wrapping, and several pycbc.io.hdf utilities.

Changes:

  • Refactor HGroup.create_group to delegate to h5py and always return an HGroup.
  • Rework HGroup.create_dataset to auto-enable fletcher32 for numeric, non-scalar datasets and add wrapping behavior for __getitem__ and parent.
  • Add new test modules for checksum behavior, HGroup wrapping, HFile.select, FileData, DictArray, and DataFromFiles.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
pycbc/io/hdf.py Refactors HGroup wrappers for group/dataset creation and subgroup access, and adds parent wrapping.
test/test_io_hgroup.py Adds tests for fletcher32 enablement rules and HGroup wrapping semantics.
test/test_io_hdf.py Adds tests for HFile.select, FileData, DictArray.save, and DataFromFiles.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@ahnitz ahnitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've addressed my primary concern. Also, thank you for the unittest (even if they may be AI assisted). They look reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

code-quality Improvements to code quality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants