HDF5
This is taken from the original ASDF paper and thus may be a bit dated.
HDF5 is a very flexible format capable of handling many storage options and needs (Folk et al., 2011). It is used by a much broader community than astronomy and, of the existing data formats, is the strongest alternative candidate to FITS. It is already being used by some astronomical projects (e.g., LOFAR; see "LOFAR and HDF5: Toward a New Radio Data Standard"). We summarize the drawbacks of HDF5 below, partly by indicating aspects of FITS that work well and are still worth preserving, though in a different form.
- It is an entirely binary format. FITS headers are easily human-readable, and it is quite possible to inspect them with very primitive tools. Nothing like that is the case for HDF5: the HDF5 toolset must be installed, and its software must be used, if one wants to inspect the contents of an HDF5 file in any way.
- The FITS format is, to a large degree, self-documenting. If one removed all of the standards documentation, one could reasonably infer the organization of the contents without a great deal of trouble (though the 2880-byte blocking and the compression options would both present some challenges, particularly the latter). The same cannot be said for HDF5. One is essentially lost without the HDF5 specification documentation, which is lengthy and complex (approximately 125 pages).
- Because of the complexity, there is effectively only one implementation. The drawback of having only one implementation is that it may deviate from the published specification (who would know, since there is no independent verification?). It is true that there is a reference set of test data; nevertheless, this does not guard against practical deviations from the specification. Admittedly, multiple implementations do not remove the possibility completely, but they do significantly reduce the likelihood.
- A related issue is that for some time the HDF format was not considered archival because it kept changing, and for a time it was considered more of a software API than a specific representation on disk. HDF5 has been relatively stable, though the lack of multiple implementations and of a self-documenting nature makes it less appropriate as an archival format. Will the future library be able to read much older files? FITS has been considered a much stronger archival format for this reason.
- HDF5 does not lend itself to supporting simpler, smaller text-based data files. As an example, many astronomers prefer to use simple ASCII tables for data that do not require very large files, primarily for the convenience of viewing and editing them without using special tools.
- The HDF5 Abstract Data Model is not flexible enough to represent the structures we need to represent, notably for generalized WCS (see Section 6.6 of ASDF: A new data format for astronomy). The set of data types in HDF5 does not include a variable-length mapping datatype (analogous to a Python dictionary or JavaScript object). While “Groups”, which are much like a filesystem directory, could be used for this purpose, “Groups” cannot be nested inside of variable-length arrays but only within each other. The “Compound” data type, analogous to a C struct, also seems fruitful, but it cannot contain other “Compound” types or variable-length arrays. These arbitrary restrictions on nesting of data structures make some concepts much harder to represent than they otherwise need to be.
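To make the last point concrete, here is a minimal sketch of the kind of structure in question: a variable-length list whose elements are mappings, which in turn contain further lists and mappings. The field names (`steps`, `offsets`, `coefficients`, and so on) are hypothetical and are not taken from any actual WCS or ASDF schema, and JSON from the Python standard library is used only as a convenient stand-in for a text-based tree format.

```python
import json

# Hypothetical, simplified transform description (not a real WCS schema):
# a variable-length list of mappings, each of which may itself hold
# lists or further mappings.
transform = {
    "name": "example_pipeline",
    "steps": [
        {"type": "shift", "offsets": [1.5, -2.0]},
        {"type": "polynomial", "coefficients": [[0.0, 1.0], [1.0, 0.0]]},
        {"type": "compose", "steps": [{"type": "scale", "factor": 0.5}]},
    ],
}


def max_depth(node):
    """Return how deeply dicts and lists are nested in a JSON-like value."""
    if isinstance(node, dict):
        return 1 + max((max_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((max_depth(v) for v in node), default=0)
    return 0


# Round-trip through a text representation; the nesting needs no special handling.
text = json.dumps(transform, indent=2)
restored = json.loads(text)
```

Mappings inside lists inside mappings are exactly the pattern that the item above describes as awkward to express with “Groups” and “Compound” types.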
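The first two items above note that a FITS header can be inspected with very primitive tools. As a rough illustration, the sketch below builds a tiny header in memory and reads it back with nothing but string slicing. It relies on the FITS layout of 80-character ASCII records padded out to 2880-byte blocks; the helper names `make_header` and `read_header` are hypothetical, and the value parsing is deliberately naive (it would mishandle quoted strings that contain a slash).

```python
BLOCK = 2880  # FITS files are organized in 2880-byte blocks
CARD = 80     # each header record ("card") is 80 ASCII characters


def make_header(cards):
    """Pad each card to 80 characters and the whole header to a 2880-byte multiple."""
    body = "".join(card.ljust(CARD) for card in cards + ["END"])
    padded_len = -(-len(body) // BLOCK) * BLOCK  # round up to a full block
    return body.ljust(padded_len).encode("ascii")


def read_header(raw):
    """Extract keyword/value pairs from raw header bytes by plain slicing."""
    text = raw.decode("ascii")
    pairs = {}
    for start in range(0, len(text), CARD):
        card = text[start:start + CARD]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Naive: treat everything after the first "/" as a comment.
            pairs[keyword] = card[10:].split("/", 1)[0].strip()
    return pairs


raw = make_header([
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per data value",
    "NAXIS   =                    0 / number of axes",
])
header = read_header(raw)
```

The same bytes are also legible in a pager or text editor, which is the property the items above argue is worth preserving.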