HDF5 1.10 introduces several new features in the HDF5 library. These features were added across the first three releases of HDF5-1.10 (1.10.0, 1.10.1, and 1.10.2). For a brief description of each new feature, see:
- New Features Introduced in HDF5 1.10.2
- New Features Introduced in HDF5 1.10.1
- New Features Introduced in HDF5 1.10.0
This release includes changes in the HDF5 storage format. For detailed information on the changes, see: Changes to the File Format Specification
These changes take effect when one or more of the new features is used, or when an application requests the latest storage format (see H5P_SET_LIBVER_BOUNDS).
Due to the requirements of some of the new features, the format of a 1.10.x HDF5 file is likely to be different from that of a 1.8.x HDF5 file. This means that tools and applications built to read 1.10.x files will be able to read a 1.8.x file, but tools built to read 1.8.x files may not be able to read a 1.10.x file.
If an application built on HDF5 Release 1.10 avoids the new features and does not request the latest format, applications built on HDF5 Release 1.8.x will be able to read the files it creates. In addition, applications originally written for HDF5 Release 1.8.x can be linked against a suitably configured HDF5 Release 1.10.x library, thus taking advantage of the performance improvements in 1.10.
New Features Introduced in HDF5 1.10.2
Forward Compatibility for HDF5 1.8-based Applications Accessing Files Created by HDF5 1.10.2
In HDF5 1.8.0, the H5P_SET_LIBVER_BOUNDS function was introduced for specifying the earliest ("low") and latest ("high") versions of the library to use when writing objects. HDF5 1.10.2 introduces two new values for "low" and "high", H5F_LIBVER_V18 and H5F_LIBVER_V110, and H5F_LIBVER_LATEST is now mapped to H5F_LIBVER_V110. See the H5P_SET_LIBVER_BOUNDS function for details.
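How the "low" and "high" bounds interact can be illustrated with a small model. The Python sketch below is not HDF5 library code. It assumes a simplified rule: each object is written in the oldest format that satisfies both the "low" bound and the feature's own requirement, and the write fails if that format is newer than "high". The names `LibVer`, `format_for_write`, and `readable_by` are hypothetical and exist only for this illustration.

```python
from enum import IntEnum


class LibVer(IntEnum):
    """Hypothetical stand-ins for the library's version bound values."""
    EARLIEST = 0
    V18 = 1
    V110 = 2


def format_for_write(low, high, feature_min=LibVer.EARLIEST):
    """Return the format version a simplified library would use for an object.

    Assumption: the oldest format that meets both `low` and the feature's own
    minimum is chosen; needing a format newer than `high` is an error.
    """
    if low > high:
        raise ValueError("low bound must not exceed high bound")
    chosen = max(low, feature_min)
    if chosen > high:
        raise ValueError("feature needs a newer format than 'high' allows")
    return LibVer(chosen)


def readable_by(reader, object_format):
    """In this model a reader understands formats up to its own version."""
    return object_format <= reader


# A writer capped at V18 only produces objects a 1.8-based reader can handle.
capped = format_for_write(LibVer.EARLIEST, LibVer.V18)

# An object that depends on a 1.10-only feature needs a higher "high" bound.
newest = format_for_write(LibVer.EARLIEST, LibVer.V110, feature_min=LibVer.V110)
```

In this model, asking for a 1.10-only feature while "high" is capped at `V18` raises an error rather than silently writing a file that 1.8-based readers cannot open, which is the trade-off the bounds are meant to make explicit.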
Performance Optimizations for HDF5 Parallel Applications
Optimizations were introduced to parallel HDF5 for improving the performance of open, close and flush operations at scale.
Using Compression with HDF5 Parallel Applications
HDF5 parallel applications can now write data using compression (and other filters such as the Fletcher32 checksum filter).
New Features Introduced in HDF5 1.10.1
Metadata Cache Image (RFC) » Fine-tuning the Metadata Cache
HDF5 metadata is typically small, and scattered throughout the HDF5 file. This can affect performance, particularly on large HPC systems. The Metadata Cache Image feature can improve performance by writing the metadata cache in a single block on file close, and then populating the cache with the contents of this block on file open, thus avoiding the many small I/O operations that would otherwise be required on file open and close.
Metadata Cache Evict on Close » Fine-tuning the Metadata Cache
The HDF5 library's metadata cache tends to hold on to HDF5 object metadata (object headers, chunk index structures, etc.) rather than evict it. This can cause the cache to grow, resulting in memory pressure on an application or system. When the "evict on close" property is set, all metadata for an object is evicted from the cache when the object is closed, provided that the metadata is not referenced by any other open object.
Paged Aggregation (RFC) » File Space Management
The current HDF5 file space allocation accumulates small pieces of metadata and raw data in aggregator blocks that are not page-aligned and vary widely in size. The paged aggregation feature was implemented to provide efficient paged access to these small pieces of metadata and raw data.
Page Buffering (RFC)
Small, random I/O accesses on parallel file systems result in poor application performance. Page buffering, used in conjunction with paged aggregation, can improve performance by letting an application restrict HDF5 I/O requests to a specific granularity and alignment.
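The effect of a fixed granularity and alignment can be shown with a little arithmetic. The following Python sketch is a model, not HDF5 code: it assumes the file is divided into fixed-size pages and that the buffer is large enough to keep every page it has fetched. The function names, the 4096-byte page size, and the request offsets are all hypothetical.

```python
def pages_touched(offset, length, page_size):
    """Return the indices of the fixed-size pages that a byte range overlaps."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


def page_reads_needed(requests, page_size):
    """Count page-sized reads if every touched page is fetched once and kept.

    Assumption: the buffer never evicts a page, so no page is read twice.
    """
    seen = set()
    for offset, length in requests:
        seen.update(pages_touched(offset, length, page_size))
    return len(seen)


# Eight scattered 64-byte accesses (hypothetical offsets) that all fall
# within the first two 4096-byte pages of the file.
requests = [(0, 64), (512, 64), (1024, 64), (3000, 64),
            (4096, 64), (5000, 64), (6000, 64), (8000, 64)]

small_ios = len(requests)                               # 8 unaligned requests
page_ios = page_reads_needed(requests, page_size=4096)  # 2 aligned page reads
```

Under these assumptions, eight small requests collapse into two page-aligned reads. A real buffer has a bounded size, so the actual saving depends on the access pattern and on how many pages fit in memory.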
New Features Introduced in HDF5 1.10.0
Single-Writer/Multiple-Reader (SWMR)
Data acquisition and computer modeling systems often need to analyze and visualize data while it is being written. It is not unusual, for example, for an application to produce results in the middle of a run that suggest some basic parameters be changed, sensors be adjusted, or the run be scrapped entirely.
To enable users to check on such systems, we have been developing a concurrent read/write file access pattern we call SWMR (pronounced swimmer). SWMR is short for single-writer/multiple-reader. SWMR functionality allows a writer process to add data to a file while multiple reader processes read from the file.
The orderly operation of the metadata cache is crucial to SWMR functioning. A number of APIs have been developed to handle the requests from writer and reader processes and to give applications the control of the metadata cache they might need. Because the metadata cache APIs can also be used when SWMR is not in use, these functions are described separately.
Collective Metadata I/O
Calls for HDF5 metadata can result in many small reads and writes. On metadata reads, collective metadata I/O can improve performance by letting the library optimize the access, for example by having one rank read the metadata and broadcast it to all other ranks.
On metadata writes, collective metadata I/O improves performance by constructing an MPI derived datatype that is then written collectively in a single call.
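To give a sense of scale for the read case, the Python sketch below counts file reads under two simplified assumptions: with independent access every rank reads every metadata item itself, and with collective access one rank reads each item and broadcasts it. This is plain arithmetic, not MPI or HDF5 code, and the function names and the rank and item counts are hypothetical.

```python
def independent_reads(num_ranks, num_metadata_items):
    """File reads when every rank fetches every metadata item itself."""
    return num_ranks * num_metadata_items


def collective_reads(num_ranks, num_metadata_items):
    """File reads and broadcasts when one rank reads on behalf of all.

    Returns a (file_reads, broadcasts) tuple. With a single rank there is
    nobody to broadcast to.
    """
    broadcasts = num_metadata_items if num_ranks > 1 else 0
    return num_metadata_items, broadcasts


# Hypothetical job: 1024 ranks that each need the same 50 metadata items.
independent = independent_reads(1024, 50)            # 51200 small file reads
file_reads, broadcasts = collective_reads(1024, 50)  # 50 reads, 50 broadcasts
```

The model ignores the cost of the broadcasts themselves, which travel over the interconnect rather than through the file system.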
File Space Management
Some usage patterns result in wasted space within an HDF5 file, which can also impair access times when working with the resulting file. The new file space management feature provides strategies for managing space in a file to improve both space usage and access time.
Virtual Dataset (VDS)
With a growing amount of data in HDF5, the need has emerged to access data stored across multiple HDF5 files using standard HDF5 objects, such as groups and datasets, without rewriting or rearranging the data. The new virtual dataset (VDS) feature enables an application to draw on multiple datasets and files to create virtual datasets without moving or rewriting any data.
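The core idea, that a virtual dataset stores a mapping rather than the data itself, can be sketched in a few lines. The Python model below assumes the simplest possible layout: several one-dimensional source datasets concatenated end to end. It is not HDF5 code, and the file names, dataset paths, sizes, and the `resolve` function are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical source datasets: (file name, dataset path, number of elements).
SOURCES = [
    ("a.h5", "/data", 100),
    ("b.h5", "/data", 250),
    ("c.h5", "/data", 50),
]

# Starting offset of each source within the virtual dataset: [0, 100, 350].
STARTS = [0] + list(accumulate(n for _, _, n in SOURCES))[:-1]
TOTAL = sum(n for _, _, n in SOURCES)


def resolve(virtual_index):
    """Map an index in the virtual dataset to (file, dataset, source index)."""
    if not 0 <= virtual_index < TOTAL:
        raise IndexError("index outside the virtual dataset")
    # Find the last source whose starting offset is <= virtual_index.
    i = bisect_right(STARTS, virtual_index) - 1
    file_name, dataset, _ = SOURCES[i]
    return file_name, dataset, virtual_index - STARTS[i]
```

For example, `resolve(100)` points at element 0 of `/data` in `b.h5`. Only the mapping table is kept for the virtual dataset; the elements stay in their source files.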
Partial Edge Chunk Options
New options for the storage and filtering of partial edge chunks in a dataset provide a tool for tuning I/O speed and file size in cases where the dataset size may not be a multiple of the chunk size.
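A quick calculation shows when partial edge chunks matter. The Python sketch below counts how many chunks in a chunked dataset are partial (extend past the dataset boundary) and how many padding elements they contain. It is plain arithmetic rather than HDF5 code; the function names and the example shapes are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import prod


def chunk_grid(dataset_shape, chunk_shape):
    """Number of chunks along each dimension, rounding up at the edges."""
    return tuple(-(-d // c) for d, c in zip(dataset_shape, chunk_shape))


def partial_edge_chunks(dataset_shape, chunk_shape):
    """Count the chunks that extend past the dataset boundary."""
    total = prod(chunk_grid(dataset_shape, chunk_shape))
    full = prod(d // c for d, c in zip(dataset_shape, chunk_shape))
    return total - full


def padding_elements(dataset_shape, chunk_shape):
    """Elements covered by chunks but lying outside the dataset."""
    grid = chunk_grid(dataset_shape, chunk_shape)
    covered = prod(g * c for g, c in zip(grid, chunk_shape))
    return covered - prod(dataset_shape)


# Hypothetical 10 x 10 dataset stored in 4 x 4 chunks.
grid = chunk_grid((10, 10), (4, 4))              # (3, 3): nine chunks
partial = partial_edge_chunks((10, 10), (4, 4))  # 5 of the 9 are partial
padding = padding_elements((10, 10), (4, 4))     # 144 - 100 = 44 elements
```

In this example more than half of the chunks are partial, so how they are stored and filtered has a visible effect. When the dataset size is an exact multiple of the chunk size, both counts drop to zero.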
In addition to the features described above, several new functions, a new struct, and new macros have been introduced or newly versioned in this release.
Changes to the File Format Specification
The HDF5 file format has been changed to support the new features in HDF5-1.10.
See the HDF5 File Format Specification for complete details on the changes. This specification describes how the bytes in an HDF5 file are organized on the storage media where the file is kept. In other words, when a file is written to disk, it is written according to the layout described in that specification. The following sections have been added or changed:
- Another version of the superblock was added.
- Additional B-tree types were added to the version 2 B-trees.
- The global heap block for virtual datasets was added.
- The Data Layout Message was changed: it was renamed, and version 4 of the message was added for the virtual type.
- Additional types of indexes were added for dataset chunks.