How to reclaim unused space in an HDF5 file
Question: We have a workflow where occasionally a dataset must be removed from an HDF5 file. When this happens, the file does not decrease in size. Is there a call to reclaim the unused space or remove it from within an application?
HDF5-1.10
HDF5-1.10 provides file space management strategies that control how the unused space in a file is tracked and re-used. These strategies are described here:
https://portal.hdfgroup.org/display/HDF5/File+Space+Management
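While a file is open, HDF5 tracks and re-uses the unused space in the file according to the strategy in effect. The strategy is set on the file creation property list with H5Pset_file_space_strategy. If you use a strategy that relies on the free space manager, you can choose to track the free space across file opens by setting the "persist" flag. Otherwise, the free space is no longer tracked once the file is closed: it stays in the file but cannot be re-used.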
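As an illustration of the "persist" flag, here is a minimal sketch that assumes the h5py Python binding (not part of the original answer) built against HDF5 1.10.1 or later. The keyword arguments fs_strategy and fs_persist are h5py's way of setting the file space strategy at file creation. The function name reuse_freed_space and the dataset names are hypothetical. The function deletes a dataset, closes the file, reopens it, and writes a dataset of the same size, returning the file size before and after so you can check whether the freed space was re-used.

```python
import os

import h5py
import numpy as np


def reuse_freed_space(path, n_values=131072):
    """Delete a dataset, close the file, and write new data after reopening.

    Returns (size_before, size_after) in bytes, measured on the closed file
    before and after the replacement dataset is written.
    """
    data = np.zeros(n_values, dtype="f8")  # 1 MiB at the default n_values

    # File space settings can only be chosen when the file is created.
    # "fsm" selects the free-space-manager strategy; fs_persist=True asks
    # the library to save the free-space information when the file closes.
    with h5py.File(path, "w", fs_strategy="fsm", fs_persist=True) as f:
        f.create_dataset("a", data=data)
        f.create_dataset("b", data=data)
        del f["a"]  # unlinks "a"; its storage becomes free space
    size_before = os.path.getsize(path)

    # Reopen and add a dataset of the same size as the one removed.
    with h5py.File(path, "r+") as f:
        f.create_dataset("c", data=data)
    size_after = os.path.getsize(path)

    return size_before, size_after
```

With the persist flag set, the expectation is that size_after exceeds size_before only by a small amount of metadata, because the new dataset can be placed in the space freed by "a". The exact numbers depend on the library version and file layout, so compare the two values rather than assuming a result. Files created with a non-default file space strategy may not be readable by HDF5-1.8.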
HDF5-1.8
File space management only occurs between the HDF5 file open and close, and the free space is NOT tracked after the file is closed. In other words, when you delete a dataset, the space used by the dataset becomes free space that can be re-used as long as the file is open. Once the file is closed, the library no longer knows about that free space, so it cannot be re-used and remains in the file.
Utility to Remove Free Space
The h5repack utility can be used to remove the unused space in a file by rewriting the file's contents to a new file; the free space is not carried over to the new file. This utility comes with the HDF5 binary distribution.
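Since h5repack is a command-line tool rather than a library call, an application that wants to reclaim space has to run it as a separate process. The sketch below assumes h5repack is on the PATH and uses only its basic form, h5repack <input> <output>. The helper names repack_command and repack_in_place, and the ".repacked" temporary suffix, are hypothetical.

```python
import os
import shutil
import subprocess


def repack_command(src, dst, tool="h5repack"):
    """Build the argument list for a plain repack: h5repack <input> <output>."""
    return [tool, os.fspath(src), os.fspath(dst)]


def repack_in_place(path, tool="h5repack"):
    """Repack `path` into a temporary sibling file, then swap it into place.

    Returns (size_before, size_after) in bytes. The file must not be open
    anywhere else while this runs.
    """
    exe = shutil.which(tool)
    if exe is None:
        raise FileNotFoundError(f"{tool} was not found on PATH")

    tmp = f"{os.fspath(path)}.repacked"  # hypothetical temporary name
    size_before = os.path.getsize(path)
    # check=True raises CalledProcessError if h5repack exits with an error,
    # in which case the original file is left untouched.
    subprocess.run(repack_command(path, tmp, exe), check=True)
    size_after = os.path.getsize(tmp)
    os.replace(tmp, path)  # replace the original with the compacted copy
    return size_before, size_after
```

Close the file in your application before calling repack_in_place, and reopen it afterwards. Writing to a temporary file first means a failed repack never overwrites the original.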