Does HDF5 support compression with parallel HDF5?

As of HDF5 1.6.3, parallel HDF5 can read compressed data, but it cannot write compressed data in parallel.
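As a rough sketch of the read side, the following assumes the 1.6-era C API and a hypothetical file data.h5 containing a compressed integer dataset /compressed_dset. Every rank opens the file through the MPI-IO driver and reads the dataset; decompression happens transparently inside H5Dread:

```c
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Open the file through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);

    /* Every rank reads the whole compressed dataset; the chunks are
       decompressed on the fly inside H5Dread. */
    hid_t dset  = H5Dopen(file, "/compressed_dset");
    hid_t space = H5Dget_space(dset);
    int  *buf   = malloc(H5Sget_simple_extent_npoints(space) * sizeof(int));
    H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    free(buf);
    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```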

Why is writing compressed data in parallel not supported? Compression in HDF5 requires chunking. Because chunks are preallocated in the file before any data is written, every chunk must have the same, fixed size. The size of a compressed chunk, however, is not known until after the data is compressed, so the space it needs cannot be preallocated.
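For illustration, compression is always set up through the chunking interface. This minimal serial sketch (hypothetical file and dataset names, 1.6-era C API) shows the H5Pset_chunk / H5Pset_deflate pairing that the restriction is about:

```c
#include <hdf5.h>

/* Serial creation of a chunked, deflate-compressed dataset. Each
   100x100 chunk compresses to an unpredictable size, which is why
   parallel writes to such a dataset are disallowed. */
int main(void)
{
    hsize_t dims[2]  = {1000, 1000};
    hsize_t chunk[2] = {100, 100};

    hid_t file  = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);     /* compression requires a chunked layout */
    H5Pset_deflate(dcpl, 6);          /* gzip (deflate) at level 6 */

    hid_t dset = H5Dcreate(file, "/compressed_dset", H5T_NATIVE_INT, space, dcpl);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}
```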

Chunks are preallocated in the file to avoid the following problem: parallel HDF5 allows independent I/O on raw data (with H5Dwrite) but requires collective operations for anything that modifies metadata, such as the B-tree that tracks the chunks of a chunked dataset or the free-space information used to allocate space when a compressed chunk changes size. To allow independent raw data I/O (and to simplify collective raw data I/O), the chunks are therefore preallocated, so the chunk B-tree never has to change during a write, and writing to compressed chunked datasets and variable-length datatypes is disallowed, so that no space in the file has to be allocated or freed during parallel I/O.
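A minimal sketch of what independent raw-data I/O looks like, assuming a hypothetical helper write_my_block and an already-opened, uncompressed chunked dataset whose chunks were preallocated at create time:

```c
#include <hdf5.h>

/* Hypothetical helper: each rank writes its own contiguous block of
   'count' integers starting at offset 'start' in a 1-D dataset,
   using independent (non-collective) I/O. */
static herr_t write_my_block(hid_t dset, hsize_t start, hsize_t count, const int *buf)
{
    hid_t file_space = H5Dget_space(dset);
    H5Sselect_hyperslab(file_space, H5S_SELECT_SET, &start, NULL, &count, NULL);

    hid_t mem_space = H5Screate_simple(1, &count, NULL);

    /* Independent transfer: no coordination between ranks is needed,
       because the chunks were preallocated and no metadata changes. */
    hid_t xfer = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(xfer, H5FD_MPIO_INDEPENDENT);

    herr_t status = H5Dwrite(dset, H5T_NATIVE_INT, mem_space, file_space, xfer, buf);

    H5Pclose(xfer);
    H5Sclose(mem_space);
    H5Sclose(file_space);
    return status;
}
```

If the dataset were compressed, each H5Dwrite could change the size of a chunk, which would require updating the chunk B-tree and the free-space metadata; those are collective operations, so independent writes like the one above could not be allowed.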