How can I read/write a dataset greater than 2GB?

If you use the default file access property list (serial) for HDF5, you can read or write a dataset greater than 2GB with one call.

As of HDF5-1.10.2, MPI-IO transfers larger than 2GB are also supported, as described in the release notes:

      Previous releases of PHDF5 would fail when attempting to
      read or write greater than 2GB of data in a single IO operation.
      This issue stems principally from an MPI API whose definitions
      utilize 32 bit integers to describe the number of data elements
      and datatype that MPI should use to effect a data transfer.
      Historically, HDF5 has invoked MPI-IO with the number of
      elements in a contiguous buffer represented as the length
      of that buffer in bytes.

      Resolving the issue and thus enabling larger MPI-IO transfers
      is accomplished first, by detecting when a user IO request would
      exceed the 2GB limit as described above.  Once a transfer request
      is identified as requiring special handling, PHDF5 now creates a
      derived datatype consisting of a vector of fixed sized blocks
      which is in turn wrapped within a single MPI_Type_struct to
      contain the vector and any remaining data.   The newly created
      datatype is then used in place of MPI_BYTE and can be used to
      fulfill the original user request without encountering API
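
The decomposition described in the release notes (a vector of fixed-size blocks plus any remaining data) comes down to simple integer arithmetic. The following is a minimal sketch of that arithmetic only, not the PHDF5 implementation: the struct, the function name, and the block size passed in are hypothetical, and the block size PHDF5 actually uses is not stated here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper type (not part of HDF5 or MPI): how a large byte
 * count splits into fixed-size blocks plus a tail. */
typedef struct {
    int nblocks;    /* number of full blocks; must fit in a 32-bit int count */
    int block_size; /* bytes per full block (an assumed, caller-chosen value) */
    int remainder;  /* bytes left over after the full blocks */
} large_xfer_split;

/* Split nbytes into full blocks of block_size bytes plus a remainder.
 * Every field of the result fits in an int, so each can be passed as
 * an int count even when nbytes itself cannot. */
static large_xfer_split split_large_transfer(uint64_t nbytes, int block_size)
{
    large_xfer_split s;

    assert(block_size > 0);
    /* The block count itself must still be representable as an int. */
    assert(nbytes / (uint64_t)block_size <= (uint64_t)INT_MAX);

    s.block_size = block_size;
    s.nblocks    = (int)(nbytes / (uint64_t)block_size);
    s.remainder  = (int)(nbytes % (uint64_t)block_size);
    return s;
}
```

For example, with an assumed block size of 1 GiB, a request of 5 GiB plus 5 bytes splits into 5 full blocks and a 5-byte remainder. In an MPI program the full blocks would correspond to the vector part of the derived datatype and the remainder to the trailing part of the enclosing struct type.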


With releases prior to HDF5-1.10.2, MPI-IO transfers larger than 2GB were not supported.

There were ways in HDF5 to get around the 32-bit count limitation in the MPI standard by concatenating several derived datatypes, so that the count passed to MPI stays within what a 32-bit integer can hold. However, this also broke ROMIO (the MPI-IO implementation used by almost all MPI libraries). ROMIO has a known limitation of its own: it can transfer at most 2 GB in a single I/O operation. That is not the same problem as the 'count' parameter being a 32-bit integer, but rather a limit in ROMIO itself. So unless a fix is implemented in the ROMIO library, the workaround for the MPI standard's limit (mentioned above) will not work.

The previous solution was to perform multiple reads or writes as necessary so that the total amount of data read or written per call is less than 2 GB. We have a Parallel HDF5 Tutorial here:

Introduction to Parallel HDF5

See the hyperslab selection examples in the tutorial for how to select a subset of a dataset:

Writing and Reading Hyperslabs
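
For releases before HDF5-1.10.2, the multi-call approach described above can be planned with a little arithmetic. The sketch below is only an illustration under stated assumptions: the transfer is split along the first dimension of the dataset, one "row" (one index of that dimension) is smaller than 2 GB, and all helper names are hypothetical rather than part of the HDF5 API.

```c
#include <assert.h>
#include <stdint.h>

/* Stay strictly below 2 GB (2^31 bytes) per call. */
#define MAX_BYTES_PER_CALL ((uint64_t)2147483647)

/* Hypothetical helper: largest number of whole rows that fit in one call. */
static uint64_t rows_per_call(uint64_t row_bytes)
{
    assert(row_bytes > 0 && row_bytes <= MAX_BYTES_PER_CALL);
    return MAX_BYTES_PER_CALL / row_bytes;
}

/* Hypothetical helper: number of calls needed to cover total_rows rows. */
static uint64_t num_calls(uint64_t total_rows, uint64_t row_bytes)
{
    uint64_t step = rows_per_call(row_bytes);
    return (total_rows + step - 1) / step; /* ceiling division */
}

/* Hypothetical helper: first row and row count for the i-th call. */
static void call_extent(uint64_t i, uint64_t total_rows, uint64_t row_bytes,
                        uint64_t *start, uint64_t *count)
{
    uint64_t step = rows_per_call(row_bytes);

    *start = i * step;
    assert(*start < total_rows);
    /* The last call may cover fewer rows than the others. */
    *count = (total_rows - *start < step) ? (total_rows - *start) : step;
}
```

Each (start, count) pair would then supply the first-dimension entries of the start and count arrays given to H5Sselect_hyperslab before the corresponding H5Dread or H5Dwrite call, as in the hyperslab examples in the tutorial.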