Can you store a binary object (for example, a Word document) in an HDF5 file?
Yes. There are a couple of ways to do this:
- Store your binary file in a dataset with an opaque datatype
- Store your binary file in the user block
Opaque Datatype
You can store your file in a dataset with an opaque datatype whose tag is set to the MIME type of the file, prefixed with "Content-Type:":
Content-Type: <mime-type>
Storing the file as a dataset with an opaque datatype prevents datatype conversions from occurring when reading or writing. Putting the MIME type in the tag, with the "Content-Type:" prefix, gives users a standard place to look for it, and mirrors the header that email messages use to declare the MIME type.
User Block
You can store the file in the user block. The user block is a user-definable block of data at the beginning of an HDF5 file, which the HDF5 library ignores. The h5jam and h5unjam utilities write to and extract information from the user block. See this page for how to use these tools:
https://portal.hdfgroup.org/display/HDF5/Command-Line+Tools+for+Editing+HDF5+Files#Command-LineToolsforEditingHDF5Files-ublock
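Also see the H5Pset_userblock function for creating a user block from within an application. The user block size is a file creation property, so it must be set on the file creation property list that is passed in when the file is created. The size must be 0 or a power of two that is at least 512 bytes.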
For example, this C code creates a file with a 512-byte user block:
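hid_t  fcpl, file_id;
herr_t status;

fcpl = H5Pcreate(H5P_FILE_CREATE);        /* file creation property list */
status = H5Pset_userblock(fcpl, 512);     /* reserve a 512-byte user block */
file_id = H5Fcreate(FILENAME, H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);
status = H5Pclose(fcpl);                  /* the file keeps its own copy */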
For details on setting or retrieving the user block size from an application, see H5Pset_userblock and H5Pget_userblock.
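Reserving a user block does not put anything in it. Because the HDF5 library ignores that region, one way to fill it is with ordinary file I/O once the file has been closed by HDF5. The sketch below is an illustration under those assumptions, not a replacement for h5jam. The helper name write_userblock is hypothetical, and the caller is assumed to pass the size that was reserved (for example, the value reported by H5Pget_userblock).

```c
#include <stdio.h>

/* Hypothetical helper: copy `payload_path` into the first bytes of
 * `h5_path`, which must already have a user block of `ublock_size`
 * bytes and must not be open in the HDF5 library.
 * Returns 0 on success, -1 on error or if the payload does not fit. */
int write_userblock(const char *h5_path, const char *payload_path,
                    size_t ublock_size)
{
    FILE *in = fopen(payload_path, "rb");
    if (in == NULL) return -1;

    /* Refuse payloads that would overwrite the HDF5 superblock. */
    fseek(in, 0, SEEK_END);
    long nbytes = ftell(in);
    rewind(in);
    if (nbytes < 0 || (size_t)nbytes > ublock_size) {
        fclose(in);
        return -1;
    }

    /* "r+b" updates the existing file in place, starting at offset 0. */
    FILE *out = fopen(h5_path, "r+b");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int rc = 0;
    unsigned char chunk[4096];
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, got, out) != got) {
            rc = -1;
            break;
        }
    }

    fclose(in);
    if (fclose(out) != 0) rc = -1;
    return rc;
}
```

The size check matters because the HDF5 superblock starts immediately after the user block, so writing past the reserved size would corrupt the file.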
See these examples:
- h5ublock.c - creates an HDF5 file with a user block
- h5rdub.c - reads the user block of an HDF5 file without using the HDF5 library
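To illustrate how a user block can be read without the HDF5 library, here is a minimal sketch in plain C. It is not taken from h5rdub.c. It relies on the HDF5 file format rule that the superblock begins with an 8-byte signature located at byte offset 0, 512, 1024, 2048, and so on, so the offset at which the signature is found is the size of the user block. The helper names find_userblock_size and read_userblock are hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n */
static const unsigned char HDF5_SIGNATURE[8] =
    { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

/* Hypothetical helper: return the user block size in bytes (0 if the
 * file has no user block), or -1 if no HDF5 signature is found. */
long find_userblock_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;

    long result = -1;
    long offset = 0;
    unsigned char sig[8];
    for (;;) {
        if (fseek(fp, offset, SEEK_SET) != 0) break;
        if (fread(sig, 1, sizeof sig, fp) != sizeof sig) break; /* past EOF */
        if (memcmp(sig, HDF5_SIGNATURE, sizeof sig) == 0) {
            result = offset;
            break;
        }
        if (offset > LONG_MAX / 2) break;         /* avoid overflow */
        offset = (offset == 0) ? 512 : offset * 2; /* 0, 512, 1024, ... */
    }
    fclose(fp);
    return result;
}

/* Hypothetical helper: copy up to `cap` bytes of the user block into
 * `buf`. Returns the number of bytes copied, or -1 on error. */
long read_userblock(const char *path, unsigned char *buf, size_t cap)
{
    long size = find_userblock_size(path);
    if (size < 0) return -1;
    size_t want = ((size_t)size < cap) ? (size_t)size : cap;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    size_t got = fread(buf, 1, want, fp);
    fclose(fp);
    return (got == want) ? (long)got : -1;
}
```

Note that the user block is padded to its full reserved size, so the stored file may be shorter than the block. An application that needs the exact length has to record it somewhere, for example at the start of the user block.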