Creating and manipulating datatypes which describe elements of a dataset

  • Predefined Datatypes
  • H5T_ARRAY_CREATE: Creates an array datatype object
  • H5T_CLOSE: Releases a datatype
  • H5T_COMMIT: Commits a transient datatype, linking it into the file and creating a new named datatype
  • H5T_COMMIT_ANON: Commits a transient datatype to a file, creating a new named datatype, but does not link it into the file structure
  • H5T_COMMITTED: Determines whether a datatype is a committed type or a transient type
  • H5T_COMPILER_CONV: Checks whether the library’s default conversion is a hard conversion
  • H5T_CONVERT: Converts data from one specified datatype to another
  • H5T_COPY: Copies an existing datatype
  • H5T_CREATE: Creates a new datatype
  • H5T_DECODE: Decodes a binary object description of a datatype and returns a new object handle
  • H5T_DETECT_CLASS: Determines whether a datatype contains any datatypes of the given datatype class
  • H5T_ENCODE: Encodes a datatype object description into a binary buffer
  • H5T_ENUM_CREATE: Creates a new enumeration datatype
  • H5T_ENUM_INSERT: Inserts a new enumeration datatype member
  • H5T_ENUM_NAMEOF: Returns the symbol name corresponding to a specified member of an enumeration datatype
  • H5T_ENUM_VALUEOF: Returns the value corresponding to a specified member of an enumeration datatype
  • H5T_EQUAL: Determines whether two datatype identifiers refer to the same datatype
  • H5T_FIND: Finds a conversion function
  • H5T_FLUSH: Flushes all buffers associated with a committed datatype to disk
  • H5T_GET_ARRAY_DIMS: Retrieves sizes of array dimensions
  • H5T_GET_ARRAY_NDIMS: Returns the rank of an array datatype
  • H5T_GET_CLASS: Returns the datatype class identifier
  • H5T_GET_CREATE_PLIST: Returns a copy of a datatype creation property list
  • H5T_GET_CSET: Retrieves the character set type of a string datatype
  • H5T_GET_EBIAS: Retrieves the exponent bias of a floating-point type
  • H5T_GET_FIELDS: Retrieves floating-point datatype bit field information
  • H5T_GET_INPAD: Retrieves the internal padding type for unused bits in floating-point datatypes
  • H5T_GET_MEMBER_CLASS: Returns datatype class of compound datatype member
  • H5T_GET_MEMBER_INDEX: Retrieves the index of a compound or enumeration datatype member
  • H5T_GET_MEMBER_NAME: Retrieves the name of a compound or enumeration datatype member
  • H5T_GET_MEMBER_OFFSET: Retrieves the offset of a field of a compound datatype
  • H5T_GET_MEMBER_TYPE: Returns the datatype of the specified member
  • H5T_GET_MEMBER_VALUE: Returns the value of an enumeration datatype member
  • H5T_GET_NATIVE_TYPE: Returns the native datatype of a specified datatype
  • H5T_GET_NMEMBERS: Retrieves the number of elements in a compound or enumeration datatype
  • H5T_GET_NORM: Retrieves mantissa normalization of a floating-point datatype
  • H5T_GET_OFFSET: Retrieves the bit offset of the first significant bit
  • H5T_GET_ORDER: Returns the byte order of an atomic datatype
  • H5T_GET_PAD: Retrieves the padding type of the least and most-significant bit padding
  • H5T_GET_PRECISION: Returns the precision of an atomic datatype
  • H5T_GET_SIGN: Retrieves the sign type for an integer type
  • H5T_GET_SIZE: Returns the size of a datatype
  • H5T_GET_STRPAD: Retrieves the type of padding used for a string datatype
  • H5T_GET_SUPER: Returns the base datatype from which a datatype is derived
  • H5T_GET_TAG: Gets the tag associated with an opaque datatype
  • H5T_INSERT: Adds a new member to a compound datatype
  • H5T_IS_VARIABLE_STR: Determines whether datatype is a variable-length string
  • H5T_LOCK: Locks a datatype
  • H5T_OPEN: Opens a committed (named) datatype
  • H5T_PACK: Recursively removes padding from within a compound datatype
  • H5T_REFRESH: Refreshes all buffers associated with a committed datatype
  • H5T_REGISTER: Registers a conversion function
  • H5T_SET_CSET: Sets character set to be used in a string or character datatype
  • H5T_SET_EBIAS: Sets the exponent bias of a floating-point type
  • H5T_SET_FIELDS: Sets locations and sizes of floating-point bit fields
  • H5T_SET_INPAD: Fills unused internal floating-point bits
  • H5T_SET_NORM: Sets the mantissa normalization of a floating-point datatype
  • H5T_SET_OFFSET: Sets the bit offset of the first significant bit
  • H5T_SET_ORDER: Sets the byte order of a datatype
  • H5T_SET_PAD: Sets the least and most-significant bits padding types
  • H5T_SET_PRECISION: Sets the precision of an atomic datatype
  • H5T_SET_SIZE: Sets the total size for a datatype
  • H5T_SET_SIGN: Sets the sign property for an integer type
  • H5T_SET_STRPAD: Defines the type of padding used for character strings
  • H5T_SET_TAG: Tags an opaque datatype
  • H5T_UNREGISTER: Removes a conversion function
  • H5T_VLEN_CREATE: Creates a new variable-length array datatype

The Datatype interface, H5T, provides a mechanism to describe the storage format of individual data points of a dataset. It is designed so that new features can be added without disrupting applications that use the datatype interface. A dataset (the H5D interface) is composed of a collection of raw data points of homogeneous type, organized according to the dataspace (the H5S interface).

A datatype is a collection of datatype properties, all of which can be stored on disk and which, taken as a whole, provide complete information for data conversion to or from that datatype. The interface provides functions to set and query properties of a datatype.

A data point is an instance of a datatype, which is an instance of a type class. We have defined a set of type classes and properties which can be extended at a later time. The atomic type classes are those which describe types which cannot be decomposed at the datatype interface level; all other classes are compound.

See HDF5 Datatypes in the HDF5 User’s Guide for more information.

Creating variable-length string datatypes 

As the term implies, variable-length strings are strings of varying lengths; they can be arbitrarily long, anywhere from 1 character to thousands of characters.

HDF5 provides the ability to create a variable-length string datatype. Like all string datatypes, this type is based on the atomic string datatype: H5T_C_S1 in C or H5T_FORTRAN_S1 in Fortran. While these datatypes default to one character in size, they can be resized to specific fixed lengths or to variable length.

Variable-length strings will transparently accommodate ASCII strings or UTF-8 strings. This characteristic is set with H5Tset_cset in the process of creating the datatype.

The following HDF5 calls create a C-style variable-length string datatype, vls_type_c_id:

    hid_t  vls_type_c_id = H5Tcopy(H5T_C_S1);
    herr_t status        = H5Tset_size(vls_type_c_id, H5T_VARIABLE);

In a C environment, variable-length strings will always be NULL-terminated, so the buffer to hold such a string must be one byte larger than the string itself to accommodate the NULL terminator.

In Fortran, strings are normally of fixed length. Variable-length strings come into play only when data is shared with a C application that uses them. For such situations, the datatype class H5T_STRING is predefined by the HDF5 Library to accommodate variable-length strings. The first HDF5 call below creates a Fortran string datatype, vls_type_f_id, that will handle variable-length string data. The second call sets the string padding value to space padding:

    CALL h5tcopy_f(H5T_STRING, vls_type_f_id, hdferr)
    CALL h5tset_strpad_f(vls_type_f_id, H5T_STR_SPACEPAD_F, hdferr)

While Fortran-style strings are generally space-padded, they may be NULL-terminated in cases where the data is also used in a C environment.

Note: Internally, variable-length strings are stored in a heap, which can reduce efficiency in the following ways:

  • Heap storage requires more space than regular raw data storage.
  • Heap access generally reduces I/O efficiency because it requires individual read or write operations for each data element rather than one read or write per dataset or per data selection.
  • Chunking and filters, including compression, are not available for heaps.

--- Last Modified: May 10, 2019 | 02:12 PM