What is a Datatype?

A datatype is a collection of datatype properties which provide complete information for data conversion to or from that datatype.

Datatypes in HDF5 can be grouped as follows:

  • Pre-Defined Datatypes:   These are datatypes that are created by HDF5. They are actually opened (and closed) by HDF5, and can have a different value from one HDF5 session to the next.
  • Derived Datatypes:   These are datatypes that are created or derived from the pre-defined datatypes. Although created from pre-defined types, they represent a category unto themselves. An example of a commonly used derived datatype is a string of more than one character.

Pre-Defined Datatypes

The properties of pre-defined datatypes are:

  • Pre-defined datatypes are opened and closed by HDF5.
  • A pre-defined datatype is a handle and is NOT PERSISTENT. Its value can be different from one HDF5 session to the next.
  • Pre-defined datatypes are Read-Only.
  • As mentioned, other datatypes can be derived from pre-defined datatypes.

There are two types of pre-defined datatypes, standard (file) and native:

  • STANDARD

    A standard (or file) datatype can be:

    • Atomic: A datatype which cannot be decomposed into smaller datatype units at the API level.
      The atomic datatypes are:   integer, float, string, date and time, bitfield, reference, opaque

       

    • Composite: An aggregation of one or more datatypes.
      Composite datatypes include:   array, variable length, enumeration, compound datatypes

      Array, variable length, and enumeration datatypes are defined in terms of a single atomic datatype, whereas a compound datatype is a datatype composed of a sequence of datatypes.

    Notes:
    •  Standard pre-defined datatypes are the SAME on all platforms.
    •  They are the datatypes that you see in an HDF5 file.
    •  They are typically used when creating a dataset.
  • NATIVE

    Native pre-defined datatypes are used for memory operations, such as reading and writing. They are NOT THE SAME on different platforms. They are similar to C type names, and are aliased to the appropriate HDF5 standard pre-defined datatype for a given platform.

    For example, when on an Intel based PC, H5T_NATIVE_INT is aliased to the standard pre-defined type, H5T_STD_I32LE. On a MIPS machine, it is aliased to H5T_STD_I32BE.

    Notes:
    •  Native datatypes are NOT THE SAME on all platforms.
    •  Native datatypes simplify memory operations (read/write). The HDF5 library automatically converts as needed.
    •  Native datatypes are NOT in an HDF5 File. The standard pre-defined datatype that a native datatype corresponds to is what you will see in the file. 
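For example, to check which standard pre-defined type H5T_NATIVE_INT is aliased to on the current platform, the two handles can be compared with H5Tequal. This is a minimal sketch (the output wording is illustrative only):

   #include "hdf5.h"
   #include <stdio.h>

   int main(void) {
      /* H5Tequal returns a positive value when the two datatypes are equal */
      if (H5Tequal (H5T_NATIVE_INT, H5T_STD_I32LE) > 0)
         printf ("H5T_NATIVE_INT is aliased to H5T_STD_I32LE on this platform\n");
      else if (H5Tequal (H5T_NATIVE_INT, H5T_STD_I32BE) > 0)
         printf ("H5T_NATIVE_INT is aliased to H5T_STD_I32BE on this platform\n");
      return 0;
   }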

The following table shows the native types and the standard pre-defined datatypes they correspond to. (Keep in mind that HDF5 can convert between datatypes, so you can specify a buffer of a larger type for a dataset of a given type. For example, you can read a dataset that has a short datatype into a long integer buffer.)

Fig. 1   Some HDF5 pre-defined native datatypes and corresponding standard (file) type

  C Type              HDF5 Memory Type     HDF5 File Type *
  Integer:
  int                 H5T_NATIVE_INT       H5T_STD_I32BE or H5T_STD_I32LE
  short               H5T_NATIVE_SHORT     H5T_STD_I16BE or H5T_STD_I16LE
  long                H5T_NATIVE_LONG      H5T_STD_I32BE, H5T_STD_I32LE,
                                           H5T_STD_I64BE or H5T_STD_I64LE
  long long           H5T_NATIVE_LLONG     H5T_STD_I64BE or H5T_STD_I64LE
  unsigned int        H5T_NATIVE_UINT      H5T_STD_U32BE or H5T_STD_U32LE
  unsigned short      H5T_NATIVE_USHORT    H5T_STD_U16BE or H5T_STD_U16LE
  unsigned long       H5T_NATIVE_ULONG     H5T_STD_U32BE, H5T_STD_U32LE,
                                           H5T_STD_U64BE or H5T_STD_U64LE
  unsigned long long  H5T_NATIVE_ULLONG    H5T_STD_U64BE or H5T_STD_U64LE
  Float:
  float               H5T_NATIVE_FLOAT     H5T_IEEE_F32BE or H5T_IEEE_F32LE
  double              H5T_NATIVE_DOUBLE    H5T_IEEE_F64BE or H5T_IEEE_F64LE

  F90 Type            HDF5 Memory Type     HDF5 File Type *
  integer             H5T_NATIVE_INTEGER   H5T_STD_I32(8,16)BE or H5T_STD_I32(8,16)LE
  real                H5T_NATIVE_REAL      H5T_IEEE_F32BE or H5T_IEEE_F32LE
  double precision    H5T_NATIVE_DOUBLE    H5T_IEEE_F64BE or H5T_IEEE_F64LE

* Note that the HDF5 File Types listed are those that are most commonly created.
  The file type created depends on the compiler switches and platforms being
  used. For example, on the Cray an integer is 64-bit, and using H5T_NATIVE_INT (C)
  or H5T_NATIVE_INTEGER (F90) would result in an H5T_STD_I64BE file type.

The following code is an example of when you would use standard pre-defined datatypes vs. native types:

   #include "hdf5.h"

   int main(void) {

      hid_t       file_id, dataset_id, dataspace_id;
      herr_t      status;
      hsize_t     dims[2] = {4, 6};
      int         i, j, dset_data[4][6];

      /* Initialize the data to be written */
      for (i = 0; i < 4; i++)
         for (j = 0; j < 6; j++)
            dset_data[i][j] = i * 6 + j + 1;

      file_id = H5Fcreate ("dtypes.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

      dataspace_id = H5Screate_simple (2, dims, NULL);

      /* The dataset is created in the file with the standard type H5T_STD_I32BE */
      dataset_id = H5Dcreate (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

      /* The data in memory is described with the native type H5T_NATIVE_INT */
      status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                         H5P_DEFAULT, dset_data);

      status = H5Dclose (dataset_id);
      status = H5Sclose (dataspace_id);
      status = H5Fclose (file_id);

      return 0;
   }

By using the native types when reading and writing, the code that reads from or writes to a dataset can be the same for different platforms.

Can native types also be used when creating a dataset? Yes. However, just be aware that the resulting datatype in the file will be one of the standard pre-defined types and may be different than expected.

What happens if you do not use the correct native datatype for a standard (file) datatype? Your data may be incorrect or not what you expect.

Derived Datatypes

ANY pre-defined datatype can be used to derive user-defined datatypes.

To create a datatype derived from a pre-defined type:

  • Make a copy of the pre-defined datatype:

    tid = H5Tcopy (H5T_STD_I32BE);

  • Change the datatype.

There are numerous datatype functions that allow a user to alter a pre-defined datatype. See String below for a simple example.

Refer to the Datatype Interface in the HDF5 Reference Manual. Example functions are H5Tset_size and H5Tset_precision.
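As a small sketch of these two steps (the 16-bit precision chosen here is arbitrary and purely illustrative), a native integer type can be copied and its precision reduced with H5Tset_precision:

   hid_t  tid;
   herr_t status;

   /* Start from a copy of a pre-defined type... */
   tid = H5Tcopy (H5T_NATIVE_INT);

   /* ...then alter the copy; here its precision is reduced to 16 bits */
   status = H5Tset_precision (tid, 16);

   /* Release the derived datatype when it is no longer needed */
   status = H5Tclose (tid);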

Specific Datatypes

Array Datatype vs Array Dataspace

H5T_ARRAY is a datatype, and it should not be confused with the dataspace of a dataset. The dataspace of a dataset can consist of a regular array of elements. For example, the datatype for a dataset could be an atomic datatype like integer, and the dataset could be an N-dimensional appendable array, as specified by the dataspace. See H5S_CREATE and H5S_CREATE_SIMPLE for details. 

Unlimited dimensions and subsetting are not supported when using the H5T_ARRAY datatype.

The H5T_ARRAY datatype was primarily created to address the simple case of a compound datatype when all members of the compound datatype are of the same type and there is no need to subset by compound datatype members. Creation of such a datatype is more efficient and I/O also requires less work, because there is no alignment involved.
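As a hedged sketch, an array datatype can be built from an atomic base type with H5Tarray_create2; the 3 x 2 dimensions and the integer base type below are arbitrary:

   hsize_t adims[2] = {3, 2};   /* each element is a 3 x 2 array of integers */
   hid_t   atype;

   /* Create the array datatype from a native integer base type */
   atype = H5Tarray_create2 (H5T_NATIVE_INT, 2, adims);

   /* ...use atype as the datatype of a dataset or of a compound member... */

   H5Tclose (atype);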

Compound

Properties of compound datatypes. A compound datatype is similar to a struct in C or a common block in Fortran. It is a collection of one or more atomic types or small arrays of such types. To create and use a compound datatype you need to refer to various properties of the compound datatype:

  • It is of class compound.
  • It has a fixed total size, in bytes.
  • It consists of zero or more members (defined in any order) with unique names and which occupy non-overlapping regions within the datum.
  • Each member has its own datatype.
  • Each member is referenced by an index number between zero and N-1, where N is the number of members in the compound datatype.
  • Each member has a name which is unique among its siblings in a compound datatype.
  • Each member has a fixed byte offset, which is the first byte (smallest byte address) of that member in a compound datatype.
  • Each member can be a small array of up to four dimensions.

Properties of members of a compound datatype are defined when the member is added to the compound type and cannot be subsequently modified.

Defining compound datatypes. Compound datatypes must be built out of other datatypes. First, one creates an empty compound datatype and specifies its total size. Then members are added to the compound datatype in any order.

Member names. Each member must have a descriptive name, which is the key used to uniquely identify the member within the compound datatype. A member name in an HDF5 datatype does not necessarily have to be the same as the name of the corresponding member in the C struct in memory, although this is often the case. Nor does one need to define all members of the C struct in the HDF5 compound datatype (or vice versa).

Offsets. Usually a C struct will be defined to hold a data point in memory, and the offsets of the members in memory will be the offsets of the struct members from the beginning of an instance of the struct. The library defines the macro to compute the offset of a member within a struct:
  HOFFSET(s,m)

This macro computes the offset of member m within a struct variable s.

Here is an example in which a compound datatype is created to describe complex numbers whose type is defined by the complex_t struct.

typedef struct {
   double re;   /*real part */
   double im;   /*imaginary part */
} complex_t;

complex_t tmp;  /*used only to compute offsets */
hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof tmp);
H5Tinsert (complex_id, "real", HOFFSET(tmp,re),
           H5T_NATIVE_DOUBLE);
H5Tinsert (complex_id, "imaginary", HOFFSET(tmp,im),
           H5T_NATIVE_DOUBLE);
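The compound datatype can then be used like any other datatype. The following hedged sketch creates a one-dimensional dataset of complex numbers and writes to it; the dataset name, the data buffer, and the assumption of an existing file identifier file_id are illustrative only:

   complex_t data[10];       /* ...fill with the values to be written... */
   hsize_t   dim[1] = {10};
   hid_t     space_id, dset_id;

   space_id = H5Screate_simple (1, dim, NULL);

   /* The compound datatype describes both the dataset in the file and
      the layout of the data in memory */
   dset_id = H5Dcreate (file_id, "/complex_dset", complex_id, space_id,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
   H5Dwrite (dset_id, complex_id, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

   H5Dclose (dset_id);
   H5Sclose (space_id);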

Reference

There are two types of Reference datatypes in HDF5:

  • Reference to objects
  • Reference to a dataset region

 

Reference to objects

Reference to a dataset region

A dataset region reference points to a dataset selection in another dataset.  A reference to the dataset selection (region) is constant for the life of the dataset.

Creating and storing references to dataset regions

The following steps are involved in creating and storing references to dataset regions (a sketch follows the list):

  1. Create a dataset in which to store the region references, by passing in H5T_STD_REF_DSETREG for the datatype when calling H5D_CREATE.
  2. Create selections in the dataset(s) using H5S_SELECT_HYPERSLAB and/or H5S_SELECT_ELEMENTS. The dataset must exist in the file.
  3. Create references to the selections using H5R_CREATE and store them in a buffer.
  4. Write the references to the dataset regions in the file.
  5. Close all objects.
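A minimal sketch of these steps is shown below. It assumes an open file identifier file_id, an existing two-dimensional dataset "/matrix" to be referenced, and an arbitrary 2 x 3 hyperslab; all of these names and sizes are illustrative:

   hid_t    src_dset, src_space, ref_space, ref_dset;
   hdset_reg_ref_t ref[1];          /* buffer holding one region reference */
   hsize_t  start[2] = {1, 1};      /* arbitrary 2 x 3 hyperslab selection */
   hsize_t  count[2] = {2, 3};
   hsize_t  dim[1]   = {1};

   /* Select a region (hyperslab) in the dataspace of the existing dataset */
   src_dset  = H5Dopen (file_id, "/matrix", H5P_DEFAULT);
   src_space = H5Dget_space (src_dset);
   H5Sselect_hyperslab (src_space, H5S_SELECT_SET, start, NULL, count, NULL);

   /* Create a reference to that selection */
   H5Rcreate (&ref[0], file_id, "/matrix", H5R_DATASET_REGION, src_space);

   /* Create a dataset of datatype H5T_STD_REF_DSETREG and write the reference to it */
   ref_space = H5Screate_simple (1, dim, NULL);
   ref_dset  = H5Dcreate (file_id, "/region_refs", H5T_STD_REF_DSETREG, ref_space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
   H5Dwrite (ref_dset, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL, H5P_DEFAULT, ref);

   /* Close all objects */
   H5Dclose (ref_dset);
   H5Sclose (ref_space);
   H5Sclose (src_space);
   H5Dclose (src_dset);
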
Reading references to dataset regions

The following steps are involved in reading references to dataset regions and referenced dataset regions (selections).

  1. Open and read the dataset containing references to the dataset regions. The datatype H5T_STD_REF_DSETREG must be used during the read operation.

  2. Use H5R_DEREFERENCE to obtain the dataset identifier from the read dataset region reference.

    OR

    Use H5R_GET_REGION to obtain the dataspace identifier for the dataset containing the selection from the read dataset region reference.

  3. With the dataspace identifier, the H5S interface functions, H5S_GET_SELECT_*, can be used to obtain information about the selection.

  4. Close all objects when they are no longer needed.

The dataset with the region references is read with H5D_READ, with the H5T_STD_REF_DSETREG datatype specified.

The read reference can be used to obtain the dataset identifier by calling H5R_DEREFERENCE:

    dset2 = H5Rdereference (dset1, H5R_DATASET_REGION, &rbuf[0]);

or it can be used to obtain spatial information (dataspace and selection) with a call to H5Rget_region:

    sid2 = H5Rget_region (dset1, H5R_DATASET_REGION, &rbuf[0]);

The reference to the dataset region has information for both the dataset itself and its selection. In both functions:

  1. The first parameter is an identifier of the dataset with the region references.
  2. The second parameter specifies the type of reference stored. In this example, a reference to the dataset region is stored.
  3. The third parameter is a buffer containing the reference of the specified type.

The following table lists several H5S_GET_SELECT_* functions that can be used to obtain information about selections (a usage sketch follows the table):

  Function                        Description
  H5S_GET_SELECT_NPOINTS          Returns the number of elements in the hyperslab
  H5S_GET_SELECT_HYPER_NBLOCKS    Returns the number of blocks in the hyperslab
  H5S_GET_SELECT_BLOCKLIST        Returns the "lower left" and "upper right" coordinates of the
                                  blocks in the hyperslab selection
  H5S_GET_SELECT_BOUNDS           Returns the coordinates of the "minimal" block containing a
                                  hyperslab selection
  H5S_GET_SELECT_ELEM_NPOINTS     Returns the number of points in the element selection
  H5S_GET_SELECT_ELEM_POINTLIST   Returns the coordinates of the element selection
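As a hedged sketch, given the dataspace identifier sid2 obtained above from H5Rget_region, these functions could be called as follows (a 2-dimensional dataspace is assumed for illustration):

   hssize_t npoints, nblocks;
   hsize_t  bounds_start[2], bounds_end[2];      /* assumes a 2-dimensional dataspace */

   npoints = H5Sget_select_npoints (sid2);       /* number of elements in the selection */
   nblocks = H5Sget_select_hyper_nblocks (sid2); /* number of blocks in a hyperslab selection */
   H5Sget_select_bounds (sid2, bounds_start, bounds_end);  /* bounding box of the selection */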


String

A simple example of creating a derived datatype is using the string datatype H5T_C_S1 (H5T_FORTRAN_S1 in Fortran) to create strings of more than one character. Strings can be stored with either a fixed or a variable length, and may have different rules for padding unused storage:

Fixed Length 5-character String Datatype:

hid_t strtype;                     /* Datatype ID */
herr_t status;

strtype = H5Tcopy (H5T_C_S1);
status = H5Tset_size (strtype, 5); /* create string of length 5 */
 

Variable Length String Datatype:

strtype = H5Tcopy (H5T_C_S1);
status = H5Tset_size (strtype, H5T_VARIABLE);
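
As a hedged usage sketch, the write buffer for a variable length string dataset is simply an array of char pointers, one per element. The dataset name, the data, and the assumption of existing file_id and space_id identifiers (the dataspace holding two elements) are illustrative only:

   char *wdata[2] = {"short", "a longer string"};   /* two variable length strings */
   hid_t dset_id;

   dset_id = H5Dcreate (file_id, "/vl_strings", strtype, space_id,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

   /* For H5T_VARIABLE strings, the buffer holds pointers to the strings themselves */
   H5Dwrite (dset_id, strtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);

   H5Dclose (dset_id);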

The ability to derive datatypes from pre-defined types allows users to create any number of datatypes, from simple to very complex.

As the term implies, variable length strings are strings of varying lengths. They are stored internally in a heap, potentially impacting efficiency in the following ways:

  1. Heap storage requires more space than regular raw data storage.
  2. Heap access generally reduces I/O efficiency because it requires individual read or write operations for each data element rather than one read or write per dataset or per data selection.
  3. A variable length dataset consists of pointers to the heaps of data, not the actual data. Chunking and filters, including compression, are not available for heaps.

See Section 6.6.1 Strings in the HDF5 User's Guide for more information on how fixed and variable length strings are stored.

 

Variable Length
