What is a Datatype?

A datatype is a collection of datatype properties which provide complete information for data conversion to or from that datatype.

Datatypes in HDF5 can be grouped as follows:

  • Pre-Defined Datatypes:   These are datatypes that are created by HDF5. They are actually opened (and closed) by HDF5, and can have a different value from one HDF5 session to the next.
  • Derived Datatypes:   These are datatypes that are created or derived from the pre-defined datatypes. Although created from pre-defined types, they represent a category unto themselves. An example of a commonly used derived datatype is a string of more than one character.

Pre-Defined Datatypes

The properties of pre-defined datatypes are:

  • Pre-defined datatypes are opened and closed by HDF5.
  • A pre-defined datatype is a handle and is NOT PERSISTENT. Its value can be different from one HDF5 session to the next.
  • Pre-defined datatypes are Read-Only.
  • As mentioned, other datatypes can be derived from pre-defined datatypes.

There are two types of pre-defined datatypes, standard (file) and native:

  • STANDARD

    A standard (or file) datatype can be:

    • Atomic: A datatype which cannot be decomposed into smaller datatype units at the API level.
      The atomic datatypes are:   integer, float, string, date and time, bitfield, reference, opaque

       

    • Composite: An aggregation of one or more datatypes.
      Composite datatypes include:   array, variable length, enumeration, compound datatypes

      Array, variable length, and enumeration datatypes are defined in terms of a single atomic datatype, whereas a compound datatype is a datatype composed of a sequence of datatypes.

    Notes:
    •  Standard pre-defined datatypes are the SAME on all platforms.
    •  They are the datatypes that you see in an HDF5 file.
    •  They are typically used when creating a dataset.
  • NATIVE

    Native pre-defined datatypes are used for memory operations, such as reading and writing. They are NOT THE SAME on different platforms. They are similar to C type names, and are aliased to the appropriate HDF5 standard pre-defined datatype for a given platform.

    For example, on an Intel-based PC, H5T_NATIVE_INT is aliased to the standard pre-defined type H5T_STD_I32LE. On a MIPS machine, it is aliased to H5T_STD_I32BE.

    Notes:
    •  Native datatypes are NOT THE SAME on all platforms.
    •  Native datatypes simplify memory operations (read/write). The HDF5 library automatically converts as needed.
    •  Native datatypes are NOT in an HDF5 File. The standard pre-defined datatype that a native datatype corresponds to is what you will see in the file. 
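
The aliasing described above depends on two platform properties: the size of the C type and the byte order. As a minimal sketch in plain C (it does not call the library, `native_int_std_name` is a hypothetical helper written for this illustration, and it assumes `int` has no padding bits), the following works out which standard type name a native `int` would correspond to on the machine it runs on:

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of HDF5: writes the name of the standard
 * signed-integer type that matches this platform's C int into out. */
static void native_int_std_name(char *out, size_t out_len)
{
    const unsigned probe = 1u;
    unsigned char first_byte;

    /* On a little-endian machine the lowest-addressed byte holds the
     * least significant bits, so it reads back as 1. */
    memcpy(&first_byte, &probe, 1);

    snprintf(out, out_len, "H5T_STD_I%u%s",
             (unsigned)(sizeof(int) * CHAR_BIT),
             first_byte == 1 ? "LE" : "BE");
}
```

Calling `native_int_std_name(name, sizeof name)` with a `char name[32]` buffer fills in a name such as H5T_STD_I32LE on a platform with a 32-bit little-endian `int`. The library makes this determination itself; the sketch only shows why the answer differs between platforms.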

The following table shows the native types and the standard pre-defined datatypes they correspond to. (Keep in mind that HDF5 can convert between datatypes, so you can specify a buffer of a larger type for a dataset of a given type. For example, you can read a dataset that has a short datatype into a long integer buffer.)

Fig. 1   Some HDF5 pre-defined native datatypes and corresponding standard (file) type

  C Type              HDF5 Memory Type    HDF5 File Type *
Integer:
  int                 H5T_NATIVE_INT      H5T_STD_I32BE or H5T_STD_I32LE
  short               H5T_NATIVE_SHORT    H5T_STD_I16BE or H5T_STD_I16LE
  long                H5T_NATIVE_LONG     H5T_STD_I32BE, H5T_STD_I32LE, H5T_STD_I64BE or H5T_STD_I64LE
  long long           H5T_NATIVE_LLONG    H5T_STD_I64BE or H5T_STD_I64LE
  unsigned int        H5T_NATIVE_UINT     H5T_STD_U32BE or H5T_STD_U32LE
  unsigned short      H5T_NATIVE_USHORT   H5T_STD_U16BE or H5T_STD_U16LE
  unsigned long       H5T_NATIVE_ULONG    H5T_STD_U32BE, H5T_STD_U32LE, H5T_STD_U64BE or H5T_STD_U64LE
  unsigned long long  H5T_NATIVE_ULLONG   H5T_STD_U64BE or H5T_STD_U64LE
Float:
  float               H5T_NATIVE_FLOAT    H5T_IEEE_F32BE or H5T_IEEE_F32LE
  double              H5T_NATIVE_DOUBLE   H5T_IEEE_F64BE or H5T_IEEE_F64LE

  F90 Type            HDF5 Memory Type    HDF5 File Type *
  integer             H5T_NATIVE_INTEGER  H5T_STD_I32(8,16)BE or H5T_STD_I32(8,16)LE
  real                H5T_NATIVE_REAL     H5T_IEEE_F32BE or H5T_IEEE_F32LE
  double precision    H5T_NATIVE_DOUBLE   H5T_IEEE_F64BE or H5T_IEEE_F64LE

* Note that the HDF5 File Types listed are those that are most commonly created.
  The file type created depends on the compiler switches and platforms being
  used. For example, on the Cray an integer is 64-bit, and using H5T_NATIVE_INT (C)
  or H5T_NATIVE_INTEGER (F90) would result in an H5T_STD_I64BE file type.

The following code is an example of when you would use standard pre-defined datatypes vs. native types:

   #include "hdf5.h"

   int main(void) {

      hid_t       file_id, dataset_id, dataspace_id;
      herr_t      status;
      hsize_t     dims[2] = {4, 6};
      int         i, j, dset_data[4][6];

      /* Fill the buffer with the values 1..24 */
      for (i = 0; i < 4; i++)
         for (j = 0; j < 6; j++)
            dset_data[i][j] = i * 6 + j + 1;

      file_id = H5Fcreate ("dtypes.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

      dataspace_id = H5Screate_simple (2, dims, NULL);

      /* File datatype: standard 32-bit big-endian integer */
      dataset_id = H5Dcreate2 (file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

      /* Memory datatype: native int, matching the type of dset_data */
      status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                         H5P_DEFAULT, dset_data);

      status = H5Dclose (dataset_id);
      status = H5Sclose (dataspace_id);
      status = H5Fclose (file_id);

      return (status < 0);
   }

By using the native types when reading and writing, the code that reads from or writes to a dataset can be the same for different platforms.
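
What happens between H5T_NATIVE_INT in memory and H5T_STD_I32BE in the file can be pictured as a byte-order conversion. The following plain-C sketch illustrates the idea only; it is not the library's implementation, and `store_i32be` and `load_i32be` are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lays out a 32-bit value most significant byte
 * first, the order of a big-endian file type, on any host byte order. */
static void store_i32be(int32_t value, unsigned char out[4])
{
    uint32_t u = (uint32_t)value;

    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical helper: rebuilds the native value from big-endian bytes. */
static int32_t load_i32be(const unsigned char in[4])
{
    uint32_t u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    int32_t value;

    /* Copy the bit pattern; int32_t is two's complement by definition. */
    memcpy(&value, &u, sizeof value);
    return value;
}
```

Because the shifts operate on values rather than on memory layout, the same source behaves identically on little-endian and big-endian hosts, which is the property that native types give to HDF5 read and write code.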

Can native types also be used when creating a dataset? Yes. However, just be aware that the resulting datatype in the file will be one of the standard pre-defined types and may be different than expected.

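What happens if you do not use the correct native datatype for a standard (file) datatype? The memory datatype must describe the buffer you pass to the read or write call. If it does not match the buffer's actual C type, the library interprets the bytes in the buffer incorrectly, and the values read or written may be wrong.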

Derived Datatypes

ANY pre-defined datatype can be used to derive user-defined datatypes.

To create a datatype derived from a pre-defined type:

  • Make a copy of the pre-defined datatype:

    tid = H5Tcopy (H5T_STD_I32BE);

  • Change the datatype.

There are numerous datatype functions that allow a user to alter a pre-defined datatype. See String below for a simple example.

Refer to the Datatype Interface in the HDF5 Reference Manual. Example functions are H5Tset_size and H5Tset_precision.

Specific Datatypes

Compound

Properties of compound datatypes. A compound datatype is similar to a struct in C or a common block in Fortran. It is a collection of one or more atomic types or small arrays of such types. To create and use a compound datatype, you need to be familiar with the following properties:

  • It is of class compound.
  • It has a fixed total size, in bytes.
  • It consists of zero or more members (defined in any order) with unique names and which occupy non-overlapping regions within the datum.
  • Each member has its own datatype.
  • Each member is referenced by an index number between zero and N-1, where N is the number of members in the compound datatype.
  • Each member has a name which is unique among its siblings in a compound datatype.
  • Each member has a fixed byte offset, which is the first byte (smallest byte address) of that member in a compound datatype.
  • Each member can be a small array of up to four dimensions.

Properties of members of a compound datatype are defined when the member is added to the compound type and cannot be subsequently modified.

Defining compound datatypes. Compound datatypes must be built out of other datatypes. First, one creates an empty compound datatype and specifies its total size. Then members are added to the compound datatype in any order.

Member names. Each member must have a descriptive name, which is the key used to uniquely identify the member within the compound datatype. A member name in an HDF5 datatype does not necessarily have to be the same as the name of the corresponding member in the C struct in memory, although this is often the case. Nor does one need to define all members of the C struct in the HDF5 compound datatype (or vice versa).

Offsets. Usually a C struct will be defined to hold a data point in memory, and the offsets of the members in memory will be the offsets of the struct members from the beginning of an instance of the struct. The library defines the HOFFSET macro to compute the offset of a member within a struct:
  HOFFSET(s,m)

This macro computes the offset of member m within a struct type s.

Here is an example in which a compound datatype is created to describe complex numbers whose type is defined by the complex_t struct.

typedef struct {
   double re;   /* real part */
   double im;   /* imaginary part */
} complex_t;

/* Create an empty compound type the size of the struct, then add its members */
hid_t complex_id = H5Tcreate (H5T_COMPOUND, sizeof(complex_t));
H5Tinsert (complex_id, "real", HOFFSET(complex_t, re),
           H5T_NATIVE_DOUBLE);
H5Tinsert (complex_id, "imaginary", HOFFSET(complex_t, im),
           H5T_NATIVE_DOUBLE);
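
Member offsets should come from the compiler rather than from adding up member sizes, because a struct may contain padding. The following plain-C sketch shows this with the standard `offsetof` macro, which is what HOFFSET expands to in current library versions. The `sample_t` struct and the helper functions are hypothetical and chosen only to make padding likely:

```c
#include <stddef.h>

/* Hypothetical record: a 1-byte tag followed by a double. Most compilers
 * insert padding after tag so that value is suitably aligned. */
typedef struct {
    char   tag;
    double value;
} sample_t;

/* Offsets as the compiler laid them out. These are the numbers a
 * compound datatype describing sample_t in memory would need. */
static size_t sample_tag_offset(void)   { return offsetof(sample_t, tag); }
static size_t sample_value_offset(void) { return offsetof(sample_t, value); }

/* Padding bytes in the struct: total size minus the members' sizes. */
static size_t sample_padding(void)
{
    return sizeof(sample_t) - (sizeof(char) + sizeof(double));
}
```

On many 64-bit platforms `sample_value_offset()` is 8 rather than 1, so a compound datatype built from hand-computed offsets would describe the wrong bytes. Passing `sizeof(sample_t)` to H5Tcreate covers the padding as well.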

Reference

A dataset region reference points to the dataset selection by storing the relative file address of the dataset header and the global heap offset of the referenced selection. The selection referenced is located by retrieving the coordinates of the areas in the selection from the global heap. This internal mechanism of storing and retrieving dataset selections is transparent to the user. A reference to the dataset selection (region) is constant for the life of the dataset.

Creating and storing references to dataset regions

The following steps are involved in creating and storing references to the dataset regions:

  1. Create a dataset to store the dataset regions (selections).
  2. Create selections in the dataset(s). Dataset(s) should already exist in the file.
  3. Create references to the selections and store them in a buffer.
  4. Write references to the dataset regions in the file.
  5. Close all objects.

 

  • The code,
        dset1=H5Dcreate(fid1,"Dataset1",H5T_STD_REF_DSETREG,sid1,H5P_DEFAULT);
    
    creates a dataset to store references to the dataset(s) regions (selections). Notice that the H5T_STD_REF_DSETREG datatype is used.
  • This program uses hyperslab and point selections. The dataspace handle sid2 is used for the calls to H5Sselect_hyperslab and H5Sselect_elements. The handle was created when dataset Dataset2 was created and it describes the dataset's dataspace. It was not closed when the dataset was closed to decrease the number of function calls used in the example. In a real application program, one should open the dataset and determine its dataspace using the H5Dget_space function.
  • H5Rcreate is used to create a dataset region reference and store it in a buffer. The signature of the function is:
         herr_t H5Rcreate(void *buf, hid_t loc_id, const char *name,
                          H5R_type_t ref_type, hid_t space_id)
    
    • The first argument specifies the buffer to store the reference.
    • The second and third arguments identify the referenced dataset by a location identifier and a name. In the example, the file identifier fid1 and the absolute name of the dataset /Dataset2 were used to identify the dataset. The reference to the region of this dataset is stored in the buffer buf.
    • The fourth argument specifies the type of the reference. Since the example creates references to the dataset regions, the H5R_DATASET_REGION reference type is used.
    • The fifth argument is a dataspace identifier of the referenced dataset.

Reading references to dataset regions

The following steps are involved in reading references to dataset regions and referenced dataset regions (selections).

  1. Open and read the dataset containing references to the dataset regions. The datatype H5T_STD_REF_DSETREG must be used during the read operation.

  2. Use H5Rdereference to obtain the dataset identifier from the read dataset region reference.

    OR

    Use H5Rget_region to obtain the dataspace identifier for the dataset containing the selection from the read dataset region reference.

  3. With the dataspace identifier, the H5S interface functions, H5Sget_select_*, can be used to obtain information about the selection.

  4. Close all objects when they are no longer needed.

The dataset with the region references was read by H5Dread with the H5T_STD_REF_DSETREG datatype specified.

The read reference can be used to obtain the dataset identifier with the following call:

    dset2 = H5Rdereference (dset1,H5R_DATASET_REGION,&rbuf[0]);

or to obtain spatial information (dataspace and selection) with the call to H5Rget_region:

    sid2=H5Rget_region(dset1,H5R_DATASET_REGION,&rbuf[0]);

The reference to the dataset region has information for both the dataset itself and its selection. In both functions:

  1. The first parameter is an identifier of the dataset with the region references.
  2. The second parameter specifies the type of reference stored. In this example, a reference to the dataset region is stored.
  3. The third parameter is a buffer containing the reference of the specified type.

This example introduces several H5Sget_select* functions used to obtain information about selections:

Function                       Description
H5Sget_select_npoints          Returns the number of elements in the hyperslab
H5Sget_select_hyper_nblocks    Returns the number of blocks in the hyperslab
H5Sget_select_hyper_blocklist  Returns the "lower left" and "upper right" coordinates of the blocks in the hyperslab selection
H5Sget_select_bounds           Returns the coordinates of the "minimal" block containing a hyperslab selection
H5Sget_select_elem_npoints     Returns the number of points in the element selection
H5Sget_select_elem_pointlist   Returns the coordinates of the element selection


String

A simple example of creating a derived datatype is using the string datatype, H5T_C_S1 (H5T_FORTRAN_S1 in Fortran), to create strings of more than one character. Strings can be stored as either fixed or variable length, and may have different rules for padding of unused storage:

Fixed Length 5-character String Datatype:

hid_t strtype;                     /* Datatype ID */
herr_t status;

strtype = H5Tcopy (H5T_C_S1);
status = H5Tset_size (strtype, 5); /* create string of length 5 */
 

Variable Length String Datatype:

strtype = H5Tcopy (H5T_C_S1);
status = H5Tset_size (strtype, H5T_VARIABLE);

The ability to derive datatypes from pre-defined types allows users to create any number of datatypes, from simple to very complex.

As the term implies, variable length strings are strings of varying lengths. They are stored internally in a heap, potentially impacting efficiency in the following ways:

  1. Heap storage requires more space than regular raw data storage.
  2. Heap access generally reduces I/O efficiency because it requires individual read or write operations for each data element rather than one read or write per dataset or per data selection.
  3. A variable length dataset consists of pointers to the heaps of data, not the actual data. Chunking and filters, including compression, are not available for heaps.

See Section 6.6.1 Strings in the HDF5 User's Guide, for more information on how fixed and variable length strings are stored.
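
For the fixed-length case, the padding rules mentioned at the start of this section can be pictured with a plain-C sketch. The library names three string padding conventions: H5T_STR_NULLTERM, H5T_STR_NULLPAD, and H5T_STR_SPACEPAD. The code below does not call the library; `pad_rule_t` and `fill_fixed_field` are hypothetical names, and truncating over-long input is this sketch's own choice rather than a statement about library behavior:

```c
#include <string.h>

/* Hypothetical stand-ins for the three padding conventions. */
typedef enum { PAD_NULLTERM, PAD_NULLPAD, PAD_SPACEPAD } pad_rule_t;

/* Fills a fixed-size field of field_len bytes from a C string. */
static void fill_fixed_field(char *field, size_t field_len,
                             const char *text, pad_rule_t rule)
{
    size_t n = strlen(text);
    /* A null-terminated field reserves its last byte for the terminator. */
    size_t room = (rule == PAD_NULLTERM && field_len > 0) ? field_len - 1
                                                          : field_len;

    if (n > room)
        n = room;   /* sketch's own choice: truncate what does not fit */

    memcpy(field, text, n);
    /* Unused storage is filled with zeros or with spaces. */
    memset(field + n, rule == PAD_SPACEPAD ? ' ' : '\0', field_len - n);
}
```

With a 5-byte field, "abc" is stored as the three characters followed by two zero bytes under the null rules and by two spaces under space padding. Only the null-terminated rule limits the usable text to four characters.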

--- Last Modified: July 10, 2019 | 11:05 AM