Select a subset from a 2D dataset and write it to a plane of a 3D dataset

HDF5 allows you to read from or write to a portion of a dataset by using a hyperslab selection. A hyperslab selection can be a logically contiguous collection of points in a dataspace, or it can be a regular pattern of points or blocks in a dataspace. You select the hyperslab to write to or read from with the function H5S_SELECT_HYPERSLAB.

Programming Example

Description

This example creates a 5 x 6 integer array in a file called sds.h5 (sdsf.h5 in FORTRAN). It selects a 3 x 4 hyperslab from the dataset as follows (Dimension 0 is offset by 1 and Dimension 1 is offset by 2):

5 x 6 array (X marks a selected element, - an unselected one):

------
--XXXX
--XXXX
--XXXX
------

Then it reads the hyperslab from this file into a 2-dimensional plane (size 7 x 7) of a 3-dimensional array (size 7 x 7 x 3), as follows (with Dimension 0 offset by 3; X marks a selected element, - an unselected one):

-------
-------
-------
XXXX---
XXXX---
XXXX---
-------

To obtain the example, download:      C    F90
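
The copy that this example performs can be pictured without the library. The following is a minimal sketch in plain C, not the downloadable example: ordinary loops stand in for the file and memory selections, the function name copy_block_to_plane is hypothetical, and the use of plane 0 in the third dimension is an assumption, since the description above does not say which plane receives the data.

```c
#include <assert.h>

#define SRC_ROWS 5
#define SRC_COLS 6
#define DST_D0   7
#define DST_D1   7
#define DST_D2   3

/* Copy a rows x cols block that starts at (src_r0, src_c0) in the 2-D
 * source into plane `plane` of the 3-D destination, starting at
 * (dst_r0, dst_c0). Plain loops stand in for the HDF5 selections. */
void copy_block_to_plane(int src[SRC_ROWS][SRC_COLS],
                         int dst[DST_D0][DST_D1][DST_D2],
                         int src_r0, int src_c0,
                         int dst_r0, int dst_c0, int plane,
                         int rows, int cols)
{
    /* Both blocks must lie inside their arrays. */
    assert(src_r0 >= 0 && src_r0 + rows <= SRC_ROWS);
    assert(src_c0 >= 0 && src_c0 + cols <= SRC_COLS);
    assert(dst_r0 >= 0 && dst_r0 + rows <= DST_D0);
    assert(dst_c0 >= 0 && dst_c0 + cols <= DST_D1);
    assert(plane >= 0 && plane < DST_D2);

    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            dst[dst_r0 + i][dst_c0 + j][plane] = src[src_r0 + i][src_c0 + j];
}
```

Calling copy_block_to_plane(src, dst, 1, 2, 3, 0, 0, 3, 4) corresponds to the two diagrams above: a 3 x 4 block at offset (1, 2) in the source lands at offset (3, 0) in the chosen plane.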

Remarks

  • H5S_SELECT_HYPERSLAB selects a hyperslab region to add to the current selected region for a specified dataspace.

    Be aware that the start (offset), stride, count and block arrays all work together to define a selection. A change to one of these parameters can affect the other parameters.

    The start, stride, count, and block arrays must be the same size as the rank of the dataspace.

  • The example programs introduce the H5D_GET_SPACE call to obtain the dataspace of a dataset.
  • The C examples also introduce H5S_GET_SIMPLE_EXTENT_DIMS to retrieve the dataspace dimension size of a dataset, and H5S_GET_SIMPLE_EXTENT_NDIMS for obtaining the number of dimensions (rank) in a dataspace.
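
How start, stride, count, and block work together (first remark above) may be easier to see in one dimension. The sketch below is plain C and does not call the HDF5 library; hyperslab_indices is a hypothetical helper, and the requirement that stride be at least as large as block is this sketch's way of keeping blocks from overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* List the indices that a hyperslab selects along one dimension.
 * Block i covers start + i*stride through start + i*stride + block - 1.
 * Returns the number of indices written to out (count * block). */
size_t hyperslab_indices(size_t start, size_t stride, size_t count,
                         size_t block, size_t *out, size_t out_cap)
{
    assert(stride >= block);          /* blocks must not overlap */
    assert(count * block <= out_cap); /* caller's buffer must be large enough */

    size_t n = 0;
    for (size_t i = 0; i < count; i++)      /* one pass per block */
        for (size_t j = 0; j < block; j++)  /* elements inside the block */
            out[n++] = start + i * stride + j;
    return n;
}
```

With start 2, stride 1, count 4, and block 1 this yields columns 2 through 5, the column range of the example's 3 x 4 hyperslab. Changing only the stride to 3 and the block to 2 (with count 2 and start 1) yields 1, 2, 4, 5, which shows why the four arrays cannot be chosen independently.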
     


Select and write to independent points in a dataset

The H5S_SELECT_ELEMENTS call allows you to select individual points in a dataspace, so that you can read from and write to individual points in a dataset.

The example below creates two HDF5 files, copy1.h5 and copy2.h5. In copy1.h5, it creates a 3 x 4 dataset called "Copy1" and writes zeroes to it. In copy2.h5, it creates a 3 x 4 dataset called "Copy2" and writes ones to it. Then it reopens both files and datasets, selects points in copy1.h5 and writes values to them. It copies the selection and writes values to copy2.h5. Lastly, it closes and reopens the files and prints the contents of the datasets.

See the example programs:    C     F90    Python

For details on compiling an HDF5 application, see Compiling HDF5 Applications.

Remarks

  • H5S_SELECT_ELEMENTS selects array elements to be included in the selection for a dataspace.

    The coord array is a two-dimensional array of size NUMP x RANK in C (RANK x NUMP in FORTRAN) where NUMP is the number of selected points and RANK is the rank of the dataset.

    Note that these coordinates are 0-based in C and 1-based in FORTRAN.

    Consider the non-zero elements of the following array:

                0  59   0  53
                0   0   0   0
                0   0   1   0    
    In C, the coord array selecting these points would be as follows:
                0   1
                0   3
                2   2            
    While in FORTRAN, the coord array would be as follows:
                1   1   3
                2   4   3      
      
  • The example code calls H5S_COPY to create an exact copy of a dataspace.
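
The layout of the coord array described above can be checked with a short plain C sketch. It does not call the HDF5 library: size_t stands in for the library's coordinate type, and the helper names nonzero_coords and to_fortran_coords are hypothetical. The FORTRAN form is produced only to mirror the 1-based RANK x NUMP listing shown in the remark.

```c
#include <assert.h>
#include <stddef.h>

#define RANK 2
#define ROWS 3
#define COLS 4

/* Fill coord (NUMP x RANK, 0-based, C layout) with the positions of the
 * non-zero elements of data, scanning row by row. Returns NUMP. */
size_t nonzero_coords(int data[ROWS][COLS],
                      size_t coord[][RANK], size_t max_points)
{
    size_t nump = 0;
    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            if (data[i][j] != 0) {
                assert(nump < max_points);
                coord[nump][0] = i;   /* dimension 0 of this point */
                coord[nump][1] = j;   /* dimension 1 of this point */
                nump++;
            }
        }
    }
    return nump;
}

/* Rewrite the same points in the FORTRAN form: RANK x NUMP, 1-based. */
void to_fortran_coords(size_t c_coord[][RANK], size_t nump,
                       size_t f_coord[RANK][ROWS * COLS])
{
    for (size_t p = 0; p < nump; p++)
        for (size_t r = 0; r < RANK; r++)
            f_coord[r][p] = c_coord[p][r] + 1;
}
```

For the 3 x 4 array in the remark, the first function produces the three rows (0, 1), (0, 3), (2, 2), and the second produces the two rows 1 1 3 and 2 4 3.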

File Contents

Following is the DDL for copy1.h5 and copy2.h5, as viewed with the following commands:
             h5dump copy1.h5
             h5dump copy2.h5

 


C:

Fig. S.1a   copy1.h5 in DDL

   HDF5 "copy1.h5" {
   GROUP "/" {
      DATASET "Copy1" {
         DATATYPE { H5T_STD_I32BE }
         DATASPACE { SIMPLE ( 3, 4 ) / ( 3, 4 ) }
         DATA {
            0, 59, 0, 53,
            0, 0, 0, 0,
            0, 0, 0, 0
         }
      }
   }
   }

Fig. S.1b   copy2.h5 in DDL

   HDF5 "copy2.h5" {
   GROUP "/" {
      DATASET "Copy2" {
         DATATYPE { H5T_STD_I32BE }
         DATASPACE { SIMPLE ( 3, 4 ) / ( 3, 4 ) }
         DATA {
            1, 59, 1, 53,
            1, 1, 1, 1,
            1, 1, 1, 1
         }
      }
   }
   }

FORTRAN:

Fig. S.2a   copy1.h5 in DDL

   HDF5 "copy1.h5" {
   GROUP "/" {
      DATASET "Copy1" {
         DATATYPE { H5T_STD_I32BE }
         DATASPACE { SIMPLE ( 4, 3 ) / ( 4, 3 ) }
         DATA {
            0, 0, 0,
            53, 0, 0,
            0, 0, 0,
            59, 0, 0
         }
      }
   }
   }

Fig. S.2b   copy2.h5 in DDL

   HDF5 "copy2.h5" {
   GROUP "/" {
      DATASET "Copy2" {
         DATATYPE { H5T_STD_I32BE }
         DATASPACE { SIMPLE ( 4, 3 ) / ( 4, 3 ) }
         DATA {
            1, 1, 1,
            53, 1, 1,
            1, 1, 1,
            59, 1, 1
         }
      }
   }
   }

 

Creating a dataset with a compound datatype

A compound datatype is similar to a struct in C or a common block in FORTRAN. It is a collection of one or more datatypes and can include compound datatypes. To create and use a compound datatype you need to be familiar with various properties of the compound datatype:

  • It is of class compound.
  • It has a fixed total size, in bytes.
  • It consists of one or more members (defined in any order) with unique names and occupying non-overlapping regions within the datum.
  • Each member has its own datatype.
  • Each member is referenced by an index number between zero and N-1, where N is the number of members in the compound datatype.
  • Each member has a name which is unique among its siblings in a compound datatype.
  • Each member has a fixed byte offset, which locates the first byte (smallest byte address) of that member in the compound datatype.
  • Each member can be a small array of up to four dimensions.

Properties of members of a compound datatype are defined when the member is added to the compound datatype and cannot be subsequently modified.

Compound datatypes must be built out of other datatypes. First, one creates an empty compound datatype and specifies its total size. Then members are added to the compound datatype in any order.

High Level APIs

The High Level HDF5 Table APIs (H5TB) include functions to easily create tables in HDF5, using a compound datatype. Please be sure to review them, in addition to this tutorial.

Programming Example

Description

This example shows how to create a dataset with a compound datatype, write data to it, and then read data back:   C    F90

Remarks

  • H5T_CREATE creates a new datatype of the specified class with the specified number of bytes. To create a compound datatype, H5T_COMPOUND is specified for the class.
  • H5T_INSERT adds a member to the compound datatype specified by the datatype identifier.

    The library defines the HOFFSET macro that can be used to compute the offset of a member within a struct:

      HOFFSET ( s, m ) 
    
    This macro computes the byte offset of member m within a struct type s.
  • H5T_CLOSE releases a datatype.
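
The member properties listed earlier (name, byte offset, non-overlapping regions, fixed total size) can be illustrated without the library. In current HDF5 headers HOFFSET is defined in terms of the standard offsetof macro, so this sketch uses offsetof directly. The struct reading_t, the table reading_members, and the helper members_fit are hypothetical names chosen for illustration; they are not part of the tutorial's example programs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    int    id;
    double temperature;
    float  pressure;
} reading_t;

/* The name, byte offset, and size that would describe one member when it
 * is added to a compound datatype. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} member_info_t;

const member_info_t reading_members[] = {
    { "id",          offsetof(reading_t, id),          sizeof(int)    },
    { "temperature", offsetof(reading_t, temperature), sizeof(double) },
    { "pressure",    offsetof(reading_t, pressure),    sizeof(float)  },
};

/* Return 1 if the members, listed in increasing offset order, occupy
 * non-overlapping regions that all fit within total_size bytes. */
int members_fit(const member_info_t *m, size_t n, size_t total_size)
{
    assert(m != NULL);
    size_t end = 0;                  /* first byte past the previous member */
    for (size_t i = 0; i < n; i++) {
        if (m[i].offset < end)
            return 0;                /* overlaps the previous member */
        end = m[i].offset + m[i].size;
    }
    return end <= total_size;
}
```

Here sizeof(reading_t) plays the role of the total size given when the empty compound datatype is created, and each table row holds the information supplied when a member is inserted.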

 

 


References to Objects

In HDF5, objects (i.e. groups, datasets, and named datatypes) are usually accessed by name. This access method was discussed in previous sections. There is another way to access stored objects - by reference.

An object reference is based on the relative file address of the object header in the file and is constant for the life of the object. Once a reference to an object is created and stored in a dataset in the file, it can be used to dereference the object it points to. References are handy for creating a file index or for grouping related objects by storing references to them in one dataset.

Creating and Storing References to Objects

The following steps are involved in creating and storing file references to objects:

  • Create the objects or open them if they already exist in the file.

  • Create a dataset to store references to the objects.

  • Create and store references to the objects in a buffer.

  • Write the buffer containing the references to the dataset.

Reading References and Accessing Objects Using References

The following steps are involved in reading references to objects and accessing objects using references:

  • Open the dataset with the references and read them. The H5T_STD_REF_OBJ datatype must be used to describe the memory datatype.
  • Use the read reference to obtain the identifier of the object the reference points to.

  • Open the dereferenced object and perform the desired operations.

  • Close all objects when the task is complete.

Programming Example

Description

The examples below create objects (group, dataset, ...), create references to those objects (in an object reference dataset), and then close the objects and object reference dataset. Then they re-open the object references dataset, and re-open the objects by dereferencing the references in the object reference dataset:

C    F90

Remarks

  • An Object Reference dataset can be created by calling H5D_CREATE with an object reference datatype (H5T_STD_REF_OBJ) for the datatype identifier parameter.

  • The H5R_CREATE call creates the reference.

  • When calling H5D_WRITE to write to the dataset of references, the H5T_STD_REF_OBJ datatype is used to describe the dataset's memory datatype.

  • H5D_READ reads the dataset containing the references to the objects. The H5T_STD_REF_OBJ memory datatype is used to read the references to memory.

  • The H5R_DEREFERENCE call opens the object and returns the object's identifier.
      


 

References to Dataset Regions

Previously you learned about creating, reading, and writing dataset selections. Here you will learn how to store dataset selections in a file, and how to read them back using references to dataset regions.

A dataset region reference points to the dataset selection by storing the relative file address of the dataset header and the global heap offset of the referenced selection. The selection referenced is located by retrieving the coordinates of the areas in the selection from the global heap. This internal mechanism of storing and retrieving dataset selections is transparent to the user. A reference to a dataset selection (a region) is constant for the life of the dataset.

Creating and Storing References to Dataset Regions

The following steps are involved in creating and storing references to dataset regions:

  1. Create a dataset in which to store the dataset regions (the selections).

     

  2. Create selections in the dataset(s). The dataset(s) should already exist in the file.

     

  3. Create references to the selections and store them in a buffer.

     

  4. Write the dataset region references to the file.

     

  5. Close all objects.

Reading References to Dataset Regions

The following steps are involved in reading references to dataset regions and referenced dataset regions (selections).

  1. Open and read the dataset containing references to the dataset regions. The datatype H5T_STD_REF_DSETREG must be used during the read operation.

     

  2. Use H5Rdereference / h5rdereference_f to obtain the dataset identifier from the dataset region reference that was read, or use H5Rget_region / h5rget_region_f to obtain the dataspace identifier for the dataset containing the selection.

     

  3. Obtain information about the selection or read selected data from the dataset.

     

  4. Close all objects when they are no longer needed.

Programming Example

Description

The examples below create a dataset in a file, and write references to regions in the dataset to a dataset of region references. (The regions in the dataset are selected using H5S_SELECT_HYPERSLAB and H5S_SELECT_ELEMENTS.)

The examples then reopen the file, dereference the references, and output the referenced regions to the screen.

Examples:    C    F90

Remarks

  • A dataset region reference dataset can be created by calling H5D_CREATE with a datatype of H5T_STD_REF_DSETREG.

  • The dataspace selections are stored as references in the dataset of region references, using the H5R_CREATE call with H5R_DATASET_REGION as the reference type. In C, the references are held in a buffer of type hdset_reg_ref_t.

  • The dataset with the region references is read by calling H5D_READ with a datatype of H5T_STD_REF_DSETREG.

  • The H5R_DEREFERENCE call opens the dataset, using its reference, and returns the dataset identifier. H5R_DATASET_REGION is passed in as the reference type.

  • The H5R_GET_REGION call obtains the spatial information (dataspace and selection) for the region reference.
     

Mounting a file

HDF5 allows you to combine two or more HDF5 files in memory in a manner similar to mounting files in UNIX. The group structure and metadata from one file appear as though they exist in another file. The following steps are involved:

  1. Open the files.
  2. Choose the mount point in the first file (the parent file). The mount point in HDF5 is a group, which CANNOT be the root group.
  3. Use the HDF5 routine H5Fmount / h5fmount_f to mount the second file (the child file) in the first file.
  4. Work with the objects in the second file as if they were members of the mount point group in the first file. The previous contents of the mount point group are temporarily hidden.
  5. Unmount the second file using H5Funmount / h5funmount_f when the work is done.
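
Step 4 above says that objects in the child file are addressed as if they were members of the mount point group. The following toy model in plain C illustrates only that naming idea; it is not how the library implements mounting, and the function resolve_mounted_path and the group name "/G" used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of name lookup through a mount point.
 * If path names the mount point or an object beneath it, write the path
 * relative to the child file's root group into out and return 1.
 * Otherwise copy path unchanged into out and return 0 (parent file). */
int resolve_mounted_path(const char *path, const char *mount_point,
                         char *out, size_t out_len)
{
    size_t mp_len = strlen(mount_point);
    assert(out_len > 0);

    /* Match whole path components only: "/G" must not match "/Group2". */
    if (strncmp(path, mount_point, mp_len) == 0 &&
        (path[mp_len] == '/' || path[mp_len] == '\0')) {
        const char *rest = path + mp_len;
        /* The mount point itself maps to the child's root group. */
        snprintf(out, out_len, "%s", *rest ? rest : "/");
        return 1;
    }
    snprintf(out, out_len, "%s", path);
    return 0;
}
```

With a mount point of "/G", the name "/G/D" resolves to "/D" in the child file, while a name outside the mount point stays in the parent file.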

Programming Example

Description

In the following example, we create one file containing a group and another file containing a dataset. Mounting is used to access the dataset from the second file as a member of a group in the first file. The following figures illustrate this concept.

             FILE1                                   FILE2
  
      --------------------                   --------------------
      !                  !                   !                  !
      !      /           !                   !       /          !
      !       |          !                   !        |         !
      !       |          !                   !        |         !
      !       V          !                   !        V         !
      !     ---------    !                   !     ----------   !
      !     ! Group !    !                   !     ! Dataset!   !
      !     ---------    !                   !     ----------   !
      !------------------!                   !------------------! 

After mounting FILE2 under the group in FILE1, the parent file has the following structure:

 
                                FILE1                                 
  
                         --------------------                   
                         !                  !                  
                         !      /           !               
                         !       |          !            
                         !       |          !         
                         !       V          !    
                         !     ---------    !
                         !     ! Group !    !            
                         !     ---------    !           
                         !         |        !
                         !         |        !
                         !         V        !
                         !    -----------   !
                         !    ! Dataset !   !
                         !    -----------   !
                         !                  !
                         !------------------!                    

[ C program ] - h5_mount.c
[ FORTRAN program ] - mountexample.f90

For details on compiling an HDF5 application: [ Compiling HDF5 Applications ]

Remarks

  • The first part of the program creates a group in one file and creates and writes a dataset to another file.
  • Both files are reopened and the second file is mounted in the first using H5F_MOUNT. If no objects will be modified, the files can be opened with H5F_ACC_RDONLY (H5F_ACC_RDONLY_F in FORTRAN). If the data is to be modified, the files should be opened with H5F_ACC_RDWR (H5F_ACC_RDWR_F in FORTRAN).
  • In this example, we only read data from the dataset D. One can also modify data. If the dataset is modified while the file is mounted, it is modified in the original file after the file is unmounted.
  • The file is unmounted with H5F_UNMOUNT.
  • Note that H5F_UNMOUNT does not close files. Files are closed with the respective calls to the H5F_CLOSE function.
  • Closing the parent file automatically unmounts the child file.
  • The h5dump utility cannot display files in memory. Therefore, no output of FILE1 after FILE2 was mounted is provided.

What is a File Driver?

In HDF5, a file driver is a mapping between the HDF5 format address space and storage. By default, HDF5 simply maps the format address space directly onto a single file.

However, users may want the ability to map the format address space onto different types of storage with various types of maps. With HDF5 we provide a small set of pre-defined file drivers, and we also provide the Virtual File Layer API to enable users to implement their own mappings.

Detailed information on file drivers can be found under VFL Technical Notes in the Documentation.

File Drivers Defined in HDF5

Following are the file drivers that HDF5 provides.

    • H5FD_SEC2:   This is the default driver, which uses POSIX file-system functions like read and write to perform I/O to a single file.
    • H5FD_STDIO:   This driver uses functions from 'stdio.h' to perform buffered I/O to a single file.
    • H5FD_CORE:   This driver performs I/O directly to memory and can be used to create small temporary files that never exist on permanent storage.
    • H5FD_MPIIO:   This driver is used with Parallel HDF5, and is only pre-defined if the library is compiled with parallel I/O support. Refer to the Parallel HDF5 Tutorial for more information on using Parallel HDF5.
    • H5FD_MULTI:   This driver enables different types of HDF5 data and metadata to be written to separate files. The H5FD_SPLIT driver is an example of what the H5FD_MULTI driver can do.
    • H5FD_FAMILY:   This driver partitions a large format address space into smaller chunks (separate storage of a user's choice).
    • H5FD_SPLIT:   This driver splits the metadata and raw data into separate storage of a user's choice.
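
As an illustration of what "a mapping between the format address space and storage" means, the sketch below models the partitioning idea behind H5FD_FAMILY in plain C. It is a simplified model under stated assumptions, not the driver's actual implementation: it assumes equal-sized members filled in order, and the names family_location_t and family_locate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where one byte of the format address space would live in this model. */
typedef struct {
    uint64_t member;   /* index of the family member (0, 1, 2, ...) */
    uint64_t offset;   /* byte offset inside that member */
} family_location_t;

/* Map a format address onto (member, offset), assuming every member
 * holds exactly member_size bytes. */
family_location_t family_locate(uint64_t addr, uint64_t member_size)
{
    assert(member_size > 0);
    family_location_t loc;
    loc.member = addr / member_size;
    loc.offset = addr % member_size;
    return loc;
}
```

With a member size of 100 MB, address 0 falls at the start of member 0, and the first address past 100 MB falls at the start of member 1.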

Programming Model for Using a Pre-Defined File Driver

  • Create a copy or instance of the File Access property list:
       fapl = H5Pcreate (H5P_FILE_ACCESS);
    

     

  • Initialize the file driver. Each pre-defined file driver has its own initialization function, whose name is H5Pset_fapl_ followed by the driver name and which takes a file access property list as the first argument, followed by additional driver-dependent arguments. For example:
      hsize_t member_size = 100*1024*1024;  /* 100 MB per family member */
      status = H5Pset_fapl_family (fapl, member_size, H5P_DEFAULT);
    
    An alternative to using the driver initialization function is to set the driver directly using H5Pset_driver, which is not covered here.

     

  • Call H5Fcreate, passing in the identifier of the property list just modified.
       file_id = H5Fcreate (HDF5FILE, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    
  • Close the File Access Property List:
       status = H5Pclose (fapl);
    
  • Perform I/O on the file, if need be. To do so, a dataset transfer property list must be created, modified, and passed in to H5Dread or H5Dwrite.

    For example, the following sets the MPI-IO driver to use independent access for I/O operations:

      dxpl = H5Pcreate (H5P_DATASET_XFER);
      status = H5Pset_dxpl_mpio (dxpl, H5FD_MPIO_INDEPENDENT);
      /* The transfer property list precedes the buffer in the argument list. */
      status = H5Dread (dataset_id, type, mspace, fspace, dxpl, buffer);
    

User Designed File Drivers

These are out of the scope of this tutorial. Refer to the Technical Notes documentation on the Virtual File Layer.

How Does a General Application Open an HDF5 File?

A general application does not know what drivers were used to create a file. It would have to try different file drivers until it succeeds. An example of a general application is the h5dump tool that we provide with HDF5.

 

 

--- Last Modified: December 15, 2017 | 11:10 AM