
If you are new to HDF5, please read the Learning the Basics topic first.

Overview of Parallel HDF5 (PHDF5) Design

We had several requirements for Parallel HDF5 (PHDF5):

  • Parallel HDF5 files had to be compatible with serial HDF5 files and sharable between different serial and parallel platforms.

  • Parallel HDF5 had to be designed to have a single file image to all processes, rather than having one file per process. Having one file per process can cause expensive post processing, and the files are not usable by different processes.

  • A standard parallel I/O interface had to be portable to different platforms.

With these requirements in mind, our initial target was to support MPI programming, but not shared-memory programming. We had experimented with thread-safe support for Pthreads and OpenMP, but chose MPI as the basis for Parallel HDF5.

Implementation requirements were to:

  • Not use threads, since they were not commonly supported in 1998 when we were looking at this.

  • Not have a reserved process, as this might interfere with parallel algorithms.

  • Not spawn any processes, as this is not even commonly supported now.

The following figure shows the Parallel HDF5 implementation layers:

Parallel Programming with HDF5

This tutorial assumes that you are somewhat familiar with parallel programming with MPI (Message Passing Interface).

If you are not familiar with parallel programming, here is a tutorial that may be of interest:

Some of the terms that you must understand in this tutorial are:

    • MPI Communicator:

      Allows a group of processes to communicate with each other.

      Following are the MPI routines for initializing MPI and the communicator, and for finalizing an MPI session:

          C               Fortran          Description
          --              -------          -----------
          MPI_Init        MPI_INIT         Initialize MPI (usually with MPI_COMM_WORLD)

          MPI_Comm_size   MPI_COMM_SIZE    Determine how many processes are contained
                                           in the communicator

          MPI_Comm_rank   MPI_COMM_RANK    Determine the process ID number within
                                           the communicator (from 0 to n-1)

          MPI_Finalize    MPI_FINALIZE     Finalize (exit) MPI
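
      For illustration, here is a minimal sketch of an MPI program in C (not part of HDF5 itself) that uses these four routines:

          #include <mpi.h>
          #include <stdio.h>

          int main(int argc, char **argv)
          {
              int mpi_size, mpi_rank;

              /* Initialize MPI; MPI_COMM_WORLD contains all processes */
              MPI_Init(&argc, &argv);

              /* Number of processes in the communicator */
              MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

              /* This process's ID (rank), from 0 to mpi_size-1 */
              MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

              printf("Process %d of %d\n", mpi_rank, mpi_size);

              /* Exit MPI */
              MPI_Finalize();
              return 0;
          }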
      

 

  • Collective:   MPI defines this to mean all processes of the communicator must participate in the right order.

Parallel HDF5 opens a parallel file with a communicator. It returns a file handle to be used for future access to the file.

All processes are required to participate in the collective Parallel HDF5 API. Different files can be opened using different communicators.
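
The following sketch (assuming an HDF5 library built with MPI support; the function name is illustrative) shows how a file is created collectively by passing the communicator through a file access property list:

    #include "hdf5.h"
    #include <mpi.h>

    /* Collectively create a file; every process in comm must make this call.
     * The returned handle is then used by all processes to access the file. */
    hid_t create_parallel_file(const char *filename, MPI_Comm comm, MPI_Info info)
    {
        /* File access property list carrying the communicator */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, comm, info);

        /* Collective call: all processes participate */
        hid_t file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        H5Pclose(fapl);
        return file;
    }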

Examples of what you can do with the Parallel HDF5 collective API:

  • File Operation:      Create, open, and close a file
  • Object Creation:     Create, open, and close a dataset
  • Object Structure:    Extend a dataset (increase dimension sizes)
  • Dataset Operations:  Write to or read from a dataset
    (Array data transfer can be collective or independent.)

Once a file is opened by the processes of a communicator:

  • All parts of the file are accessible by all processes.
  • All objects in the file are accessible by all processes.
  • Multiple processes can write to the same dataset.
  • Each process can write to an individual dataset.

Please refer to the Supported Configuration Features Summary in the release notes for the current release of HDF5 for an up-to-date list of the platforms on which Parallel HDF5 is supported.

Creating and Accessing a File with PHDF5

Creating and Accessing a Dataset with PHDF5
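
As a minimal sketch (assuming a file handle obtained as above; names and sizes are illustrative), a dataset is created collectively like any other object:

    #include "hdf5.h"

    /* Collectively create a 2-D integer dataset in an already-open parallel file.
     * All processes must call this with the same arguments; the dimensions give
     * the full (global) size of the dataset. Error checking is omitted. */
    hid_t create_shared_dataset(hid_t file, const char *name,
                                hsize_t rows, hsize_t cols)
    {
        hsize_t dims[2] = {rows, cols};
        hid_t   space   = H5Screate_simple(2, dims, NULL);

        hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_INT, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        H5Sclose(space);
        return dset;
    }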

Writing and Reading Hyperslabs

The programming model for writing and reading hyperslabs is:

    • Each process defines the memory and file hyperslabs.
    • Each process executes a partial write/read call which is either collective or independent.

The memory and file hyperslabs in the first step are defined with H5Sselect_hyperslab (C) / h5sselect_hyperslab_f (Fortran 90).

The start (or offset), count, stride, and block parameters define the portion of the dataset to write to. By changing the values of these parameters, you can write hyperslabs with Parallel HDF5 by contiguous hyperslab, by regularly spaced data in a column/row, by pattern, and by chunk (a code sketch of a contiguous-hyperslab write follows this list):

by Contiguous Hyperslab  

by Regularly Spaced Data  

by Pattern  

by Chunk
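
As an example of the first case, here is a sketch assuming each process writes an equal, contiguous block of rows of a 2-D integer dataset (the function and parameter names are illustrative, and error checking is omitted):

    #include "hdf5.h"

    /* Each process writes its own contiguous block of rows of a 2-D dataset. */
    void write_my_rows(hid_t dset, int mpi_rank, hsize_t rows_per_proc,
                       hsize_t cols, const int *buf)
    {
        hsize_t start[2] = {(hsize_t)mpi_rank * rows_per_proc, 0}; /* offset of this process */
        hsize_t count[2] = {rows_per_proc, cols};                  /* size of the block */

        /* Select this process's hyperslab in the file dataspace */
        hid_t filespace = H5Dget_space(dset);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

        /* Memory dataspace describing the local buffer */
        hid_t memspace = H5Screate_simple(2, count, NULL);

        /* Request a collective data transfer (independent is the default) */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, buf);

        H5Pclose(dxpl);
        H5Sclose(memspace);
        H5Sclose(filespace);
    }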
