What is HDF5?

HDF5 is a unique technology suite that makes it possible to manage extremely large and complex data collections.

The HDF5 technology suite includes:

  • A versatile data model that can represent very complex data objects and a wide variety of metadata.
  • A completely portable file format with no limit on the number or size of data objects in the collection.
  • A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
  • A rich set of integrated performance features that allow for access time and storage space optimizations.
  • Tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.

The HDF5 data model, file format, API, library, and tools are open and distributed without charge.
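To give a concrete feel for the suite, here is a minimal sketch using h5py, a widely used Python binding for the HDF5 library (file and dataset names are illustrative):

```python
import h5py
import numpy as np

# Create an HDF5 file holding one dataset plus descriptive metadata.
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("temperature", data=np.arange(10.0))
    dset.attrs["units"] = "kelvin"   # arbitrary user-defined metadata

# Reopen and read back a slice of the data.
with h5py.File("example.h5", "r") as f:
    first_five = f["temperature"][:5]
```

The same file can be opened from any of the other language interfaces, since the format itself is portable.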

HDF5 Technologies

The HDF5 technology suite is designed to organize, store, discover, access, analyze, share, and preserve diverse, complex data in continuously evolving heterogeneous computing and storage environments.

HDF5 supports all types of data stored digitally, regardless of origin or size. Petabytes of remote sensing data collected by satellites, terabytes of computational results from nuclear testing models, and megabytes of high-resolution MRI brain scans are stored in HDF5 files, together with metadata necessary for efficient data sharing, processing, visualization, and archiving.

The combination of features provided by HDF5 makes it unique:


Unlimited size, extensibility, and portability

  • HDF5 does not limit the size of files or the size or number of objects in a file.
  • The HDF5 format and library are extensible and designed to evolve gracefully to satisfy new demands.
  • HDF5 functionality and data are portable across virtually all computing platforms, and the library is distributed with C, C++, Fortran 90, and Java programming interfaces.

General data model

  • HDF5 has a simple but versatile data model.
  • The HDF5 data model supports complex data relationships and dependencies through its grouping and linking mechanisms.
  • HDF5 accommodates many common types of metadata and arbitrary user-defined metadata.

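The grouping and linking mechanisms can be sketched with h5py as follows; the group layout and link name here are hypothetical, chosen only to show the idea:

```python
import h5py
import numpy as np

with h5py.File("model.h5", "w") as f:
    # Groups nest like directories, forming the file's hierarchy.
    grp = f.create_group("simulation/run1")
    grp.create_dataset("pressure", data=np.zeros(4))
    # A soft link expresses a relationship without copying any data.
    f["latest"] = h5py.SoftLink("/simulation/run1")

with h5py.File("model.h5", "r") as f:
    shape = f["latest/pressure"].shape   # resolved through the link
```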
Unlimited variety of datatypes

  • HDF5 supports a rich set of pre-defined datatypes as well as the creation of an unlimited variety of complex user-defined datatypes.
  • Datatype definitions can be shared among objects in an HDF5 file, providing a powerful and efficient mechanism for describing data.
  • Datatype definitions include information such as byte order (endianness), size, and floating-point representation to fully describe how the data is stored, ensuring portability to other platforms.

Flexible, efficient I/O

  • HDF5, through its virtual file layer, offers extremely flexible storage and data transfer capabilities. Standard (POSIX), parallel, and network I/O file drivers are provided with HDF5.
  • Application developers can write additional file drivers to implement customized data storage or transport capabilities.
  • The parallel I/O driver for HDF5 reduces access times on parallel systems by reading/writing multiple data streams simultaneously.

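The choice of file driver is exposed directly in the API. For instance, h5py can select HDF5's in-memory "core" driver, which keeps the file image in RAM instead of using POSIX I/O (a sketch; the dataset is illustrative):

```python
import h5py
import numpy as np

# The 'core' driver keeps the file image in memory; backing_store=False
# means nothing is ever written to disk -- useful for scratch data.
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("x", data=np.ones(100))
    total = f["x"][...].sum()
```

Parallel HDF5 is selected the same way, via the MPI-IO driver, when the library is built with parallel support.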
Flexible data storage

  • HDF5 employs various compression, extensibility, and chunking strategies to improve access, management, and storage efficiency.
  • HDF5 provides for external storage of raw data, allowing raw data to be shared among HDF5 files and/or applications, and often saving disk space.

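Chunking, compression, and extensibility combine in a single dataset definition; a sketch in h5py (the shapes and chunk size are arbitrary choices for illustration):

```python
import h5py
import numpy as np

with h5py.File("grid.h5", "w") as f:
    # Chunked layout is what enables compression and later extension.
    d = f.create_dataset("grid", shape=(100, 100), maxshape=(None, 100),
                         chunks=(10, 100), compression="gzip")
    d[:] = np.random.rand(100, 100)
    d.resize((150, 100))   # grow along the unlimited first dimension
```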
Data transformation and complex subsetting

  • HDF5 enables datatype and spatial transformation during I/O operations.
  • HDF5 data I/O functions can operate on selected subsets of the data, reducing transferred data volume and improving access speed.
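Both points can be seen in a short h5py sketch: the slice selects a subset (a hyperslab) so only that data crosses the I/O layer, and `Dataset.astype` requests a datatype conversion during the read (the array contents are illustrative):

```python
import h5py
import numpy as np

with h5py.File("cube.h5", "w") as f:
    f.create_dataset("cube", data=np.arange(24).reshape(2, 3, 4))

with h5py.File("cube.h5", "r") as f:
    # Read only a strided subset; unselected elements are never transferred.
    sub = f["cube"][0, 1:3, ::2]
    # On-the-fly datatype conversion during the read.
    row = f["cube"].astype(np.float64)[0, 0, :2]
```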