
Will accessing or reading a dataset be slower if your file contains many datasets?


Accessing a dataset in a file with many datasets *will* be slower than accessing it in a file that contains only that dataset. However, the drop in performance is not noticeable until the file holds close to 1 million datasets (perhaps around 700,000) and all of them are stored at one level (in one group).

The issue is that HDF5 uses B-trees to store the data. Searching a B-tree for a node takes O(log n) time, where "n" is the number of datasets.

The slowdown can be reduced by storing fewer datasets in each group. For example, if you have 1,000,000 datasets, performance will be better if you spread them over 1,000 separate groups of 1,000 datasets each rather than storing them all in one group.
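One way to spread datasets over groups is to derive the group from the dataset's index. The following is only a minimal sketch: the function name dataset_path, the "group_NNNN/dset_NNNNNNN" naming scheme, and the per_group parameter are assumptions made for illustration and are not part of HDF5.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not an HDF5 function): builds the path of dataset
 * number `index` when datasets are split into groups of `per_group` each.
 * Returns 0 on success, -1 on a bad argument or if `buf` is too small.
 */
int dataset_path(unsigned long index, unsigned long per_group,
                 char *buf, size_t size)
{
    if (per_group == 0 || buf == NULL || size == 0)
        return -1;

    /* Datasets 0..per_group-1 go to group 0, the next batch to group 1, ... */
    unsigned long group = index / per_group;

    int n = snprintf(buf, size, "/group_%04lu/dset_%07lu", group, index);

    /* snprintf reports the length it needed; treat truncation as an error. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With per_group set to 1000, dataset 999999 maps to "/group_0999/dset_0999999". The resulting path could then be passed to H5Dcreate2 or H5Dopen2; the groups must already exist unless intermediate-group creation has been enabled on the link creation property list.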

For the best performance, use the latest file format. For example, create the file as follows:

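   /* Request the latest file format through a file access property list. */
   fapid = H5Pcreate (H5P_FILE_ACCESS);
   status = H5Pset_libver_bounds (fapid, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
   file_id = H5Fcreate (FILENAME, H5F_ACC_TRUNC, H5P_DEFAULT, fapid);
   /* The property list is no longer needed once the file has been created. */
   status = H5Pclose (fapid);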