HDF5  1.14.4.3
API Reference
 
Dataset Access Properties

Detailed Description

Use dataset access properties to modify the default behavior of the HDF5 library when accessing datasets. The properties cover the size of the chunk cache, prefixes for external content and virtual dataset file paths, and flush behavior. These properties are not persisted with datasets and can be adjusted at runtime before a dataset is created or opened.

Dataset access property list functions (H5P)
Function Purpose
H5Pset_buffer Sets type conversion and background buffers.
H5Pget_buffer Reads buffer settings.
H5Pset_append_flush/H5Pget_append_flush Sets/gets the values of the append property that is set up in the dataset access property list.
H5Pset_chunk_cache/H5Pget_chunk_cache Sets/gets the raw data chunk cache parameters.
H5Pset_efile_prefix/H5Pget_efile_prefix Sets/gets the prefix for external raw data storage files as set in the dataset access property list.
H5Pset_virtual_prefix/H5Pget_virtual_prefix Sets/gets the prefix to be applied to VDS source file paths.
H5Pset_virtual_printf_gap/H5Pget_virtual_printf_gap Sets/gets the maximum number of missing source files and/or datasets with the printf-style names when getting the extent for an unlimited virtual dataset.
H5Pset_virtual_view/H5Pget_virtual_view Sets/gets the view of the virtual dataset (VDS) to include or exclude missing mapped elements.

Functions

herr_t H5Pget_append_flush (hid_t dapl_id, unsigned dims, hsize_t boundary[], H5D_append_cb_t *func, void **udata)
 Retrieves the values of the append property that is set up in the dataset access property list.
 
herr_t H5Pget_chunk_cache (hid_t dapl_id, size_t *rdcc_nslots, size_t *rdcc_nbytes, double *rdcc_w0)
 Retrieves the raw data chunk cache parameters.
 
ssize_t H5Pget_efile_prefix (hid_t dapl_id, char *prefix, size_t size)
 Retrieves the prefix for external raw data storage files as set in the dataset access property list.
 
ssize_t H5Pget_virtual_prefix (hid_t dapl_id, char *prefix, size_t size)
 Retrieves prefix applied to VDS source file paths.
 
herr_t H5Pget_virtual_printf_gap (hid_t dapl_id, hsize_t *gap_size)
 Returns the maximum number of missing source files and/or datasets with the printf-style names when getting the extent for an unlimited virtual dataset.
 
herr_t H5Pget_virtual_view (hid_t dapl_id, H5D_vds_view_t *view)
 Retrieves the view of a virtual dataset accessed with dapl_id.
 
herr_t H5Pset_append_flush (hid_t dapl_id, unsigned ndims, const hsize_t boundary[], H5D_append_cb_t func, void *udata)
 Sets two actions to perform when the size of a dataset's dimension being appended reaches a specified boundary.
 
herr_t H5Pset_chunk_cache (hid_t dapl_id, size_t rdcc_nslots, size_t rdcc_nbytes, double rdcc_w0)
 Sets the raw data chunk cache parameters.
 
herr_t H5Pset_efile_prefix (hid_t dapl_id, const char *prefix)
 Sets the external dataset storage file prefix in the dataset access property list.
 
herr_t H5Pset_virtual_prefix (hid_t dapl_id, const char *prefix)
 Sets prefix to be applied to VDS source file paths.
 
herr_t H5Pset_virtual_printf_gap (hid_t dapl_id, hsize_t gap_size)
 Sets the maximum number of missing source files and/or datasets with the printf-style names when getting the extent of an unlimited virtual dataset.
 
herr_t H5Pset_virtual_view (hid_t dapl_id, H5D_vds_view_t view)
 Sets the view of the virtual dataset (VDS) to include or exclude missing mapped elements.
 

Function Documentation

◆ H5Pget_append_flush()

herr_t H5Pget_append_flush ( hid_t  dapl_id,
unsigned  dims,
hsize_t  boundary[],
H5D_append_cb_t *  func,
void **  udata 
)

Retrieves the values of the append property that is set up in the dataset access property list.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] dims: The number of elements for boundary
[out] boundary: The dimension sizes used to determine the boundary
[out] func: The user-defined callback function
[out] udata: The user-defined input data
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pget_append_flush() obtains the following information from the dataset access property list, dapl_id.

boundary consists of the sizes set up in the access property list that are used to determine when a dataset dimension size hits the boundary. At most dims boundary sizes are retrieved, and dims will not exceed the corresponding value that is set in the property list.

func is the user-defined callback function to invoke when a dataset's appended dimension size reaches a boundary and udata is the user-defined input data for the callback function.

Since
1.10.0

◆ H5Pget_chunk_cache()

herr_t H5Pget_chunk_cache ( hid_t  dapl_id,
size_t *  rdcc_nslots,
size_t *  rdcc_nbytes,
double *  rdcc_w0 
)

Retrieves the raw data chunk cache parameters.

Parameters
[in] dapl_id: Dataset access property list identifier
[out] rdcc_nslots: Number of chunk slots in the raw data chunk cache hash table
[out] rdcc_nbytes: Total size of the raw data chunk cache, in bytes
[out] rdcc_w0: Preemption policy
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pget_chunk_cache() retrieves the number of chunk slots in the raw data chunk cache hash table, the maximum possible number of bytes in the raw data chunk cache, and the preemption policy value.

These values are retrieved from a dataset access property list. If the values have not been set on the property list, then values returned will be the corresponding values from a default file access property list.

Any (or all) pointer arguments may be null pointers, in which case the corresponding data is not returned.

Since
1.8.3

◆ H5Pget_efile_prefix()

ssize_t H5Pget_efile_prefix ( hid_t  dapl_id,
char *  prefix,
size_t  size 
)

Retrieves the prefix for external raw data storage files as set in the dataset access property list.

Parameters
[in] dapl_id: Dataset access property list identifier
[in,out] prefix: Dataset external storage prefix in UTF-8 or ASCII (Path and filename must be ASCII on Windows systems.)
[in] size: Size of prefix buffer in bytes
Returns
Returns the size of the prefix if successful, with the prefix string stored in prefix. Otherwise, returns a negative value and the contents of prefix are undefined.

H5Pget_efile_prefix() retrieves the file system path prefix for locating external files associated with a dataset that uses external storage. This will be the value set with H5Pset_efile_prefix() or the HDF5 library's default.

The value of size is the size in bytes of the prefix, including the NULL terminator. If the size is unknown, a preliminary H5Pget_efile_prefix() call with the pointer prefix set to NULL will return the size of the prefix without the NULL terminator.

The prefix buffer must be allocated by the caller. In a call that retrieves the actual prefix, that buffer must be of the size specified in size.
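The size-query-then-fetch sequence described above can be sketched as follows. This is a minimal illustration, not library code: get_prefix_standin() is a hypothetical stand-in that only mimics the documented contract (it returns the prefix length without the terminator and fills the buffer when one is supplied). With HDF5, the same two steps would be calls to H5Pget_efile_prefix(), first with prefix set to NULL and then with the allocated buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a prefix getter. Returns the prefix length
 * without the terminator; when buf is non-NULL, copies at most size - 1
 * characters followed by a terminator. */
static long get_prefix_standin(const char *stored, char *buf, size_t size)
{
    size_t len = strlen(stored);
    if (buf != NULL && size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(buf, stored, n);
        buf[n] = '\0';
    }
    return (long)len;
}

/* Two-call pattern: query the length, allocate length + 1 bytes for the
 * terminator, then fetch. Returns a heap string the caller must free(),
 * or NULL on failure. */
char *fetch_prefix(const char *stored)
{
    long len = get_prefix_standin(stored, NULL, 0);
    if (len < 0)
        return NULL;

    size_t size = (size_t)len + 1; /* room for the terminator */
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    if (get_prefix_standin(stored, buf, size) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The point of the sketch is the off-by-one: the preliminary call reports the length without the terminator, so the buffer passed to the second call needs one extra byte.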

Note
See H5Pset_efile_prefix() for a more complete description of file location behavior and for notes on the use of the HDF5_EXTFILE_PREFIX environment variable.
Since
1.8.17

◆ H5Pget_virtual_prefix()

ssize_t H5Pget_virtual_prefix ( hid_t  dapl_id,
char *  prefix,
size_t  size 
)

Retrieves prefix applied to VDS source file paths.

Parameters
[in] dapl_id: Dataset access property list identifier
[out] prefix: Prefix applied to VDS source file paths
[in] size: Size of prefix, including null terminator
Returns
If successful, returns a non-negative value specifying the size in bytes of the prefix without the NULL terminator; otherwise returns a negative value.

H5Pget_virtual_prefix() retrieves the prefix applied to the path of any VDS source files traversed.

When a VDS source file is traversed, the prefix is retrieved from the dataset access property list dapl_id, returned in the user-allocated buffer pointed to by prefix, and prepended to the filename stored in the VDS virtual file, set with H5Pset_virtual().

The size in bytes of the prefix, including the NULL terminator, is specified in size. If size is unknown, a preliminary H5Pget_virtual_prefix() call with the pointer prefix set to NULL will return the size of the prefix without the NULL terminator.

See also
Supporting Functions: H5Pget_layout(), H5Pset_layout(), H5Sget_regular_hyperslab(), H5Sis_regular_hyperslab(), H5Sselect_hyperslab()
VDS Functions: H5Pget_virtual_count(), H5Pget_virtual_dsetname(), H5Pget_virtual_filename(), H5Pget_virtual_prefix(), H5Pget_virtual_printf_gap(), H5Pget_virtual_srcspace(), H5Pget_virtual_view(), H5Pget_virtual_vspace(), H5Pset_virtual(), H5Pset_virtual_prefix(), H5Pset_virtual_printf_gap(), H5Pset_virtual_view()
Since
1.10.2

◆ H5Pget_virtual_printf_gap()

herr_t H5Pget_virtual_printf_gap ( hid_t  dapl_id,
hsize_t *  gap_size 
)

Returns the maximum number of missing source files and/or datasets with the printf-style names when getting the extent for an unlimited virtual dataset.

Parameters
[in] dapl_id: Dataset access property list identifier
[out] gap_size: Maximum number of the files and/or datasets allowed to be missing for determining the extent of an unlimited virtual dataset with printf-style mappings. (Default: 0)
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pget_virtual_printf_gap() returns the maximum number of missing printf-style files and/or datasets for determining the extent of an unlimited virtual dataset, gap_size, using the access property list for the virtual dataset, dapl_id.

The default library value for gap_size is 0 (zero).

Since
1.10.0

◆ H5Pget_virtual_view()

herr_t H5Pget_virtual_view ( hid_t  dapl_id,
H5D_vds_view_t *  view 
)

Retrieves the view of a virtual dataset accessed with dapl_id.

Parameters
[in] dapl_id: Dataset access property list identifier
[out] view: The flag specifying the view of the virtual dataset.
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pget_virtual_view() takes the virtual dataset access property list, dapl_id, and retrieves the flag, view, set by the H5Pset_virtual_view() call.

See also
Supporting Functions: H5Pget_layout(), H5Pset_layout(), H5Sget_regular_hyperslab(), H5Sis_regular_hyperslab(), H5Sselect_hyperslab()
VDS Functions: H5Pget_virtual_count(), H5Pget_virtual_dsetname(), H5Pget_virtual_filename(), H5Pget_virtual_prefix(), H5Pget_virtual_printf_gap(), H5Pget_virtual_srcspace(), H5Pget_virtual_view(), H5Pget_virtual_vspace(), H5Pset_virtual(), H5Pset_virtual_prefix(), H5Pset_virtual_printf_gap(), H5Pset_virtual_view()
Since
1.10.0

◆ H5Pset_append_flush()

herr_t H5Pset_append_flush ( hid_t  dapl_id,
unsigned  ndims,
const hsize_t  boundary[],
H5D_append_cb_t  func,
void *  udata 
)

Sets two actions to perform when the size of a dataset's dimension being appended reaches a specified boundary.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] ndims: The number of elements for boundary
[in] boundary: The dimension sizes used to determine the boundary
[in] func: The user-defined callback function
[in] udata: The user-defined input data
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pset_append_flush() sets the following two actions to perform for a dataset associated with the dataset access property list dapl_id:

  • Call the callback function func set in the property list
  • Flush the dataset associated with the dataset access property list

When a user is appending data to a dataset via H5DOappend() and the dataset's newly extended dimension size hits a specified boundary, the library will perform the first action listed above. Upon return from the callback function, the library will then perform the second action listed above and return to the user. If no boundary is hit or set, the two actions above are not invoked.

The specified boundary is indicated by the parameter boundary. It is a 1-dimensional array with ndims elements, which should be the same as the rank of the dataset's dataspace. While appending to a dataset along a particular dimension index via H5DOappend(), the library determines a boundary is reached when the resulting dimension size is divisible by boundary[index]. A zero value for boundary[index] indicates no boundary is set for that dimension index.
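The divisibility rule in the preceding paragraph can be modeled in a few lines. This is a sketch of the rule as stated in this section, not the library's internal check; boundary_reached() is a hypothetical helper, and uint64_t stands in for hsize_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a dimension that now has new_size elements has hit a
 * boundary. A boundary value of 0 means no boundary is set for that
 * dimension, so nothing is triggered. */
bool boundary_reached(uint64_t new_size, uint64_t boundary)
{
    if (boundary == 0)
        return false;
    return new_size % boundary == 0;
}
```

With a boundary of 5, growing a dimension to 10 elements triggers the callback and flush, while growing it to 12 does not.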

The setting of this property will apply only for a chunked dataset with an extendible dataspace. A dataspace is extendible when it is defined with either one of the following:

  • A dataspace with fixed current and maximum dimension sizes, where the maximum size exceeds the current size in at least one dimension
  • A dataspace with at least one unlimited dimension for its maximum dimension size

When creating or opening a chunked dataset, the library will check whether the boundary as specified in the access property list is set up properly. The library will fail the dataset create or open if either of the following conditions is true:

  • ndims, the number of elements for boundary, is not the same as the rank of the dataset's dataspace.
  • A non-zero boundary value is specified for a non-extendible dimension.
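The two failure conditions above can be expressed as a small validity check. This is an illustrative model under stated assumptions: a dimension is treated as extendible when its maximum size is unlimited or larger than its current size, DIM_UNLIMITED is a local stand-in for an unlimited maximum, and append_boundary_valid() is a hypothetical helper rather than a library function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an unlimited maximum dimension size (assumption). */
#define DIM_UNLIMITED UINT64_MAX

/* Returns true when a boundary array would pass both checks:
 *   1. ndims equals the rank of the dataspace;
 *   2. every non-zero boundary belongs to an extendible dimension. */
bool append_boundary_valid(unsigned rank, const uint64_t cur[], const uint64_t max[],
                           unsigned ndims, const uint64_t boundary[])
{
    if (ndims != rank)
        return false;

    for (unsigned i = 0; i < rank; i++) {
        bool extendible = (max[i] == DIM_UNLIMITED) || (max[i] > cur[i]);
        if (boundary[i] != 0 && !extendible)
            return false;
    }
    return true;
}
```

For a 2-D dataspace with current sizes {4, 8} and maximum sizes {unlimited, 8}, a boundary of {2, 0} passes, while {2, 4} fails because the second dimension cannot grow.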

The callback function func must conform to the following prototype:

typedef herr_t (*H5D_append_cb_t)(hid_t dataset_id, hsize_t *cur_dims, void *op_data);

The parameters of the callback function, per the above prototype, are defined as follows:

  • dataset_id is the dataset identifier.
  • cur_dims is the dataset's current dimension sizes when a boundary is hit.
  • op_data is the user-defined input data.
Since
1.10.0

◆ H5Pset_chunk_cache()

herr_t H5Pset_chunk_cache ( hid_t  dapl_id,
size_t  rdcc_nslots,
size_t  rdcc_nbytes,
double  rdcc_w0 
)

Sets the raw data chunk cache parameters.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] rdcc_nslots: The number of chunk slots in the raw data chunk cache for this dataset. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set to approximately 100 times that number of chunks. The default value is 521. If the value passed is H5D_CHUNK_CACHE_NSLOTS_DEFAULT, then the property will not be set on dapl_id and the parameter will come from the file access property list used to open the file.
[in] rdcc_nbytes: The total size of the raw data chunk cache for this dataset. In most cases increasing this number will improve performance, as long as you have enough free memory. The default size is 1 MB. If the value passed is H5D_CHUNK_CACHE_NBYTES_DEFAULT, then the property will not be set on dapl_id and the parameter will come from the file access property list.
[in] rdcc_w0: The chunk preemption policy for this dataset. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks that have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower, depending on how often you re-read or re-write the same data. The default value is 0.75. If the value passed is H5D_CHUNK_CACHE_W0_DEFAULT, then the property will not be set on dapl_id and the parameter will come from the file access property list.
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pset_chunk_cache() sets the number of elements, the total number of bytes, and the preemption policy value in the raw data chunk cache on a dataset access property list. After calling this function, the values set in the property list will override the values in the file's file access property list.

The raw data chunk cache inserts chunks into the cache by first computing a hash value using the address of a chunk, then using that hash value as the chunk's index into the table of cached chunks. The size of this hash table, i.e., the number of possible hash values, is determined by the rdcc_nslots parameter. If a different chunk in the cache has the same hash value, this causes a collision, which reduces efficiency. If inserting the chunk into cache would cause the cache to be too big, then the cache is pruned according to the rdcc_w0 parameter.
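The rule of thumb given for rdcc_nslots (a prime number, 10 to 100 times the number of chunks that fit in rdcc_nbytes) can be turned into a small helper. This is one possible way to pick a value, not a library routine: suggest_nslots() and its factor argument are hypothetical, and the result would simply be passed as the rdcc_nslots argument of H5Pset_chunk_cache().

```c
#include <stdbool.h>
#include <stddef.h>

/* Trial-division primality test; adequate for slot counts of this size. */
static bool is_prime(size_t n)
{
    if (n < 2)
        return false;
    for (size_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return false;
    return true;
}

/* Suggests a slot count: the smallest prime that is at least
 * factor * (number of whole chunks that fit in cache_bytes).
 * Returns 0 when chunk_bytes or factor is 0. */
size_t suggest_nslots(size_t cache_bytes, size_t chunk_bytes, size_t factor)
{
    if (chunk_bytes == 0 || factor == 0)
        return 0;

    size_t chunks = cache_bytes / chunk_bytes;
    if (chunks == 0)
        chunks = 1; /* a chunk larger than the cache still needs a slot */

    size_t n = chunks * factor;
    while (!is_prime(n))
        n++;
    return n;
}
```

For a 16 MiB cache holding 1 MiB chunks, a factor of 100 yields 1601 slots and a factor of 10 yields 163.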

Motivation: H5Pset_chunk_cache() is used to adjust the chunk cache parameters on a per-dataset basis, as opposed to a global setting for the file using H5Pset_cache(). The optimum chunk cache parameters may vary widely with different data layout and access patterns, so for optimal performance they must be set individually for each dataset. It may also be beneficial to reduce the size of the chunk cache for datasets whose performance is not important in order to save memory space.

Example Usage: The following code sets the chunk cache to use a hash table with 12421 elements and a maximum size of 16 MB, while using the preemption policy specified for the entire file:

H5Pset_chunk_cache(dapl_id, 12421, 16*1024*1024, H5D_CHUNK_CACHE_W0_DEFAULT);

Usage Notes: The chunk cache size is a property for accessing a dataset and is not stored with a dataset or a file. To guarantee the same chunk cache settings each time the dataset is opened, call H5Dopen() with a dataset access property list where the chunk cache size is set by calling H5Pset_chunk_cache() for that property list. The property list can be used for multiple accesses in the same application.

For files where the same chunk cache size will be appropriate for all or most datasets, H5Pset_cache() can be called with a file access property list to set the chunk cache size for accessing all datasets in the file.

Both methods can be used in combination, in which case the chunk cache size set by H5Pset_cache() will apply except for specific datasets where H5Dopen() is called with a dataset access property list whose chunk cache size was set by H5Pset_chunk_cache().

In the absence of any cache settings, H5Dopen() will by default create a 1 MB chunk cache for the opened dataset. If this size happens to be appropriate, no call will be needed to either function to set the chunk cache size.

It is also possible that a change in access pattern for later access to a dataset will change the appropriate chunk cache size.

Since
1.8.3

◆ H5Pset_efile_prefix()

herr_t H5Pset_efile_prefix ( hid_t  dapl_id,
const char *  prefix 
)

Sets the external dataset storage file prefix in the dataset access property list.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] prefix: Dataset external storage prefix in UTF-8 or ASCII (Path and filename must be ASCII on Windows systems.)
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pset_efile_prefix() sets the prefix used to locate raw data files for a dataset that uses external storage. This prefix can provide either an absolute path or a relative path to the external files.

H5Pset_efile_prefix() is used in conjunction with H5Pset_external() to control the behavior of the HDF5 library when searching for the raw data files associated with a dataset that uses external storage:

  • The default behavior of the library is to search for the dataset's external storage raw data files in the current working directory of the program.
  • If the prefix is set to an absolute path, the target directory will be searched for the dataset's external storage raw data files.
  • If the prefix is set to a relative path, the target directory, relative to the current working directory, will be searched for the dataset's external storage raw data files.
  • If the prefix is set to a relative path that begins with the special token ${ORIGIN}, that directory, relative to the HDF5 file containing the dataset, will be searched for the dataset's external storage raw data files.

The HDF5_EXTFILE_PREFIX environment variable can be used to override the above behavior (the environment variable supersedes the API call). Setting the variable to a path string and calling H5Dcreate() or H5Dopen() is the equivalent of calling H5Pset_efile_prefix() and calling the same create or open function. The environment variable is checked at the time of the create or open action and copied so it can be safely changed after the H5Dcreate() or H5Dopen() call.

Calling H5Pset_efile_prefix() with prefix set to NULL or the empty string returns the search path to the default. The result would be the same as if H5Pset_efile_prefix() had never been called.
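The search rules above can be modeled as a path-building function. This is a simplified sketch, not the library's resolution code: resolve_external_path() is a hypothetical helper, joining with "/" is an assumption, and the HDF5_EXTFILE_PREFIX override is not modeled. A relative result is understood to be relative to the current working directory.

```c
#include <stdio.h>
#include <string.h>

#define ORIGIN_TOKEN "${ORIGIN}"

/* Builds the path at which an external file named name would be searched.
 *   - NULL or empty prefix: the name alone (default behavior).
 *   - Prefix starting with ${ORIGIN}: the token is replaced by hdf5_dir,
 *     the directory of the HDF5 file containing the dataset.
 *   - Any other prefix: used as given, absolute or relative.
 * Returns 0 on success, -1 if out is too small. */
int resolve_external_path(const char *prefix, const char *hdf5_dir,
                          const char *name, char *out, size_t out_size)
{
    size_t tok_len = strlen(ORIGIN_TOKEN);
    int n;

    if (prefix == NULL || prefix[0] == '\0')
        n = snprintf(out, out_size, "%s", name);
    else if (strncmp(prefix, ORIGIN_TOKEN, tok_len) == 0)
        n = snprintf(out, out_size, "%s%s/%s", hdf5_dir, prefix + tok_len, name);
    else
        n = snprintf(out, out_size, "%s/%s", prefix, name);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For an HDF5 file in /data/run1, the prefix ${ORIGIN}/raw places a.bin at /data/run1/raw/a.bin regardless of the current working directory, which is why this form survives moving the file and its external data together.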

Note
If the external file prefix is not an absolute path and the HDF5 file is moved, the external storage files will also need to be moved so they can be accessed at the new location.
As stated above, the use of the HDF5_EXTFILE_PREFIX environment variable overrides any property list setting. H5Pset_efile_prefix() and H5Pget_efile_prefix(), being property functions, set and retrieve only the property list setting; they are unaware of the environment variable.
On Windows, the prefix must be an ASCII string since the Windows standard C library's I/O functions cannot handle UTF-8 file names.
Since
1.8.17

◆ H5Pset_virtual_prefix()

herr_t H5Pset_virtual_prefix ( hid_t  dapl_id,
const char *  prefix 
)

Sets prefix to be applied to VDS source file paths.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] prefix: Prefix to be applied to VDS source file paths
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pset_virtual_prefix() sets the prefix to be applied to the path of any VDS source files traversed. The prefix is prepended to the filename stored in the VDS virtual file, set with H5Pset_virtual().

The prefix is specified in the user-allocated buffer prefix and set in the dataset access property list dapl_id. The buffer should not be freed until the property list has been closed.

See also
Supporting Functions: H5Pget_layout(), H5Pset_layout(), H5Sget_regular_hyperslab(), H5Sis_regular_hyperslab(), H5Sselect_hyperslab()
VDS Functions: H5Pget_virtual_count(), H5Pget_virtual_dsetname(), H5Pget_virtual_filename(), H5Pget_virtual_prefix(), H5Pget_virtual_printf_gap(), H5Pget_virtual_srcspace(), H5Pget_virtual_view(), H5Pget_virtual_vspace(), H5Pset_virtual(), H5Pset_virtual_prefix(), H5Pset_virtual_printf_gap(), H5Pset_virtual_view()
Since
1.10.2

◆ H5Pset_virtual_printf_gap()

herr_t H5Pset_virtual_printf_gap ( hid_t  dapl_id,
hsize_t  gap_size 
)

Sets the maximum number of missing source files and/or datasets with the printf-style names when getting the extent of an unlimited virtual dataset.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] gap_size: Maximum number of files and/or datasets allowed to be missing for determining the extent of an unlimited virtual dataset with printf-style mappings (Default value: 0)
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pset_virtual_printf_gap() sets the access property list for the virtual dataset, dapl_id, to instruct the library to stop looking for the mapped data stored in the files and/or datasets with the printf-style names after not finding gap_size files and/or datasets. The found source files and datasets will determine the extent of the unlimited virtual dataset with the printf-style mappings.

Consider the following examples where the regularly spaced blocks of a virtual dataset are mapped to datasets with the names d-1, d-2, d-3, ..., d-N, ... :

  • If the dataset d-2 is missing and gap_size is set to 0, then the virtual dataset will contain only data found in d-1.
  • If d-2 and d-3 are missing and gap_size is set to 2, then the virtual dataset will contain the data from d-1, d-4, ..., d-N, ... . The blocks that are mapped to d-2 and d-3 will be filled according to the virtual dataset's fill value setting.
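The effect of gap_size in the examples above can be modeled with a short scan over a list of present and missing source datasets. This is a reading of the behavior described here, not the library's algorithm: extent_blocks() is a hypothetical helper, and it assumes that gap_size bounds the number of consecutive missing sources tolerated before the search stops.

```c
#include <stdbool.h>
#include <stddef.h>

/* present[i] tells whether source dataset d-(i+1) exists.
 * Returns how many leading blocks the virtual dataset extent covers:
 * the scan stops once more than gap_size sources in a row are missing,
 * and the extent ends at the last source that was found. */
size_t extent_blocks(const bool present[], size_t n, size_t gap_size)
{
    size_t extent = 0;
    size_t missing_run = 0;

    for (size_t i = 0; i < n; i++) {
        if (present[i]) {
            extent = i + 1;
            missing_run = 0;
        } else if (++missing_run > gap_size) {
            break;
        }
    }
    return extent;
}
```

With d-2 missing and a gap_size of 0 the extent covers only d-1. With d-2 and d-3 missing and a gap_size of 2 the scan continues past the gap and the extent reaches the last source found.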
See also
Supporting Functions: H5Pget_layout(), H5Pset_layout(), H5Sget_regular_hyperslab(), H5Sis_regular_hyperslab(), H5Sselect_hyperslab()
VDS Functions: H5Pget_virtual_count(), H5Pget_virtual_dsetname(), H5Pget_virtual_filename(), H5Pget_virtual_prefix(), H5Pget_virtual_printf_gap(), H5Pget_virtual_srcspace(), H5Pget_virtual_view(), H5Pget_virtual_vspace(), H5Pset_virtual(), H5Pset_virtual_prefix(), H5Pset_virtual_printf_gap(), H5Pset_virtual_view()
Since
1.10.0

◆ H5Pset_virtual_view()

herr_t H5Pset_virtual_view ( hid_t  dapl_id,
H5D_vds_view_t  view 
)

Sets the view of the virtual dataset (VDS) to include or exclude missing mapped elements.

Parameters
[in] dapl_id: Dataset access property list identifier
[in] view: Flag specifying the extent of the data to be included in the view.
Returns
Returns a non-negative value if successful; otherwise, returns a negative value.

H5Pset_virtual_view() takes the access property list for the virtual dataset, dapl_id, and the flag, view, and sets the VDS view according to the flag value.

If view is set to H5D_VDS_FIRST_MISSING, the view includes all data before the first missing mapped data. This setting provides a view containing only the continuous data starting with the dataset's first data element. Any break in continuity terminates the view.

If view is set to H5D_VDS_LAST_AVAILABLE, the view includes all available mapped data.

Missing mapped data is filled with the fill value set in the VDS creation property list.
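The difference between the two view settings can be illustrated with a small model. The enum and view_blocks() below are hypothetical stand-ins for illustration only. They are not the library's H5D_vds_view_t values or its extent computation, and the model ignores the printf gap setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the two view flags (assumption, not HDF5 types). */
typedef enum { VIEW_FIRST_MISSING, VIEW_LAST_AVAILABLE } view_t;

/* present[i] tells whether mapped block i has data.
 * Returns the number of leading blocks included in the view:
 *   - VIEW_FIRST_MISSING stops at the first missing block;
 *   - VIEW_LAST_AVAILABLE extends to the last block that has data. */
size_t view_blocks(const bool present[], size_t n, view_t view)
{
    size_t extent = 0;

    for (size_t i = 0; i < n; i++) {
        if (present[i])
            extent = i + 1;
        else if (view == VIEW_FIRST_MISSING)
            break;
    }
    return extent;
}
```

For blocks that are present, present, missing, present, the first-missing view covers two blocks and the last-available view covers four, with the third block supplied by the fill value.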

See also
Supporting Functions: H5Pget_layout(), H5Pset_layout(), H5Sget_regular_hyperslab(), H5Sis_regular_hyperslab(), H5Sselect_hyperslab()
VDS Functions: H5Pget_virtual_count(), H5Pget_virtual_dsetname(), H5Pget_virtual_filename(), H5Pget_virtual_prefix(), H5Pget_virtual_printf_gap(), H5Pget_virtual_srcspace(), H5Pget_virtual_view(), H5Pget_virtual_vspace(), H5Pset_virtual(), H5Pset_virtual_prefix(), H5Pset_virtual_printf_gap(), H5Pset_virtual_view()
Since
1.10.0