Copies an HDF5 file to a new file with or without compression and/or chunking
h5repack [OPTIONS] in_file out_file
h5repack -i in_file -o out_file [OPTIONS]
h5repack is a command line tool that applies HDF5 filters to an input file in_file, saving the output in a new output file, out_file.
If h5repack performs poorly on large datasets, the H5TOOLS_BUFSIZE environment variable can be used to improve performance. This environment variable specifies the size, in bytes, of the hyperslab (selection) buffer that h5repack uses. Its default value is 32 MB (32*1024*1024=33554432 bytes), which may be too small for large datasets. The dataset does not have to be chunked to use this environment variable.
For example, if h5repack is slow on a large 3D dataset with a chunk size of 512*512*512 and a 32-bit float datatype (4 bytes per element), then setting H5TOOLS_BUFSIZE to at least the size of one chunk in bytes (512*512*512*4=536870912) should improve performance. On Unix, in csh or tcsh, the H5TOOLS_BUFSIZE environment variable can be set as follows:
setenv H5TOOLS_BUFSIZE 536870912
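The setenv form is csh syntax. In Bourne-compatible shells (sh, bash) the same value can be computed and exported as in the following minimal sketch; the helper variable names chunk_elems and elem_size are illustrative only and are not part of h5repack.

```shell
# Number of elements in one 512*512*512 chunk.
chunk_elems=$((512 * 512 * 512))
# Bytes per element for a 32-bit float.
elem_size=4

# Buffer large enough to hold one full chunk.
H5TOOLS_BUFSIZE=$((chunk_elems * elem_size))
export H5TOOLS_BUFSIZE

echo "$H5TOOLS_BUFSIZE"   # prints 536870912
```

Any h5repack command started from the same shell session afterwards inherits this setting.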
Options and Parameters:
in_file
Input HDF5 file
out_file
Output HDF5 file
-h or --help
Print help message.
-v or --verbose
Print verbose output.
-V or --version
Print version number.
-n or --native
Use native HDF5 datatypes when repacking.
(Default behavior is to use original file datatypes.)
Note that this is a change in default behavior; prior to Release 1.6.6, h5repack generated files only with native datatypes.
-L or --latest
Use latest version of the HDF5 file format.
-c max_compact_links or --compact=max_compact_links
Set the maximum number of links, max_compact_links, that can be stored in a group header message (compact format).
-d min_indexed_links or --indexed=min_indexed_links
Set the minimum number of links, min_indexed_links, in the indexed format.
max_compact_links and min_indexed_links are closely related, and the first must be equal to or greater than the second. In the general case, however, performance will suffer, possibly dramatically, if they are equal; performance can be improved by tuning the gap between the two values to minimize unnecessary thrashing between the compact and indexed storage modes as group size grows and shrinks. The relationship between max_compact_links and min_indexed_links is most important when group sizes are highly dynamic; it is much less important in files with a stable structure. Compact mode is space- and performance-efficient when groups have small numbers of members; indexed mode requires slightly more storage space, but provides increasingly better performance as the number of members in each group increases.
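The constraint between the two values can be checked before h5repack is invoked, as in this rough sketch. The values 16 and 8 and the file names in.h5 and out.h5 are arbitrary assumptions chosen for illustration, not recommendations.

```shell
max_compact_links=16   # arbitrary illustrative value
min_indexed_links=8    # arbitrary illustrative value; leaves a gap of 8
in_file=in.h5          # hypothetical file names
out_file=out.h5

# max_compact_links must be equal to or greater than min_indexed_links.
if [ "$max_compact_links" -ge "$min_indexed_links" ]; then
    links_ok=yes
else
    links_ok=no
fi

# Run only when the values are valid and the tool and input file exist.
if [ "$links_ok" = yes ] && command -v h5repack >/dev/null 2>&1 && [ -f "$in_file" ]; then
    h5repack -c "$max_compact_links" -d "$min_indexed_links" "$in_file" "$out_file"
fi
```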
-m size or --minimum=size
Apply filter(s) only to objects whose size in bytes is equal to or greater than size.
size must be an integer greater than one (1).
Default: If no size is specified, a threshold of 1024 bytes is assumed.
-u file or --ublock=file
Specify name of file containing user block data to be added.
-b user_block_size or --block=user_block_size
Set size in bytes of user block to be added.
user_block_size must be 512 or greater and a power of 2.
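The size rule above (512 or greater and a power of 2) can be verified in the shell before calling h5repack. This is a sketch only: the value 1024 and the file names ublock.bin, in.h5, and out.h5 are hypothetical.

```shell
user_block_size=1024      # illustrative value
ublock_file=ublock.bin    # hypothetical file holding the user block data
in_file=in.h5             # hypothetical file names
out_file=out.h5

# A power of 2 has exactly one bit set, so n & (n - 1) is 0.
if [ "$user_block_size" -ge 512 ] &&
   [ $((user_block_size & (user_block_size - 1))) -eq 0 ]; then
    ublock_ok=yes
else
    ublock_ok=no
fi

# Run only when the size is valid and the tool and both files exist.
if [ "$ublock_ok" = yes ] && command -v h5repack >/dev/null 2>&1 &&
   [ -f "$in_file" ] && [ -f "$ublock_file" ]; then
    h5repack -u "$ublock_file" -b "$user_block_size" "$in_file" "$out_file"
fi
```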
-M size or --metadata_block_size=size
Metadata block size to be used when h5repack calls H5Pset_meta_block_size.
size must be a non-negative integer.
-t alignment_threshold or --threshold=alignment_threshold
Set threshold value for H5Pset_alignment call.
alignment_threshold must be an integer.
-a alignment or --alignment=alignment
Set alignment value for H5Pset_alignment call.
alignment must be a positive integer.
-s min_size[:header_type] or --ssize=min_size[:header_type]
Set the minimum size of optionally specified types of shared object header messages.
min_size is the minimum size, in bytes, of a shared object header message. Header messages smaller than the specified size will not be shared.
header_type specifies the type(s) of header message that this minimum size is to be applied to. Valid values of header_type are any of the following:
dspace for dataspace header messages
dtype for datatype header messages
fill for fill values
pline for property list header messages
attr for attribute header messages
If header_type is not specified, min_size will be applied to all header messages.
-f filter or --filter=filter
filter is a string of the following format:
list_of_objects : name_of_filter[=filter_parameters]
list_of_objects is a comma separated list of object names meaning apply the filter(s) only to those objects. If no object names are specified, the filter is applied to all objects.
name_of_filter can be one of the following:
GZIP, to apply the HDF5 GZIP filter (GZIP compression)
SZIP, to apply the HDF5 SZIP filter (SZIP compression)
SHUF, to apply the HDF5 shuffle filter
FLET, to apply the HDF5 checksum filter
NBIT, to apply the HDF5 N-bit filter
SOFF, to apply the HDF5 scale/offset filter
UD, to apply a user-defined filter
NONE, to remove any filter(s)
filter_parameters conveys optional compression information:
GZIP=deflation_level from 1-9
SZIP=pixels_per_block,coding_method
pixels_per_block is an even number in the range 2-32.
coding_method is EC or NN.
SHUF (no parameter)
FLET (no parameter)
NBIT (no parameter)
SOFF=scale_factor,scale_type
scale_factor is an integer.
scale_type is either IN or DS.
UD=filter_id,nfilter_params,value_1[,value_2,...,value_n]
filter_id is the filter identifier.
nfilter_params is the number of filter parameters.
value_1 through value_n are the values of each filter parameter.
The number of values must match the value of nfilter_params.
NONE (no parameter)
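The filter string format can be illustrated with a small helper that assembles the pieces. The function name make_filter is a hypothetical convenience for this sketch and is not part of h5repack; it only builds the string that would be passed to -f.

```shell
# Compose "list_of_objects:name_of_filter=filter_parameters".
# Empty objects or params arguments are simply left out.
make_filter() {
    objects=$1
    name=$2
    params=$3
    spec=$name
    if [ -n "$params" ]; then
        spec="$spec=$params"
    fi
    if [ -n "$objects" ]; then
        spec="$objects:$spec"
    fi
    printf '%s\n' "$spec"
}

make_filter "" GZIP 1          # prints GZIP=1
make_filter dset1 SZIP 8,NN    # prints dset1:SZIP=8,NN
make_filter "" SHUF ""         # prints SHUF
```

The resulting string would then be used as, for example, h5repack -f "$(make_filter dset1 SZIP 8,NN)" file1 file2.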
-l layout or --layout=layout
layout is a string of the following format:
list_of_objects : layout_type[=layout_parameters]
list_of_objects is a comma separated list of object names, meaning that layout information is supplied for those objects. If no object names are specified, the layout is applied to all objects.
layout_type can be one of the following:
CHUNK, to apply chunking layout
COMPA, to apply compact layout
CONTI, to apply contiguous layout
layout_parameters is present only in the CHUNK case and specifies the chunk size of each dimension in the following format with no intervening spaces:
dim_1xdim_2x...xdim_n
The separator is the lowercase letter x, as in CHUNK=20x10.
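As a brief sketch of the layout string, the chunk dimensions can be joined with the letter x and combined with an object list. The dataset names dset1 and dset2 and the dimensions 20 and 10 are example values only.

```shell
dims="20 10"   # chunk size of each dimension, space separated

# Join the dimensions with "x" and no intervening spaces.
chunk=$(printf '%s' "$dims" | tr ' ' 'x')

# list_of_objects : layout_type[=layout_parameters]
layout="dset1,dset2:CHUNK=$chunk"

echo "$layout"   # prints dset1,dset2:CHUNK=20x10
```

The value of layout is what would be passed to the -l option.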
-e file or --file=file
File containing values to be passed in for the -f (or --filter) and -l (or --layout) options.
This file contains only the filter and layout flags.
-G fs_pagesize or --fs_pagesize=fs_pagesize
File space page size in bytes (see H5Pset_file_space_page_size).
fs_pagesize is the page size in bytes, greater than or equal to 512, that the library uses when the file space strategy is PAGE.
-P fs_persist or --fs_persist=fs_persist
Whether to persist free space (see H5Pset_file_space_strategy).
fs_persist is 1 for persisting free space and 0 for not persisting free space.
-S fs_strategy or --fs_strategy=fs_strategy
The type of file space management strategy to use for the output file (see H5Pset_file_space_strategy).
fs_strategy is a string indicating the file space strategy:
FSM_AGGR: Use free-space managers, aggregators and virtual file driver for file space allocation
PAGE: Use free-space managers with embedded paged aggregation and virtual file driver for file space allocation
AGGR: Use aggregators and virtual file driver for file space allocation
NONE: Use virtual file driver for file space allocation
-T fs_threshold or --fs_threshold=fs_threshold
The free-space section threshold to use for the output file (see H5Pset_file_space_strategy).
fs_threshold is the minimum size (in bytes) of free-space sections to be tracked by the library's free-space managers.
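The four file space options are typically used together. The sketch below checks the documented constraints and then, if h5repack and the input file are available, runs the command. The specific values (PAGE, 4096, 1, 1) and the file names are assumptions chosen for illustration.

```shell
fs_strategy=PAGE     # one of FSM_AGGR, PAGE, AGGR, NONE
fs_pagesize=4096     # illustrative; must be >= 512
fs_persist=1         # 1 = persist free space, 0 = do not
fs_threshold=1       # illustrative minimum tracked section size, in bytes
in_file=in.h5        # hypothetical file names
out_file=out.h5

# Check the documented constraints before running the tool.
fs_ok=yes
case "$fs_strategy" in
    FSM_AGGR|PAGE|AGGR|NONE) ;;
    *) fs_ok=no ;;
esac
[ "$fs_pagesize" -ge 512 ] || fs_ok=no
case "$fs_persist" in
    0|1) ;;
    *) fs_ok=no ;;
esac

# Run only when the values are valid and the tool and input file exist.
if [ "$fs_ok" = yes ] && command -v h5repack >/dev/null 2>&1 && [ -f "$in_file" ]; then
    h5repack -S "$fs_strategy" -G "$fs_pagesize" -P "$fs_persist" \
             -T "$fs_threshold" "$in_file" "$out_file"
fi
```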
Exit Status:
0
Succeeded.
> 0
An error occurred.
Examples:
h5repack -f GZIP=1 -v file1 file2
Applies GZIP compression to all objects in file1 and saves the output in file2. Prints verbose output.
h5repack -f dset1:SZIP=8,NN file1 file2
Applies SZIP compression only to object dset1.
h5repack -l dset1,dset2:CHUNK=20x10 file1 file2
Applies chunked layout to objects dset1 and dset2.
h5repack -f UD=307,1,9 file1 file2
Adds the bzip2 filter to all datasets.
History:
1.10.1
Options added or modified in this release for file space management and page buffering.
1.10.0
Options added in this release for file space management.
1.8.12
Added user-defined filter parameter.
1.8.1
Original syntax restored; both the new and the original syntax are now supported.