Copies an HDF5 file to a new file with or without compression and/or chunking

h5repack [OPTIONS] in_file out_file

h5repack -i in_file -o out_file [OPTIONS]


h5repack is a command-line tool that applies HDF5 filters to an input file, in_file, and saves the output in a new file, out_file.


If h5repack performs poorly on large datasets, the H5TOOLS_BUFSIZE environment variable can be used to improve performance. This variable specifies the size, in bytes, of the hyperslab (selection) buffer that h5repack uses. Its default value is 32 MB (32*1024*1024=33554432 bytes), which may be too small for large datasets. The dataset does not have to be chunked to use this environment variable.

For example, if h5repack is slow on a large 3D dataset with a chunk size of 512*512*512 elements and a datatype of 32-bit float (4 bytes), setting H5TOOLS_BUFSIZE to at least the size of one chunk in bytes, that is, the number of elements in a chunk times the datatype size (512*512*512*4=536870912), should improve performance. On Unix, the H5TOOLS_BUFSIZE environment variable can be set as follows in csh or tcsh:

setenv H5TOOLS_BUFSIZE 536870912
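The setenv form above is csh syntax. For sh-compatible shells such as bash, a minimal sketch of the same setting follows; it derives the value from the chunk dimensions and element size used in the example rather than hard-coding it. The variable names chunk_elems and elem_size are illustrative only.

```shell
# Number of elements in one 512*512*512 chunk (from the example above).
chunk_elems=$((512 * 512 * 512))
# Size in bytes of one 32-bit float element.
elem_size=4

# Buffer large enough to hold one full chunk.
H5TOOLS_BUFSIZE=$((chunk_elems * elem_size))
export H5TOOLS_BUFSIZE

echo "H5TOOLS_BUFSIZE=$H5TOOLS_BUFSIZE"
# h5repack started from this shell will now see the variable, for example:
#   h5repack -f GZIP=1 in_file out_file
```

The variable must be exported in the same shell session that later runs h5repack.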



Options and Parameters:

-i in_file

    Input HDF5 file


-o out_file

    Output HDF5 file


-h   or  --help

    Print help message.


-v   or  --verbose

    Print verbose output.


-V   or  --version

    Print version number.


-n   or  --native

    Use native HDF5 datatypes when repacking.

    (Default behavior is to use original file datatypes.)

    Note that this is a change in default behavior; prior to Release 1.6.6, h5repack generated files only with native datatypes.


-L   or  --latest

    Use latest version of the HDF5 file format.


-c max_compact_links   or  --compact=max_compact_links

    Set the maximum number of links, max_compact_links, that can be stored in a group header message (compact format).


-d min_indexed_links   or  --indexed=min_indexed_links

    Set the minimum number of links, min_indexed_links, in the indexed format.


    max_compact_links and min_indexed_links are closely related, and the first must be equal to or greater than the second. In general, however, performance will suffer, possibly dramatically, if they are equal; tuning the gap between the two values reduces unnecessary thrashing between compact and indexed storage as group size grows and shrinks. The relationship between the two values matters most when group sizes are highly dynamic and much less in files with a stable structure. Compact mode is space- and performance-efficient when groups have few members; indexed mode requires slightly more storage space but performs increasingly better as the number of members in each group grows.
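As a rough illustration of the constraint described above, the following sketch checks that max_compact_links is not smaller than min_indexed_links before assembling an h5repack command line. The values 16 and 8 are hypothetical, chosen only to leave a gap between the two thresholds. The -L flag is included on the assumption that these link storage settings are meant for files written in the latest format.

```shell
# Hypothetical thresholds; tune the gap for the group sizes in your file.
max_compact_links=16
min_indexed_links=8

if [ "$max_compact_links" -ge "$min_indexed_links" ]; then
    # Assemble the command; the echo only prints it for review.
    links_cmd="h5repack -L -c $max_compact_links -d $min_indexed_links in_file out_file"
    echo "$links_cmd"
else
    links_cmd=""
    echo "max_compact_links must be >= min_indexed_links" >&2
fi
```

To run the repack, execute the printed command with real file names in place of in_file and out_file.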


-m size   or  --minimum=size

    Apply filter(s) only to objects whose size in bytes is equal to or greater than size.

    size must be an integer greater than 1.


    Default:  If no size is specified, a threshold of 1024 bytes is assumed.


-u file   or  --ublock=file

    Specify name of file containing user block data to be added.


-b user_block_size   or  --block=user_block_size

    Set size in bytes of user block to be added.

    user_block_size must be 512 or greater and a power of 2.


    Default:  1024
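The size rule for the user block (a power of 2, at least 512) can be checked before calling h5repack. The sketch below is one possible check; the value 2048 and the file name ublock.bin are hypothetical.

```shell
# Hypothetical values for illustration; ublock.bin is an assumed file name.
user_block_size=2048
user_block_file=ublock.bin

# A power of two has exactly one bit set, so size & (size - 1) is 0.
if [ "$user_block_size" -ge 512 ] && [ $((user_block_size & (user_block_size - 1))) -eq 0 ]; then
    ublock_cmd="h5repack -u $user_block_file -b $user_block_size in_file out_file"
    echo "$ublock_cmd"
else
    ublock_cmd=""
    echo "user_block_size must be a power of 2 and at least 512" >&2
fi
```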


-M size   or  --metadata_block_size=size

    Metadata block size to be used when h5repack calls H5Pset_meta_block_size.

    size must be a non-negative integer.


-t alignment_threshold   or  --threshold=alignment_threshold

    Set threshold value for H5Pset_alignment call.

    alignment_threshold must be an integer.


-a alignment   or  --alignment=alignment

    Set alignment value for H5Pset_alignment call.

    alignment must be a positive integer.


-s min_size[:header_type]   or  --ssize=min_size[:header_type]

    Set the minimum size of optionally specified types of shared object header messages.


    min_size is the minimum size, in bytes, of a shared object header message. Header messages smaller than the specified size will not be shared.


    header_type specifies the type(s) of header message that this minimum size is to be applied to. Valid values of header_type are any of the following:

      dspace  for dataspace header messages

      dtype   for datatype header messages

      fill    for fill values

      pline   for filter pipeline header messages

      attr    for attribute header messages

    If header_type is not specified, min_size will be applied to all header messages.


-f filter   or  --filter=filter

    Filter type


    filter is a string of the following format:

    list_of_objects : name_of_filter[=filter_parameters]


    list_of_objects is a comma-separated list of object names; the filter(s) are applied only to those objects. If no object names are specified, the filter is applied to all objects.


    name_of_filter can be one of the following:

         GZIP, to apply the HDF5 GZIP filter (GZIP compression)

         SZIP, to apply the HDF5 SZIP filter (SZIP compression)

         SHUF, to apply the HDF5 shuffle filter

         FLET, to apply the HDF5 checksum filter

         NBIT, to apply the HDF5 N-bit filter

         SOFF, to apply the HDF5 scale/offset filter

         UD, to apply a user-defined filter

         NONE, to remove any filter(s)


    filter_parameters conveys optional compression information:

         GZIP=deflation_level from 1-9

         SZIP=pixels_per_block,coding_method

             pixels_per_block is an even number in the range 2-32.

             coding_method is EC or NN.

         SHUF (no parameter)

         FLET (no parameter)

         NBIT (no parameter)

         SOFF=scale_factor,scale_type

             scale_factor is an integer.

             scale_type is either IN or DS.

         UD=filter_id,nfilter_params,value_1[,value_2,...,value_n]

             filter_id is the filter identifier.

             nfilter_params is the number of filter parameters.

             value_1 through value_n are the values of each filter parameter.

                     Number of values must match the value of nfilter_params.

         NONE (no parameter)
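The filter string format can be illustrated by assembling -f arguments from their parts. This is a sketch only: the object names dset1 and dset2 and the GZIP level 6 are hypothetical, and it assumes that -f may be given more than once on the same command line so that shuffle and GZIP can be applied together.

```shell
# Hypothetical object names; omit the "objects:" prefix to target all objects.
objects="dset1,dset2"

# list_of_objects : name_of_filter[=filter_parameters]
shuffle_arg="${objects}:SHUF"
gzip_arg="${objects}:GZIP=6"

filter_cmd="h5repack -f $shuffle_arg -f $gzip_arg in_file out_file"
echo "$filter_cmd"
```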


-l layout   or  --layout=layout

    Layout type


    layout is a string of the following format:

    list_of_objects : layout_type[=layout_parameters]


    list_of_objects is a comma-separated list of object names; layout information is supplied for those objects. If no object names are specified, the layout is applied to all objects.


    layout_type can be one of the following:

         CHUNK, to apply chunking layout

         COMPA, to apply compact layout

         CONTI, to apply contiguous layout


    layout_parameters is present only in the CHUNK case and specifies the chunk size of each dimension in the following format, with the dimensions separated by the letter x and no intervening spaces (for example, CHUNK=20x10):

         dim_1xdim_2x...xdim_n
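A chunked layout argument can be built from a list of dimension sizes by joining them with the letter x, as in the CHUNK=20x10 example further down. The sketch below assumes a hypothetical 3D dataset named dset1 with 512 elements per chunk dimension.

```shell
# Hypothetical chunk dimensions, one value per dataset dimension.
dims="512 512 512"

# Join the dimensions with "x" and no spaces: 512x512x512
chunk=$(echo "$dims" | tr ' ' 'x')

# list_of_objects : layout_type[=layout_parameters]
layout_arg="dset1:CHUNK=$chunk"
echo "h5repack -l $layout_arg in_file out_file"
```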


-e file   or  --file=file

    File containing values to be passed in for the -f (or --filter) and -l (or --layout) options.

    This file contains only the filter and layout flags.
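The sketch below writes such an options file and prints a command that uses it. The file location, the object name dset1, and the exact layout of the file (the -f and -l flags with their values on a single line, separated by spaces) are assumptions for illustration; check the behavior against your h5repack version.

```shell
# Assumed location for the options file.
opts_file="${TMPDIR:-/tmp}/h5repack_opts.txt"

# Only -f and -l entries belong in this file.
printf '%s\n' '-f dset1:GZIP=1 -l dset1:CHUNK=20x10' > "$opts_file"

echo "h5repack -e $opts_file in_file out_file"
```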


-G fs_pagesize   or  --fs_pagesize=fs_pagesize

    File space page size in bytes (see H5Pset_file_space_page_size).

    fs_pagesize is the page size in bytes, which must be greater than or equal to 512, that the library uses when the file space strategy is PAGE.


-P fs_persist   or  --fs_persist=fs_persist

    Persisting or not persisting free space (see H5Pset_file_space_strategy).

    fs_persist is 1 to persist free space and 0 to not persist it.


-S fs_strategy   or  --fs_strategy=fs_strategy

    The type of file space management strategy to use for the output file (see H5Pset_file_space_strategy).


    fs_strategy is a string indicating the file space strategy:

         FSM_AGGR: Use free-space managers, aggregators and virtual file driver for file space allocation

         PAGE: Use free-space managers with embedded paged aggregation and virtual file driver for file space allocation

         AGGR: Use aggregators and virtual file driver for file space allocation

         NONE: Use virtual file driver for file space allocation


-T fs_threshold   or  --fs_threshold=fs_threshold

    The free-space section threshold to use for the output file (see H5Pset_file_space_strategy).

    fs_threshold is the minimum size (in bytes) of free-space sections to be tracked by the library's free-space managers.
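The four file space options are typically used together. The following sketch assembles a command line for paged aggregation; the page size of 4096 bytes and the threshold of 1 byte are hypothetical values, and the only rule checked is the minimum page size of 512 stated above.

```shell
# Hypothetical settings for a paged file space strategy.
fs_strategy=PAGE
fs_pagesize=4096
fs_persist=1
fs_threshold=1

if [ "$fs_pagesize" -ge 512 ]; then
    fs_cmd="h5repack -S $fs_strategy -G $fs_pagesize -P $fs_persist -T $fs_threshold in_file out_file"
    echo "$fs_cmd"
else
    fs_cmd=""
    echo "fs_pagesize must be at least 512 bytes" >&2
fi
```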



in_file

    Input HDF5 file


out_file

    Output HDF5 file

Exit Status:
0      Succeeded.
> 0    An error occurred.

Examples:

  1. h5repack -f GZIP=1 -v file1 file2
    Applies GZIP compression to all objects in file1 and saves the output in file2. Prints verbose output.
  2. h5repack -f dset1:SZIP=8,NN file1 file2
    Applies SZIP compression only to object dset1.
  3. h5repack -l dset1,dset2:CHUNK=20x10 file1 file2
    Applies chunked layout to objects dset1 and dset2.
  4. h5repack -f UD=307,1,9 file1 file2
    Adds bzip2 filter to all datasets.

Release    Change
1.10.1     Options added or modified in this release for file space management and page buffering:
               -G, --fs_pagesize
               -P, --fs_persist
               -S, --fs_strategy (modified)
1.10.0     Options added in this release for file space management:
               -S, --fs_strategy
               -T, --fs_threshold
1.8.12     Added user-defined filter parameter (UD) to -f filter, --filter=filter option for use in read and write operations.
1.8.9      -M size, --metadata_block_size=size option introduced in this release.
1.8.1      Original syntax restored; both the new and the original syntax are now supported.
1.8.0      h5repack command line syntax changed in this release.
1.6.2      h5repack introduced in this release.

--- Last Modified: December 19, 2018 | 03:22 PM