h5import

Imports data into an existing or new HDF5 file

Syntax:
h5import infile in_options [infile in_options ...] -o outfile 
h5import infile in_options [infile in_options ...] -outfile outfile 
h5import -h 
h5import -help

Description:
h5import converts data from one or more ASCII or binary files, infile, into the same number of HDF5 datasets in the existing or new HDF5 file, outfile. Data conversion is performed in accordance with the user-specified type and storage properties specified in in_options.

The primary objective of h5import is to import floating point or integer data. The utility's design allows for future versions that accept ASCII text files and store the contents as a compact array of one-dimensional strings, but that capability is not implemented in HDF5 Release 1.6.

Input data and options:
Input data can be provided in one of the following forms:

  • As an ASCII, or plain-text, file containing either floating point or integer data
  • As a binary file containing either 32-bit or 64-bit native floating point data
  • As a binary file containing native integer data, signed or unsigned and 8-bit, 16-bit, 32-bit, or 64-bit.
  • As an ASCII, or plain-text, file containing text data. (This feature is not implemented in HDF5 Release 1.6.)
Each input file, infile, contains a single n-dimensional array of values of one of the above types expressed in the order of fastest-changing dimensions first.
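As a rough illustration of the binary input forms listed above, the following Python sketch writes native 32-bit floating point values to a file and computes the byte count implied by a set of dimension sizes. The helper names, the file name, and the dimension sizes are hypothetical and are not part of h5import; the order in which values are written must still follow the ordering rule stated above.

```python
import os
import struct
import tempfile
from math import prod


def write_native_binary(path, values, fmt="f"):
    """Write values in native byte order.

    fmt is a struct type code: 'f' = 32-bit float, 'd' = 64-bit float,
    'i' = 32-bit signed integer, 'q' = 64-bit signed integer.
    """
    # "=" selects native byte order with standard sizes and no padding.
    packed = struct.pack(f"={len(values)}{fmt}", *values)
    with open(path, "wb") as fh:
        fh.write(packed)
    return len(packed)


def expected_bytes(dims, size_bits):
    """Bytes a binary input file should hold for the given dimension sizes."""
    return prod(dims) * size_bits // 8


dims = (2, 3, 4)  # hypothetical dimension sizes
values = [float(i) for i in range(prod(dims))]
bin_path = os.path.join(tempfile.mkdtemp(), "infile_fp32.bin")  # hypothetical name
n_written = write_native_binary(bin_path, values, fmt="f")
```

Comparing `n_written` with `expected_bytes(dims, 32)` is a cheap sanity check before handing the file to h5import.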

Floating point data in an ASCII input file may be expressed either in fixed-point form (e.g., 323.56) or in scientific notation (e.g., 3.23E+02).
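The two notations can be produced with ordinary string formatting. The sketch below writes the same values once in fixed-point form and once in scientific notation. It assumes that values in the text file are separated by whitespace, which this page does not state explicitly; the helper and file names are hypothetical.

```python
import os
import tempfile


def write_ascii_floats(path, values, per_line=4, scientific=False):
    """Write whitespace-separated floating point values to a plain-text file."""
    fmt = "{:.4E}" if scientific else "{:.2f}"
    with open(path, "w", encoding="ascii") as fh:
        for start in range(0, len(values), per_line):
            row = values[start:start + per_line]
            fh.write(" ".join(fmt.format(v) for v in row) + "\n")


workdir = tempfile.mkdtemp()
fixed_path = os.path.join(workdir, "infile_fixed.txt")  # hypothetical names
sci_path = os.path.join(workdir, "infile_sci.txt")
values = [323.56, 0.5, -12.25, 1000.0]

write_ascii_floats(fixed_path, values)
write_ascii_floats(sci_path, values, scientific=True)

with open(fixed_path) as fh:
    fixed_text = fh.read()
with open(sci_path) as fh:
    sci_text = fh.read()
```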

Each input file can be associated with options specifying the datatype and storage properties. These options can be specified either as command line arguments or in a configuration file. Note that exactly one of these approaches must be used with a single input file.

Command line arguments, best used with simple input files, can be used to specify the class, size, and dimensions of the input data and a path identifying the output dataset.

The recommended means of specifying input data options is in a configuration file; this is also the only means of specifying advanced storage features. See further discussion in "The configuration file" below.

The only required option for input data is dimension sizes; defaults are available for all others.

h5import will accept up to 30 input files in a single call. Other considerations, such as the maximum length of a command line, may impose a more stringent limitation.

Output data and options:
The name of the output file is specified following the -o or -outfile option in outfile. The data from each input file is stored as a separate dataset in this output file. outfile may be an existing file. If it does not yet exist, h5import will create it.

Output dataset information and storage properties can be specified only by means of a configuration file.

Dataset path
    If the groups in the path leading to the dataset do not exist, h5import will create them.
    If no group is specified, the dataset will be created as a member of the root group.
    If no dataset name is specified, the default name is dataset0 for the first input dataset, dataset1 for the second input dataset, dataset2 for the third input dataset, etc.
    h5import does not overwrite a pre-existing dataset of the specified or default name. When an existing dataset of a conflicting name is encountered, h5import quits with an error; the current input file and any subsequent input files are not processed.
Output type
    Datatype parameters for output data
    Output data class
        Signed or unsigned integer or floating point
    Output data size
        8-, 16-, 32-, or 64-bit integer
        32- or 64-bit floating point
    Output architecture
        IEEE
        STD
        NATIVE (Default)
        Other architectures are included in the h5import design but are not implemented in this release.
    Output byte order
        Little- or big-endian.
        Relevant only if output architecture is IEEE, UNIX, or STD; fixed for other architectures.
Dataset layout and storage properties
    Denote how raw data is to be organized on the disk. If none of the following are specified, the default configuration is contiguous layout with no compression.
    Layout
        Contiguous (Default)
        Chunked
    External storage
        Allows raw data to be stored in a non-HDF5 file or in an external HDF5 file.
        Requires contiguous layout.
    Compressed
        Sets the type of compression and the level to which the dataset must be compressed.
        Requires chunked layout.
    Extendable
        Allows the dimensions of the dataset to increase over time and/or to be unlimited.
        Requires chunked layout.
    Compressed and extendable
        Requires chunked layout.

Command-line arguments:
The h5import syntax for the command-line arguments, in_options, is as follows:

h5import infile -d dim_list [-p pathname] [-t input_class] [-s input_size] [infile ...] -o outfile
or
h5import infile -dims dim_list [-path pathname] [-type input_class] [-size input_size] [infile ...] -outfile outfile
or
h5import infile -c config_file [infile ...] -outfile outfile

Note the following:

  • If the -c config_file option is used with an input file, no other argument can be used with that input file.
  • If the -c config_file option is not used with an input data file, the -d dim_list argument (or -dims dim_list) must be used and any combination of the remaining options may be used. Any arguments used must appear in exactly the order used in the syntax declarations immediately above.
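The ordering and exclusivity rules above are easy to get wrong when a command line is assembled by a script. The following Python sketch builds an argument list that respects them; it only constructs the list and does not run h5import. The function name and the dictionary layout are hypothetical. The limit of 30 input files and the comma-separated form of dim_list are taken from elsewhere on this page.

```python
MAX_INPUT_FILES = 30  # limit stated in the Description


def build_h5import_args(inputs, outfile):
    """Build an h5import argument list.

    Each entry of `inputs` is a dict with key "infile" plus either
    "config", or "dims" and optionally "path", "type", "size".
    """
    if not 1 <= len(inputs) <= MAX_INPUT_FILES:
        raise ValueError("h5import accepts between 1 and 30 input files")
    args = ["h5import"]
    for item in inputs:
        args.append(item["infile"])
        if "config" in item:
            # -c excludes every other per-file argument.
            extra = set(item) - {"infile", "config"}
            if extra:
                raise ValueError("-c cannot be combined with: " + ", ".join(sorted(extra)))
            args += ["-c", item["config"]]
            continue
        if "dims" not in item:
            raise ValueError("dims is required when no configuration file is used")
        # Emit options in the order of the syntax declaration: -d, -p, -t, -s.
        args += ["-d", ",".join(str(d) for d in item["dims"])]
        if "path" in item:
            args += ["-p", item["path"]]
        if "type" in item:
            args += ["-t", item["type"]]
        if "size" in item:
            args += ["-s", str(item["size"])]
    args += ["-o", outfile]
    return args


cmd = build_h5import_args(
    [{"infile": "infile", "dims": (20, 50), "path": "bin1/dset1", "type": "FP", "size": 64}],
    "out2",
)
```

The resulting list can be passed to a process launcher such as `subprocess.run` without shell quoting concerns.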

The configuration file:
A configuration file is specified with the -c config_file option:

h5import infile -c config_file [infile -c config_file2 ...] -outfile outfile

The configuration file is an ASCII file and must be organized as "Configuration_Keyword Value" pairs, with one pair on each line. For example, the line indicating that the input data class (configuration keyword INPUT-CLASS) is floating point in a text file (value TEXTFP) would appear as follows:
    INPUT-CLASS TEXTFP

A configuration file may have the following keywords each followed by one of the following defined values. One entry for each of the first two keywords, RANK and DIMENSION-SIZES, is required; all other keywords are optional.
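A configuration file in this "keyword value" form can be generated from a script. The sketch below is one possible way to render a dictionary of settings as configuration lines and to reject a configuration that lacks the two required keywords. The function name is hypothetical, and list values are joined with spaces as described for DIMENSION-SIZES.

```python
def render_config(settings):
    """Render a dict of configuration keywords as 'KEYWORD value' lines."""
    for required in ("RANK", "DIMENSION-SIZES"):
        if required not in settings:
            raise ValueError(f"missing required keyword {required}")
    lines = []
    for keyword, value in settings.items():
        if isinstance(value, (list, tuple)):
            # Multi-valued entries are written as space-separated integers.
            value = " ".join(str(v) for v in value)
        lines.append(f"{keyword} {value}")
    return "\n".join(lines) + "\n"


config_text = render_config({
    "INPUT-CLASS": "TEXTFP",
    "RANK": 3,
    "DIMENSION-SIZES": (5, 2, 4),
})
```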


Keyword  
    Value

Description

RANK  

The number of dimensions in the dataset. (Required)
    rank
        An integer specifying the number of dimensions in the dataset.
        Example:   4   for a 4-dimensional dataset.

DIMENSION-SIZES

Sizes of the dataset dimensions. (Required)
    dim_sizes
        A string of space-separated integers specifying the sizes of the dimensions in the dataset. The number of sizes in this entry must match the value in the RANK entry. The fastest-changing dimension must be listed first.
        Example:   4 3 4 38   for a 38x4x3x4 dataset.

PATH

Path of the output dataset.
    path

The full HDF5 pathname identifying the output dataset relative to the root group within the output file.
That is, path is a string consisting of optional group names, each followed by a slash, and ending with a dataset name. If the groups in the path do not exist, they will be created.
If PATH is not specified, the output dataset is stored as a member of the root group and the default dataset name is dataset0 for the first input dataset, dataset1 for the second input dataset, dataset2 for the third input dataset, etc.
Note that h5import does not overwrite a pre-existing dataset of the specified or default name. When an existing dataset of a conflicting name is encountered, h5import quits with an error; the current input file and any subsequent input files are not processed.
Example: The configuration file entry

    PATH grp1/grp2/dataset1

indicates that the output dataset dataset1 will be written in the group grp2/ which is in the group grp1/, a member of the root group in the output file.


INPUT-CLASS  

A string denoting the type of input data.
    TEXTIN
        Input is signed integer data in an ASCII file.
    TEXTUIN
        Input is unsigned integer data in an ASCII file.
    TEXTFP
        Input is floating point data in either fixed-point notation (e.g., 325.34) or scientific notation (e.g., 3.2534E+02) in an ASCII file.
    IN
        Input is signed integer data in a binary file.
    UIN
        Input is unsigned integer data in a binary file.
    FP
        Input is floating point data in a binary file. (Default)
    STR
        Input is character data in an ASCII file. With this value, the configuration keywords RANK, DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE, OUTPUT-ARCHITECTURE, and OUTPUT-BYTE-ORDER will be ignored.
        (Not implemented in this release.)

INPUT-SIZE

An integer denoting the size of the input data, in bits.
    8
    16
    32
    64
For signed and unsigned integer data: TEXTIN, TEXTUIN, IN, or UIN. (Default: 32)
    32
    64
For floating point data: TEXTFP or FP. (Default: 32)

OUTPUT-CLASS  

A string denoting the type of output data.
    IN
        Output is signed integer data.
        (Default if INPUT-CLASS is IN or TEXTIN)
    UIN
        Output is unsigned integer data.
        (Default if INPUT-CLASS is UIN or TEXTUIN)
    FP
        Output is floating point data.
        (Default if INPUT-CLASS is not specified or is FP or TEXTFP)
    STR
        Output is character data, to be written as a 1-dimensional array of strings.
        (Default if INPUT-CLASS is STR)
        (Not implemented in this release.)

OUTPUT-SIZE

An integer denoting the size of the output data, in bits.
    8
    16
    32
    64
For signed and unsigned integer data: IN or UIN. (Default: Same as INPUT-SIZE, else 32)
    32
    64
For floating point data: FP. (Default: Same as INPUT-SIZE, else 32)

OUTPUT-ARCHITECTURE

A string denoting the type of output architecture.
    NATIVE
    STD
    IEEE
    INTEL *
    CRAY *
    MIPS *
    ALPHA *
    UNIX *
See the "Predefined Atomic Types" section in the "HDF5 Datatypes" chapter of the HDF5 User’s Guide for a discussion of these architectures.
Values marked with an asterisk (*) are not implemented in this release.
(Default: NATIVE)

OUTPUT-BYTE-ORDER

A string denoting the output byte order. This entry is ignored if the OUTPUT-ARCHITECTURE is not specified or if it is not specified as IEEE, UNIX, or STD.
    BE
        Big-endian. (Default)
    LE
        Little-endian.

The following options are disabled by default, making the default storage properties no chunking, no compression, no external storage, and no extensible dimensions.

CHUNKED-DIMENSION-SIZES

Dimension sizes of the chunk for chunked output data.
    chunk_dims
        A string of space-separated integers specifying the dimension sizes of the chunk for chunked output data. The number of dimensions must correspond to the value of RANK.
        The presence of this field indicates that the output dataset is to be stored in chunked layout; if this configuration field is absent, the dataset will be stored in contiguous layout.

COMPRESSION-TYPE

Type of compression to be used with chunked storage. Requires that CHUNKED-DIMENSION-SIZES be specified.
    GZIP
        Gzip compression.
        Other compression algorithms are not implemented in this release of h5import.

COMPRESSION-PARAM

Compression level. Required if COMPRESSION-TYPE is specified.
    1 through 9
        Gzip compression levels: 1 will result in the fastest compression while 9 will result in the best compression ratio.
        (Default: 6. Not all compression methods will have a default level.)

EXTERNAL-STORAGE

Name of an external file in which to create the output dataset. Cannot be used with CHUNKED-DIMENSION-SIZES, COMPRESSION-TYPE, or MAXIMUM-DIMENSIONS.
    external_file
        A string specifying the name of an external file.

MAXIMUM-DIMENSIONS

Maximum sizes of all dimensions. Requires that CHUNKED-DIMENSION-SIZES be specified.
    max_dims
        A string of space-separated integers specifying the maximum size of each dimension of the output dataset. A value of -1 for any dimension implies unlimited size for that particular dimension.
        The number of dimensions must correspond to the value of RANK.
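Several of the keywords above depend on or exclude one another. The following Python sketch collects those rules into a single check over an already-parsed configuration, represented here as a dictionary with integer and list values. The function name and the dictionary representation are hypothetical; h5import performs its own validation, and this sketch only mirrors the rules as stated on this page.

```python
def check_storage_options(cfg):
    """Return a list of rule violations found in a parsed configuration dict."""
    problems = []
    rank = cfg.get("RANK")
    chunked = "CHUNKED-DIMENSION-SIZES" in cfg

    # Every per-dimension list must have exactly RANK entries.
    for key in ("DIMENSION-SIZES", "CHUNKED-DIMENSION-SIZES", "MAXIMUM-DIMENSIONS"):
        if key in cfg and len(cfg[key]) != rank:
            problems.append(f"{key} must list exactly RANK ({rank}) values")

    if "COMPRESSION-TYPE" in cfg:
        if not chunked:
            problems.append("COMPRESSION-TYPE requires CHUNKED-DIMENSION-SIZES")
        if cfg["COMPRESSION-TYPE"] != "GZIP":
            problems.append("only GZIP compression is implemented")
        level = cfg.get("COMPRESSION-PARAM")
        if level is not None and not 1 <= level <= 9:
            problems.append("COMPRESSION-PARAM must be 1 through 9 for GZIP")

    if "MAXIMUM-DIMENSIONS" in cfg and not chunked:
        problems.append("MAXIMUM-DIMENSIONS requires CHUNKED-DIMENSION-SIZES")

    if "EXTERNAL-STORAGE" in cfg:
        for key in ("CHUNKED-DIMENSION-SIZES", "COMPRESSION-TYPE", "MAXIMUM-DIMENSIONS"):
            if key in cfg:
                problems.append(f"EXTERNAL-STORAGE cannot be used with {key}")
    return problems


good = {
    "RANK": 3,
    "DIMENSION-SIZES": [5, 2, 4],
    "CHUNKED-DIMENSION-SIZES": [2, 2, 2],
    "MAXIMUM-DIMENSIONS": [8, 8, -1],
}
bad = {
    "RANK": 2,
    "DIMENSION-SIZES": [4, 4],
    "COMPRESSION-TYPE": "GZIP",
    "COMPRESSION-PARAM": 7,
    "EXTERNAL-STORAGE": "raw.dat",  # hypothetical file name
}
good_problems = check_storage_options(good)
bad_problems = check_storage_options(bad)
```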


 

Using h5dump to create input for h5import

h5import can use the output of h5dump as input to create a dataset or file. As in all uses of h5import, an import action is limited to a single dataset with an atomic numeric or text datatype.

h5dump must first create two files: 
    •  A DDL file, which will be used as an h5import configuration file 
    •  A raw data file containing the data to be imported

The DDL file must be generated with the h5dump -p option, to generate properties.

The raw data file may contain either numeric or string data. Numeric data can be imported by this method only if h5dump writes it to a binary file. String data must be written with the h5dump -y and --width=1 options, generating a single column of strings without indices.

Two examples follow. The first imports a dataset with a numeric datatype. Note that numeric data requires use of the h5dump -b option to produce a binary data file.

    h5dump -p -d "/int/buin/16-bit" --ddl=binuin16.h5.dmp -o binuin16.h5.bin \
           -b binuin16.h5 
    h5import binuin16.h5.bin -c binuin16.h5.dmp -o new_binuin16.h5 

The second example imports a dataset containing text data. Note that string data requires use of the h5dump -y option to exclude indexes and the h5dump --width=1 option to generate a single column of strings.

    h5dump -p -d "/mytext/data" -O txtstr.h5.dmp -o txtstr.h5.bin            \
           -y --width=1 txtstr.h5 
    h5import txtstr.h5.bin -c txtstr.h5.dmp -o new_txtstr.h5 
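The numeric round trip shown above can also be driven from a script. The sketch below assembles the same two commands as argument lists, using only the options that appear in the first example, and runs them only when both tools and the source file are present. The function name is hypothetical, and the derived file names simply follow the naming pattern of the example.

```python
import os
import shutil
import subprocess


def roundtrip_commands(src_file, dataset, new_file):
    """Return the h5dump and h5import commands for a numeric dataset."""
    ddl = src_file + ".dmp"  # DDL file, used as the h5import configuration
    raw = src_file + ".bin"  # raw binary data written by h5dump -b
    dump = ["h5dump", "-p", "-d", dataset, f"--ddl={ddl}", "-o", raw, "-b", src_file]
    load = ["h5import", raw, "-c", ddl, "-o", new_file]
    return dump, load


dump_cmd, load_cmd = roundtrip_commands("binuin16.h5", "/int/buin/16-bit", "new_binuin16.h5")

# Run only when both tools are installed and the source file exists.
if shutil.which("h5dump") and shutil.which("h5import") and os.path.exists("binuin16.h5"):
    subprocess.run(dump_cmd, check=True)
    subprocess.run(load_cmd, check=True)
```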

 

Options and Parameters:
infile(s)
    Name of the input file(s).
in_options
    Input options. Note that while only the -dims argument is required, arguments must be used in the order in which they are listed below.
  -d dim_list
  -dims dim_list
    Input data dimensions. dim_list is a string of comma-separated numbers with no spaces describing the dimensions of the input data. For example, a 50 x 100 2-dimensional array would be specified as -dims 50,100.
    Required argument: if no configuration file is used, this command-line argument is mandatory.
  -p pathname
  -path pathname
    pathname is a string consisting of one or more strings separated by slashes (/) specifying the path of the dataset in the output file. If the groups in the path do not exist, they will be created.
    Optional argument: if not specified, the default path is dataset1 for the first input dataset, dataset2 for the second input dataset, dataset3 for the third input dataset, etc.
    h5import does not overwrite a pre-existing dataset of the specified or default name. When an existing dataset of a conflicting name is encountered, h5import quits with an error; the current input file and any subsequent input files are not processed.
  -t input_class
  -type input_class
    input_class specifies the class of the input data and determines the class of the output data.
    Valid values are as defined in the Keyword/Values table in the section "The configuration file" above.
    Optional argument: if not specified, the default value is FP.
  -s input_size
  -size input_size
    input_size specifies the size in bits of the input data and determines the size of the output data.
    Valid values for signed or unsigned integers are 8, 16, 32, and 64.
    Valid values for floating point data are 32 and 64.
    Optional argument: if not specified, the default value is 32.
  -c config_file
    config_file specifies a configuration file.
    This argument replaces all other arguments except infile and -o outfile.
  -h
  -help
    Prints the h5import usage summary:
        h5import -h[elp], OR
        h5import <infile> <options> [<infile> <options>...] -o[utfile] <outfile>
    Then exits.
outfile
    Name of the HDF5 output file.

Exit Status:
0      Succeeded.
> 0    An error occurred.
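A calling script can branch on this exit status. The sketch below shows one way to do so in Python; to stay runnable without h5import installed, it uses the Python interpreter itself as a stand-in command. The function name is hypothetical.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a command and summarize its exit status (0 means success)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return "succeeded"
    return f"failed with exit status {result.returncode}"


# Stand-in commands; replace with an h5import argument list in real use.
ok = run_and_report([sys.executable, "-c", "raise SystemExit(0)"])
bad = run_and_report([sys.executable, "-c", "raise SystemExit(1)"])
```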

Example:

Using command-line arguments:

h5import infile -dims 2,3,4 -type TEXTIN -size 32 -o out1
    This command creates a file out1 containing a single 2x3x4 32-bit integer dataset. Since no pathname is specified, the dataset is stored in out1 as /dataset1.
h5import infile -dims 20,50 -path bin1/dset1 -type FP -size 64 -o out2
    This command creates a file out2 containing a single 20x50 64-bit floating point dataset. The dataset is stored in out2 as /bin1/dset1.

 

Sample configuration files:
The following configuration file specifies the following:
– The input data is a 5x2x4 floating point array in an ASCII file.
– The output dataset will be saved in chunked layout, with chunk dimension sizes of 2x2x2.
– The output datatype will be 64-bit floating point, little-endian, IEEE.
– The output dataset will be stored in outfile at /work/h5/pkamat/First-set.
– The maximum dimension sizes of the output dataset will be 8x8x(unlimited).
            PATH work/h5/pkamat/First-set
            INPUT-CLASS TEXTFP
            RANK 3
            DIMENSION-SIZES 5 2 4
            OUTPUT-CLASS FP
            OUTPUT-SIZE 64
            OUTPUT-ARCHITECTURE IEEE
            OUTPUT-BYTE-ORDER LE
            CHUNKED-DIMENSION-SIZES 2 2 2 
            MAXIMUM-DIMENSIONS 8 8 -1
        
The next configuration file specifies the following:
– The input data is a 6x3x5x2x4 integer array in a binary file.
– The output dataset will be saved in chunked layout, with chunk dimension sizes of 2x2x2x2x2.
– The output datatype will be 32-bit integer in NATIVE format (as the output architecture is not specified).
– The output dataset will be compressed using Gzip compression with a compression level of 7.
– The output dataset will be stored in outfile at /Second-set.
            PATH Second-set
            INPUT-CLASS IN
            RANK 5
            DIMENSION-SIZES 6 3 5 2 4
            OUTPUT-CLASS IN
            OUTPUT-SIZE 32
            CHUNKED-DIMENSION-SIZES 2 2 2 2 2
            COMPRESSION-TYPE GZIP
            COMPRESSION-PARAM 7
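
For scripts that need to inspect a configuration file before passing it to h5import, a small parser is usually enough. The Python sketch below reads "keyword value" lines into a dictionary, converting the numeric keywords to integers. The function name and the two keyword sets are hypothetical groupings chosen for this example; the sample text is the second configuration file above.

```python
INT_LIST_KEYS = {"DIMENSION-SIZES", "CHUNKED-DIMENSION-SIZES", "MAXIMUM-DIMENSIONS"}
INT_KEYS = {"RANK", "INPUT-SIZE", "OUTPUT-SIZE", "COMPRESSION-PARAM"}


def parse_config(text):
    """Parse 'KEYWORD value' lines into a dict, one pair per line."""
    cfg = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"keyword without a value: {parts[0]}")
        keyword, value = parts[0], parts[1].strip()
        if keyword in INT_LIST_KEYS:
            cfg[keyword] = [int(v) for v in value.split()]
        elif keyword in INT_KEYS:
            cfg[keyword] = int(value)
        else:
            cfg[keyword] = value
    return cfg


sample = """
            PATH Second-set
            INPUT-CLASS IN
            RANK 5
            DIMENSION-SIZES 6 3 5 2 4
            OUTPUT-CLASS IN
            OUTPUT-SIZE 32
            CHUNKED-DIMENSION-SIZES 2 2 2 2 2
            COMPRESSION-TYPE GZIP
            COMPRESSION-PARAM 7
"""
cfg = parse_config(sample)
```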

History:
Release    Change
1.6.0      Tool introduced in this release.
1.8.10     Tool updated to accept h5dump output.
1.8.11     Process simplified for using h5dump output. See Using h5dump to create input for h5import.
 

--- Last Modified: August 28, 2019 | 09:17 AM