Fill Value and Dataset Storage Allocation Issues in HDF5

Quincey Koziol
koziol@ncsa.uiuc.edu
October 9, 2002

  1. Document's Audience:

    • Current H5 library designers and knowledgeable external developers.
  2. Introduction:

    What is a fill-value?

    A fill-value is the value retrieved for a dataset element in HDF5 when no application data has been written to that element. The fill-value may be stored explicitly in the dataset by HDF5 or it may be implied in some way.


    What does dataset storage allocation mean?

    Dataset storage allocation is a term used to indicate that space in a file has been reserved for the raw data of a dataset.


    How are fill-values and dataset storage allocation handled currently in HDF4 and HDF5?
    1. HDF4

      These issues are specific to how the SD*() API functions operate in the latest version of HDF4; other portions of the HDF4 library may operate in different ways. Only the normal (i.e. "contiguous") and chunked storage methods are discussed; other storage methods (like external file storage, or linked-block storage) are treated as normal storage in HDF4.

      1. Dataset Storage Allocation

        Allocating space to store a dataset is deferred until the space is needed. Space is only needed when application data is written to a dataset. This allows for very large datasets to be defined, and if they are not written to, the file size can stay very small. This applies to both contiguous and chunked data.

      2. Fill-values
        1. Metadata

          Metadata documenting the fill-value is always written to a file. Either the default fill-value (of zero) or the user's fill-value is written as an attribute of the dataset.

        2. Reading

          If storage for the dataset or chunk is not allocated yet, the fill-value is used to fill the buffer to return to the application and the file data is not read.

        3. Writing

          Fill-values are only written to the dataset or chunk when the entire dataset or chunk is not going to be written in a single I/O request. For example: in a contiguously stored dataset, if a hyperslab in the middle of the dataset is written by the user (and this is the first piece of data to be written to the dataset), fill-values are written to the dataset and then the user's data is written in the hyperslab location. However, if the entire dataset is going to be written in one write call, then the fill-value writing step is skipped, since they would all be immediately over-written with the actual data.

          Note: Writing fill-values in HDF4 can be turned off completely by a user who either "knows" that they will be writing the entire dataset in successive calls, or who doesn't care about data outside the region(s) they are writing to in the dataset.

    2. HDF5

      These issues apply to all datasets in HDF5. Only the contiguous and chunked storage methods are discussed; other storage methods (such as external file storage) are treated as contiguous storage in HDF5.

      1. Dataset Storage Allocation

        Space for contiguously stored data is always allocated during the creation of the dataset. Space for chunk stored data is allocated as needed, when data needs to be written to the portion of the dataset that the chunk occupies (except in the case of parallel I/O, where all the chunks for a dataset are also allocated at creation time).

      2. Fill-values
        1. Metadata

          Metadata documenting the fill-value for a dataset is only written out if the user explicitly set a fill-value for the dataset during creation. Although there is an implicit zero fill-value assumed for the dataset, this is not enforced or recorded.

        2. Reading

          Fill-values are only used for chunked storage datasets when an unallocated chunk is read from. Because contiguously stored data always allocates space in the file, the library assumes that there is always valid data to read for contiguous data.

        3. Writing

          Fill-values are only written to contiguously stored data when a dataset is created (and only if the user has set a fill-value). This occurs regardless of whether the fill-values will be overwritten by future writes to the dataset.

          Fill-values for chunked storage data are somewhat more controlled: they are written only when data is actually written to a particular chunk. This occurs regardless of whether the fill-values will be overwritten by future writes to the chunk.


    Why do these issues need to be faced now?

    Although we've been aware for a while of the differences between HDF4 and HDF5 in how storage space is allocated in a file and how fill-values are treated, this hasn't been an especially burning problem. Unfortunately, there is a bug in which file space for variable-length (VL) data is leaked when the data elements are overwritten, and it is tied to these storage and fill-value issues.

    Currently, when VL data elements are overwritten in a dataset, the space for the previous piece of VL data is not released to the file for re-use; it is leaked. Because the previous value for the VL data would need to be read from the dataset in the file in order to be properly released, the fix ties in with the fill-values stored in the file. (In the current library design, a heap ID giving the location of the VL data is stored in the dataset, not the VL data itself, and a heap ID set to all zeros indicates that there is no VL data for a particular location. So currently, the only valid fill-value for VL data is an all-zero value, indicating that no VL data has been stored in the heap.)

    If fill-values are not written to the file, junk data may be read from the file as the VL data to be released, causing errors. Currently, the library relies on the filesystem to zero-fill blocks allocated to the file when there is no fill-value set for the dataset. We've already seen this assumption break down under Win9x, where the OS does not zero-fill file blocks and users report "junk" in datasets which have been created, but not written to.

    So, VL data requires valid fill-values to be present in the file in order to be certain that reading the VL data to be overwritten is valid and contains the correct information to either free the previous VL data (in the case of non-NULL valued VL data) or not to try to free the previous VL data (for NULL valued VL data). Having junk (or the potential for junk) in the data read from the file opens the possibility for corrupting data in the file if that junk data is used to try to free the previous VL data.
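
The guard implied by the paragraphs above can be sketched in a few lines of C. This is only an illustration, not the library's code: the function names `heap_id_is_null` and `must_free_previous` are hypothetical, and the caller is assumed to know the size of a stored heap ID.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when every byte of the stored heap ID is
 * zero, which the text above uses to mean "no VL data for this element". */
int heap_id_is_null(const unsigned char *id, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (id[i] != 0)
            return 0;
    }
    return 1;
}

/* Decide whether the previous VL data must be released before an element is
 * overwritten. The answer is only trustworthy when the bytes read from the
 * file are either a real heap ID or the all-zero fill-value, never junk. */
int must_free_previous(const unsigned char *id, size_t len)
{
    return !heap_id_is_null(id, len);
}
```

If the element had never been initialized, `must_free_previous` could return 1 for junk bytes, which is exactly the corruption risk described above.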


    How do fill-values and VL datatypes interact?
    Currently, the only valid fill-value for a VL datatype is an all-zero (0) value, which indicates that there is no VL sequence for an element.

    How do fill-values and composite datatypes interact?
    When a fill-value is stored for a composite datatype (compound, array or variable-length), the value in the "new" fill-value header message (detailed below) is stored in exactly the same way as the values in the dataset elements.

    How do fill-values and compound datatypes interact?
    It is possible to write to only one (out of potentially many) field in a compound dataset. This has no effect on the operation which fills in the elements of the dataset: all the elements of the dataset have the fill-value written to them, and are then overwritten with the application values specified, which cover only a part of each element in this case.

  3. Design Goals:

    • Provide a method for controlling when and how fill-values are written to a dataset.
    • Provide a method for controlling when space is allocated for storing a dataset.
  4. Primary Users:

    Current HDF5 users
    Existing HDF5 users who are storing VL data and re-writing that data will need to stop leaking file space. Existing HDF5 users who desire more control over how fill-values are written to their datasets and when space is allocated to store their raw data would also benefit.
    New users
    Additionally, there may be other users who have chosen not to use HDF5 due to the lack of these controls, especially if they are currently using HDF4 and find these features important.
  5. Requirements:

    • The library's performance and stability should not be impacted as a result of these new features.
    • Changes from these new features must operate correctly and efficiently in a parallel programming environment as well as a serial environment.
    • Make as small a set of changes to the HDF5 file format and programming API as possible when implementing this feature.
  6. Proposed Changes to Library Behavior:

    At the very minimum, fixing the VL data leak requires valid data to be available for all datasets with VL datatypes. This is handled by requiring a fill-value to be written for every dataset with a VL datatype. Consequently, a call to H5Dcreate fails when the datatype contains a VL datatype (either directly, or as part of a compound or array datatype) and the fill-value is set to "undefined".
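
As a rough sketch of that creation-time check, the C below walks a simplified datatype tree. The `type_node_t` structure and both function names are hypothetical stand-ins for illustration; they are not the library's internal datatype representation.

```c
#include <stddef.h>

/* Hypothetical, simplified datatype description (not the library's own). */
typedef enum { TYPE_ATOMIC, TYPE_VLEN, TYPE_ARRAY, TYPE_COMPOUND } type_class_t;

typedef struct type_node {
    type_class_t cls;
    size_t nmembers;                         /* 1 for an array's base type, n for compound fields */
    const struct type_node *const *members;  /* child types, NULL when nmembers is 0 */
} type_node_t;

/* Returns 1 if the type is, or contains at any nesting depth, a VL type. */
int type_contains_vlen(const type_node_t *t)
{
    if (t->cls == TYPE_VLEN)
        return 1;
    for (size_t i = 0; i < t->nmembers; i++) {
        if (type_contains_vlen(t->members[i]))
            return 1;
    }
    return 0;
}

/* Mirrors the rule above: creation fails (-1) when the type contains VL data
 * and the fill-value is undefined; otherwise creation may proceed (0). */
int check_create(const type_node_t *t, int fill_value_defined)
{
    if (type_contains_vlen(t) && !fill_value_defined)
        return -1;
    return 0;
}
```

The recursion is what makes the "directly, or as part of a compound or array datatype" clause work: a VL member buried inside an array inside a compound is still found.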

    We can provide users with three properties to control the fill-value and allocation strategies of the library. They are "when to allocate space", "when to write the fill-value" and the actual fill-value to write.

    Each property is described below:

    • When to allocate space:
      1. Early - during dataset create call. Allocate storage for the dataset immediately when the dataset is created. Certain VFDs (like MPI-I/O and MPI-posix) require space to be allocated when a dataset is created, which will override the setting chosen by a user.
      2. Late - during first write to dataset. Defer allocating space for storing the dataset until the dataset is written to. Choosing late allocation for compact dataset storage is an error.
      3. Incremental - during first write to chunk. Defer allocating space for storing each chunk until the chunk is written to. Choosing incremental allocation for contiguous dataset storage is treated as late allocation. Choosing incremental allocation for compact dataset storage is an error.
      4. Default - Allocate storage for the dataset as appropriate for the storage method and access method. The defaults are shown here:
                            Serial I/O   Parallel I/O
        Contiguous Storage  Late         Early
        Chunked Storage     Incremental  Early
        Compact Storage     Early        Early
    • When to write fill value:
      1. Never - Fill value will never be written to dataset's storage.
      2. Allocation - Fill value is written when space is allocated. This is the default for both chunked and contiguous data storage.
    • What fill value to write:
      1. Undefined - no value stored.
      2. Default - library defined. By default, the library defines a fill-value of all zero bytes (whatever that means for the datatype).
      3. User-defined - user defined value.
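
The allocation-time rules and the defaults table above can be condensed into one small function. This is a sketch under stated assumptions: the enum and function names are hypothetical, the `parallel` flag stands for any file driver that requires early allocation, and rejecting late allocation for compact storage is an assumption alongside the incremental case.

```c
typedef enum { LAYOUT_CONTIGUOUS, LAYOUT_CHUNKED, LAYOUT_COMPACT } layout_t;
typedef enum { ALLOC_DEFAULT, ALLOC_EARLY, ALLOC_LATE, ALLOC_INCR, ALLOC_ERROR } alloc_t;

/* Resolve the requested setting to the allocation time actually used.
 * `parallel` is non-zero for drivers that require early allocation. */
alloc_t resolve_alloc_time(layout_t layout, alloc_t requested, int parallel)
{
    if (requested == ALLOC_ERROR)
        return ALLOC_ERROR;
    /* Compact storage: incremental is an error per the list above;
     * rejecting late as well is an assumption of this sketch. */
    if (layout == LAYOUT_COMPACT) {
        if (requested == ALLOC_INCR || requested == ALLOC_LATE)
            return ALLOC_ERROR;
        return ALLOC_EARLY;
    }
    /* Parallel drivers override whatever the user chose. */
    if (parallel)
        return ALLOC_EARLY;
    /* Serial defaults: incremental for chunked, late for contiguous. */
    if (requested == ALLOC_DEFAULT)
        return layout == LAYOUT_CHUNKED ? ALLOC_INCR : ALLOC_LATE;
    /* Incremental on contiguous storage is treated as late. */
    if (layout == LAYOUT_CONTIGUOUS && requested == ALLOC_INCR)
        return ALLOC_LATE;
    return requested;
}
```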

    The table below lists the library's fill-value writing behavior during the dataset create-write-close cycle for each combination of these three properties.

    When to allocate space  When to write fill value  What fill value to write  Library create-write-close behavior
    Early                   Never                     -----                     Library allocates space when dataset is created, but never writes fill value to dataset.
    Late                    Never                     -----                     Library allocates space when dataset is written to, but never writes fill value to dataset.
    Incremental             Never                     -----                     Library allocates space when dataset or chunk (whichever is smallest unit of space) is written to, but never writes fill value to dataset or chunk.
    -----                   Allocation                undefined                 Error on creating dataset, dataset not created.
    Early                   Allocation                default or user-defined   Allocate space for dataset when dataset is created. Write fill value (default or user-defined) to entire dataset when dataset is created.
    Late                    Allocation                default or user-defined   Doesn't allocate space for dataset until user's data values are written to dataset. Write fill value to entire dataset before writing user's data value.
    Incremental             Allocation                default or user-defined   Doesn't allocate space for dataset until user's data values are written to dataset or chunk (whichever is smallest unit of space). Write fill value to entire dataset or chunk before writing user's data value.
    ----- stands for any value.
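
Read as a decision procedure, the create-write-close table reduces to one error case plus two independent choices. The sketch below encodes that reading; the type and function names are hypothetical and the code is a model of the table, not the library's implementation.

```c
typedef enum { A_EARLY, A_LATE, A_INCR } alloc_time_t;
typedef enum { F_NEVER, F_ALLOC } fill_time_t;
typedef enum { V_UNDEFINED, V_DEFAULT, V_USER } fill_value_t;

typedef struct {
    alloc_time_t allocate_at;  /* when storage is allocated */
    int write_fill;            /* 1: fill-value is written when space is allocated */
} create_plan_t;

/* Returns -1 for the "Error on creating dataset" row, 0 otherwise. */
int plan_create(alloc_time_t a, fill_time_t f, fill_value_t v, create_plan_t *plan)
{
    /* Writing at allocation time needs a value to write. */
    if (f == F_ALLOC && v == V_UNDEFINED)
        return -1;
    plan->allocate_at = a;           /* early, late, or per-chunk */
    plan->write_fill = (f == F_ALLOC);
    return 0;
}
```

With fill time "never", the fill-value column is irrelevant, which is why those rows show "-----".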

    During the H5Dread function call, the library behavior depends on whether space has been allocated, whether fill value has been written to storage, how fill value is defined, and when to write fill value.

    Is space allocated?  What is the fill value?  When to write fill value?  Library read behavior
    No                   undefined                Allocation                 Error. Dataset can't exist, no data has been written, fill value isn't defined.
    No                   undefined                Never                      Error. Data doesn't exist, fill value isn't defined and therefore cannot be used to fill user's buffer.
    No                   default or user-defined  -----                      Fill user's buffer with fill value.
    Yes                  undefined                -----                      Return data from storage (dataset); trash is possible if user has not written data to portion of dataset being read.
    Yes                  default or user-defined  Never                      Return data from storage (dataset); trash is possible if user has not written data to portion of dataset being read.
    Yes                  default or user-defined  Allocation                 Return data from storage (dataset).
    ----- stands for any value.
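
The read table collapses to three tests, shown here as a minimal C sketch. The enum and function names are hypothetical, chosen only to mirror the table's columns.

```c
typedef enum { F_NEVER, F_ALLOC } fill_time_t;
typedef enum { V_UNDEFINED, V_DEFAULT, V_USER } fill_value_t;

typedef enum {
    READ_ERROR,                /* nothing to return */
    READ_FILL_BUFFER,          /* fill the user's buffer with the fill-value */
    READ_STORAGE_MAYBE_TRASH,  /* read storage; unwritten parts may hold junk */
    READ_STORAGE               /* read storage; unwritten parts hold the fill-value */
} read_result_t;

read_result_t read_behavior(int space_allocated, fill_value_t v, fill_time_t f)
{
    /* No storage: the fill-value is the only possible source of data. */
    if (!space_allocated)
        return v == V_UNDEFINED ? READ_ERROR : READ_FILL_BUFFER;
    /* Storage exists but was never initialized with a fill-value. */
    if (v == V_UNDEFINED || f == F_NEVER)
        return READ_STORAGE_MAYBE_TRASH;
    return READ_STORAGE;
}
```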

  7. Implementation Plans:

    The work outlined in this document is already finished and checked into the library. This document describes the rationale for the changes and the exact changes implemented.

  8. Changes Remaining:

    None currently.

  9. Advanced Features:

    It may be possible in the future to specify valid VL information in the fill-value and have the library write that VL information to the file's global heap only once. Then all the references to that VL information in the dataset would share the same VL information, without excessive duplication of the VL information. Care must be taken if this is implemented, to correctly handle the reference counts necessary when re-writing dataset elements currently using the shared value.

    It is possible to optimize the operation which writes the fill-value to a dataset, by only writing the fill-value to the elements which are not going to be overwritten by the application's first write to the dataset (when the fill time property is set to "allocation" and the space allocation property is set to "late"). This will improve performance in cases where the application is writing a significant portion of the dataset. If this is implemented, care must be taken to correctly handle the cases when an application is only writing part of a compound datatype. Additionally, this has extra complexities in a parallel I/O environment, which would have to be carefully handled.
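
For a one-dimensional dataset of plain integers, the first of these optimizations might look like the sketch below. It is an illustration only: the function name is hypothetical, the selection is a single contiguous range, and the compound-datatype and parallel I/O complications mentioned above are deliberately ignored.

```c
#include <stddef.h>

/* Initialize a freshly allocated 1-D dataset buffer on the first write:
 * copy the application's data into [start, start + count) and write the
 * fill-value only to the elements outside that range.
 * Returns 0 on success, -1 if the selection is out of bounds. */
int first_write_1d(int *storage, size_t n, int fill,
                   const int *data, size_t start, size_t count)
{
    if (start > n || count > n - start)
        return -1;
    for (size_t i = 0; i < start; i++)           /* elements before the selection */
        storage[i] = fill;
    for (size_t i = 0; i < count; i++)           /* the application's data */
        storage[start + i] = data[i];
    for (size_t i = start + count; i < n; i++)   /* elements after the selection */
        storage[i] = fill;
    return 0;
}
```

Each element is written exactly once, whereas filling the whole dataset first and then writing the data would write the selected elements twice.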

    It is possible to optimize the operation which writes the fill-value to a dataset a bit more, by delaying writing the fill-values to the dataset until the dataset is closed. This could be done by using a selection to build up the regions of the dataset which have been written to and then only write the fill-values to the "inverse" of that region when the dataset is closed. Care must be taken if this is implemented, to correctly handle the cases when an application is only writing part of a compound datatype, however. Additionally, this has extra complexities in a parallel I/O environment, which would have to be carefully handled.

    It is possible to optimize the fill-value I/O situation even further by never writing full fill-values to the dataset. Instead, the regions of the dataset which have been written to by the user are tracked with a selection and the selection is stored with the dataset in the file. Then, when an application attempts to read elements outside that region, the fill-values are placed directly into the application's buffer, having never been actually stored in the file at all. If this is implemented, care must be taken to correctly handle the cases when an application is only writing part of a compound datatype. Additionally, this has extra complexities in a parallel I/O environment, which would have to be carefully handled.

  10. Alternate Approaches:

    None proposed

  11. File Format Changes:

    The changes in this document require two object header message changes. For the data storage layout message, the "address" value has been changed to express unallocated space. A "new" fill-value message has been added with new fields to store the information contained in the new properties.

    The revised data storage layout message follows, with the only changes being in the description of the "Version" and "Address" fields:

      Name: Data Storage - Layout
      Type: 0x0008
      Length: varies
      Status: Required for datasets, may not be repeated
      Purpose: Data layout describes how the elements of a multi-dimensional array are arranged in the linear address space of the file. Two types of data layout are supported:

      1. The array can be stored in one contiguous area of the file. The layout requires that the size of the array be constant and does not permit chunking, compression, checksums, encryption, etc. The message stores the total size of the array and the offset of an element from the beginning of the storage area is computed as in C.

      2. The array domain can be regularly decomposed into chunks and each chunk is allocated separately. This layout supports arbitrary element traversals, compression, encryption, and checksums, and the chunks can be distributed across external raw data files (these features are described in other messages). The message stores the size of a chunk instead of the size of the entire array; the size of the entire array can be calculated by traversing the B-tree that stores the chunk addresses.

      Format:
      byte     byte            byte          byte
      Version  Dimensionality  Layout Class  Reserved
      Reserved

      Address

      Dimension 0
      Dimension 1
      ...

      Description:
      Field Name       Description
      Version          A version number for the layout message. This document describes version two (2).
      Dimensionality   An array has a fixed dimensionality. This field specifies the number of dimension size fields later in the message.
      Layout Class     The layout class specifies how the other fields of the layout message are to be interpreted. A value of one (1) indicates contiguous storage while a value of two (2) indicates chunked storage. Other values will be defined in the future.
      Address          For contiguous storage, this is the offset of the first byte of raw data information for the dataset. This offset may contain the value "HADDR_UNDEF" (-1) to indicate the storage space has not been allocated. For chunked storage this is the offset of the B-tree that is used to look up the offsets of the chunks.
      Dimension 0...n  For contiguous storage the dimensions define the entire size of the array while for chunked storage they define the size of a single chunk.
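
A reader of the "Address" field has to test for the undefined value before treating it as a file offset. The sketch below shows one way to do that, assuming the offset is stored little-endian and that "-1" means every byte is 0xFF; the function names are hypothetical, and the width of an offset is a per-file setting passed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian file offset of `size` bytes (1 to 8). */
uint64_t decode_offset(const unsigned char *p, size_t size)
{
    uint64_t v = 0;
    for (size_t i = 0; i < size; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 1 when the stored offset is the undefined address, i.e. all one
 * bits across the file's offset width, meaning storage is not allocated. */
int offset_is_undefined(const unsigned char *p, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0xFF)
            return 0;
    }
    return 1;
}
```

Checking the raw bytes rather than the decoded integer keeps the test correct for offset widths smaller than 8 bytes.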

    The new fill-value message follows. This is a new object header message, designed to supersede the current fill-value message. The old fill-value message lacked a "Version" field and thus could not be changed to accommodate the new information to be stored. The old fill-value message will still be written out when appropriate, to facilitate forward compatibility when new files are read by old versions of the library.

      Name: Data Storage - New Fill-Value
      Type: 0x0005
      Length: varies
      Status: Optional, may not be repeated
      Purpose: This fill value message stores a single data element value and its related properties - space allocation time, fill-value write time, and whether fill-value is defined. The fill value is stored and interpreted as having the same datatype as that defined for the elements of the dataset.
      Format:
      byte     byte                   byte                   byte
      Version  Space allocation time  Fill-value write time  Fill-value defined?
      Size

      Fill-Value


      Description:
      Field Name             Description
      Version                A version number for the fill-value message. This document describes version one (1).
      Space allocation time  When to allocate storage space. Specifies whether to allocate space when the dataset is created (a value of one (1)), or when application data is written to the dataset (a value of two (2)).
      Fill-value write time  When to write fill-value to dataset. A value of zero (0) indicates never to write fill-values; a value of one (1) indicates to write fill value when storage space is allocated for the dataset.
      Fill-value defined?    A value of zero (0) means the fill-value is undefined for this dataset; a value of one (1) indicates the fill-value is defined (either default or user-defined). If undefined, the "Size" field will have the value of zero and the "Fill-value" field will not exist.
      Size                   This is the size of the "Fill-value" field in bytes.
      Fill-Value             The actual fill-value. The fill-value is interpreted using the same datatype as for the dataset.
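
To make the byte layout concrete, here is a minimal serialization sketch. It assumes the "Size" field is a 4-byte little-endian integer, as the one-row-per-four-bytes diagram suggests, and it takes the allocation-time and write-time codes as raw bytes from the caller; the numeric codes themselves should be checked against the file format specification. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Serialize the fields in the order shown in the format diagram above.
 * Pass fill == NULL for an undefined fill-value.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
size_t encode_fill_msg(unsigned char *out, size_t cap,
                       uint8_t alloc_time, uint8_t fill_time,
                       const void *fill, uint32_t size)
{
    uint8_t defined = (fill != NULL);
    if (!defined)
        size = 0;                         /* no Fill-Value field follows */
    if (cap < 8 || cap - 8 < size)
        return 0;
    out[0] = 1;                           /* Version: one (1) */
    out[1] = alloc_time;                  /* Space allocation time */
    out[2] = fill_time;                   /* Fill-value write time */
    out[3] = defined;                     /* Fill-value defined? */
    for (int i = 0; i < 4; i++)           /* Size, little-endian (assumed) */
        out[4 + i] = (unsigned char)(size >> (8 * i));
    if (size > 0)
        memcpy(out + 8, fill, size);      /* Fill-Value bytes */
    return 8 + (size_t)size;
}
```

An undefined fill-value therefore always encodes to exactly eight bytes, with a zero "Size" and no trailing value.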

  12. Changes to current API Calls:

    Minor changes have been made to existing library API functions:

      H5Dcreate
      This function now returns an error if a user attempts to create a dataset using a datatype which contains a VL datatype (either directly, or as part of a [nested] compound or array datatype) and the fill-value in the dataset creation properties is set to "undefined".

      H5Dwrite
      This function now allocates space for the dataset and optionally fills it with the fill-value, if space has not been allocated for the dataset yet.

      H5Dread
      This function now fills in the user's buffer with the fill-value if space has not been allocated for the dataset and a fill-value is defined for the dataset.

    The following API calls have changed significantly:


      Name:
      H5Pset_fill_value
      Purpose:
      Sets the fill-value for a dataset.
      Signature:
      herr_t H5Pset_fill_value(hid_t dcpl_id, hid_t type_id, const void *value)
      Parameters:
      hid_t dcpl_id
      IN: ID of dataset creation property list to operate on.
      hid_t type_id
      IN: ID of datatype describing the value parameter.
      const void *value
      IN: Pointer to buffer containing the value to use as fill-value.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function sets the fill-value for a dataset creation property list. The datatype of a fill-value used in a dataset creation property list may be different from the datatype of the dataset created, but the two datatypes must be convertible by the library. The library currently uses a default fill-value of all zeros for datasets, interpreted according to whatever the datatype of the dataset is. Passing a value of NULL for the value parameter indicates that the fill-value is to be undefined.

      This function is designed to coordinate with the storage allocation time and fill-value write time properties (set with H5Pset_alloc_time and H5Pset_fill_time).


      Name:
      H5Pget_fill_value
      Purpose:
      Retrieves the fill-value for a dataset.
      Signature:
      herr_t H5Pget_fill_value(hid_t dcpl_id, hid_t type_id, void *value)
      Parameters:
      hid_t dcpl_id
      IN: ID of dataset creation property list to query.
      hid_t type_id
      IN: ID of datatype describing the fill-value to be placed in the value parameter.
      void *value
      OUT: Pointer to buffer where fill-value will be placed.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function retrieves the fill-value from a dataset creation property list. The datatype of a fill-value used in a dataset creation property list may be different from the datatype requested in the type_id parameter, but the two datatypes must be convertible by the library. It is an error to query the fill-value when the fill-value is undefined. H5Pfill_value_defined should be used to check for this condition before this function is called.

      This function is designed to coordinate with the storage allocation time and fill-value write time properties (retrieved with H5Pget_alloc_time and H5Pget_fill_time).


  13. New API Calls:

    The following API calls have been implemented:


      Name:
      H5Pset_alloc_time
      Purpose:
      Set the time of data storage allocation for creating a dataset.
      Signature:
      herr_t H5Pset_alloc_time(hid_t dcpl_id, H5D_alloc_time_t alloc_time)
      Parameters:
      hid_t dcpl_id
      IN: ID of dataset creation property list to operate on.
      H5D_alloc_time_t alloc_time
      IN: Time to allocate space for dataset.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function indicates when to allocate storage space for a dataset. Valid values for alloc_time are:
        H5D_ALLOC_TIME_DEFAULT
        Use the default time for each storage method to choose when to allocate space for the dataset.
        H5D_ALLOC_TIME_EARLY
        Allocate all space when the dataset is created. (Default for compact stored datasets)
        H5D_ALLOC_TIME_INCR
        Allocate space incrementally as the dataset is written to. Defers allocating space for storing each chunk until a chunk is written to. Choosing incremental allocation for contiguous dataset storage is treated as late allocation. Choosing incremental allocation for compact dataset storage is an error. (Default for chunked storage datasets)
        H5D_ALLOC_TIME_LATE
        Allocate all space when the dataset is first written to. (Default for contiguous storage datasets)

      This function is designed to coordinate with the fill-value write time and fill-value properties (set with H5Pset_fill_time and H5Pset_fill_value).


      Name:
      H5Pget_alloc_time
      Purpose:
      Retrieve the time of data storage allocation for creating a dataset.
      Signature:
      herr_t H5Pget_alloc_time(hid_t dcpl_id, H5D_alloc_time_t *alloc_time)
      Parameters:
      hid_t dcpl_id
      IN: Identifier of dataset creation property list to query.
      H5D_alloc_time_t *alloc_time
      OUT: Time to allocate space for dataset.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function indicates when storage space will be allocated for a dataset. Valid values placed in alloc_time are:
        H5D_ALLOC_TIME_DEFAULT
        Use default allocation time, based on dataset storage method.
        H5D_ALLOC_TIME_EARLY
        Allocate all space when the dataset is created.
        H5D_ALLOC_TIME_INCR
        Allocate space incrementally as the dataset is written to.
        H5D_ALLOC_TIME_LATE
        Allocate all space when the dataset is first written to.

      This function is designed to coordinate with the fill-value write time and fill-value properties (retrieved with H5Pget_fill_time and H5Pget_fill_value).


      Name:
      H5Pset_fill_time
      Purpose:
      Set the time when fill-values are written to a dataset.
      Signature:
      herr_t H5Pset_fill_time(hid_t dcpl_id, H5D_fill_time_t fill_time)
      Parameters:
      hid_t dcpl_id
      IN: ID of dataset creation property list to operate on.
      H5D_fill_time_t fill_time
      IN: When to write fill-values to a dataset.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function indicates when to write fill-values to a dataset. Valid values for fill_time are:
        H5D_FILL_TIME_ALLOC
        Write fill-values when the space for dataset is allocated. (Default)
        H5D_FILL_TIME_NEVER
        Never write fill-values to dataset.

      This function is designed to coordinate with the space allocation time and fill-value properties (set with H5Pset_alloc_time and H5Pset_fill_value).


      Name:
      H5Pget_fill_time
      Purpose:
      Retrieve the time when fill-values are written to a dataset.
      Signature:
      herr_t H5Pget_fill_time(hid_t dcpl_id, H5D_fill_time_t *fill_time)
      Parameters:
      hid_t dcpl_id
      IN: ID of dataset creation property list to query.
      H5D_fill_time_t *fill_time
      OUT: When fill-values will be written to a dataset.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function indicates when fill-values will be written to a dataset. Valid values placed in fill_time are:
        H5D_FILL_TIME_ALLOC
        Write fill-values when the space for dataset is allocated.
        H5D_FILL_TIME_NEVER
        Never write fill-values to dataset.

      This function is designed to coordinate with the space allocation time and fill-value properties (retrieved with H5Pget_alloc_time and H5Pget_fill_value).


      Name:
      H5Pfill_value_defined
      Purpose:
      Check if fill-value is defined.
      Signature:
      herr_t H5Pfill_value_defined(hid_t dcpl_id, H5D_fill_value_t *status)
      Parameters:
      hid_t dcpl_id
      IN: ID of dataset creation property list to query.
      H5D_fill_value_t *status
      OUT: Status of fill-value property in property list.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function checks if the fill-value property in the dataset creation property list has been defined. Valid values returned in status are:
        H5D_FILL_VALUE_UNDEFINED
        Fill-value is undefined.
        H5D_FILL_VALUE_DEFAULT
        Fill-value is the library default.
        H5D_FILL_VALUE_USER_DEFINED
        Fill-value is defined by user application.

      This function is designed to coordinate with the space allocation time, fill-value time and fill-value properties (retrieved with H5Pget_alloc_time, H5Pget_fill_time and H5Pget_fill_value).


      Name:
      H5Dget_space_status
      Purpose:
      Check if space is allocated for a dataset.
      Signature:
      herr_t H5Dget_space_status(hid_t dset_id, H5D_space_status_t *status)
      Parameters:
      hid_t dset_id
      IN: ID of dataset to query.
      H5D_space_status_t *status
      OUT: Status of space allocation for dataset.
      Return Value:
      Returns non-negative on success, negative on failure.
      Description:
      This function checks if space has been allocated for the dataset. Valid values returned in status are:
        H5D_SPACE_STATUS_NOT_ALLOCATED
        Space has not been allocated for the dataset.
        H5D_SPACE_STATUS_ALLOCATED
        Space has been allocated for the dataset.
        H5D_SPACE_STATUS_PART_ALLOCATED
        Space has been partially allocated for the dataset. (Only returned for datasets using chunked storage).
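
One plausible reading of the three status values for a chunked dataset is a comparison of allocated chunks against total chunks. The sketch below encodes that interpretation; it is an assumption about the meaning of the values, not a description of how the library computes them, and the names are hypothetical.

```c
#include <stddef.h>

typedef enum {
    SPACE_NOT_ALLOCATED,
    SPACE_PART_ALLOCATED,
    SPACE_ALLOCATED
} space_status_t;

/* Classify a chunked dataset from how many of its chunks have storage. */
space_status_t chunked_space_status(size_t allocated_chunks, size_t total_chunks)
{
    if (allocated_chunks == 0)
        return SPACE_NOT_ALLOCATED;
    if (allocated_chunks < total_chunks)
        return SPACE_PART_ALLOCATED;     /* only possible with chunked storage */
    return SPACE_ALLOCATED;
}
```

Contiguous and compact datasets are allocated as a single unit, so under this reading they can only ever report the first or the last value.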


QAK:10/9/02  

 

 

--- Last Modified: December 11, 2018 | 11:19 AM