HDF5 Note

This note summarizes the HDF5 data model and its C programming interface, with examples.

Basics

The Abstract Data Model

The key concepts include:

  • File - a contiguous string of bytes in a computer store (memory, disk, etc.); the bytes represent zero or more objects of the model
  • Group - a collection of objects (including other groups)
  • Dataset - a multidimensional array of data elements, together with attributes and other metadata
  • Dataspace - a description of the dimensions of a multidimensional array
  • Datatype - a description of a specific class of data element, including its storage layout as a pattern of bits
  • Attribute - a named data value associated with a group, dataset, or named datatype
  • Property List - a collection of parameters (some permanent and some transient) controlling options in the library
  • Link - a named connection from a group to an object; links are how objects are organized into a hierarchy

File Contents

To view the contents of an HDF5 file, run:

h5dump <filename>

Creating a Dataset

Datatypes

There are two categories of datatypes in HDF5:

  • Pre-defined: These datatypes are opened and closed by HDF5.

    Pre-defined datatypes can be atomic or composite:

    • Atomic datatypes cannot be decomposed into smaller datatype units at the API level. For example: integer, float, reference, string.
    • Composite datatypes are aggregations of one or more datatypes. For example: array, variable length, enumeration, compound.
  • Derived: These datatypes are created or derived from the pre-defined types.
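
To make "storage layout as a pattern of bits" concrete, here is a plain-C sketch of byte order, the property that distinguishes file types such as H5T_STD_I32LE and H5T_STD_I32BE used later in these notes. put_u32_le and put_u32_be are hypothetical helpers, not HDF5 functions; HDF5 performs the equivalent conversion itself when the memory and file types differ.

```c
#include <assert.h>
#include <stdint.h>

/* Serialize a 32-bit value least-significant byte first (little-endian). */
void put_u32_le(uint32_t v, unsigned char out[4])
{
    assert(out != 0);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Serialize a 32-bit value most-significant byte first (big-endian). */
void put_u32_be(uint32_t v, unsigned char out[4])
{
    assert(out != 0);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * (3 - i)));
}
```

Serializing 0x01020304 gives the bytes 04 03 02 01 in little-endian order and 01 02 03 04 in big-endian order.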

HDF5 predefined native datatypes

Datasets and Dataspaces

A dataspace describes the dimensionality of the data array. A dataspace is either a regular N-dimensional array of data points, called a simple dataspace, or a more general collection of data points organized in another manner, called a complex dataspace. HDF5 also provides a null dataspace (no elements) and a scalar dataspace (a single element); both are shown below.

  1. Creating a Simple Dataspace
hid_t space_id;
int rank = 2;
hsize_t current_dims[2] = {20, 100};
hsize_t max_dims[2] = {30, H5S_UNLIMITED};
/* . . . */
space_id = H5Screate(H5S_SIMPLE);
H5Sset_extent_simple(space_id, rank, current_dims, max_dims);
// Equivalent single call:
// space_id = H5Screate_simple(rank, current_dims, max_dims);
// With NULL for max_dims, the maximum size of each dimension equals its current size:
// space_id = H5Screate_simple(rank, current_dims, NULL);
  1. Creating a Null Dataspace
hid_t space_id;
. . .
space_id = H5Screate(H5S_NULL);
  1. Creating a Scalar Dataspace
hid_t space_id;
. . .
space_id = H5Screate(H5S_SCALAR);
  1. Finding Dataspace Characteristics
hid_t space_id;
int rank;
hsize_t *current_dims;
hsize_t *max_dims;
/* . . . (malloc and free come from <stdlib.h>) */
rank = H5Sget_simple_extent_ndims(space_id);
/* or: rank = H5Sget_simple_extent_dims(space_id, NULL, NULL); */
current_dims = (hsize_t *)malloc(rank * sizeof(hsize_t));
max_dims = (hsize_t *)malloc(rank * sizeof(hsize_t));
H5Sget_simple_extent_dims(space_id, current_dims, max_dims);
/* . . . use the dimensions . . . */
free(current_dims);
free(max_dims);

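
As a rough model of what the calls above report, the plain-C sketch below treats a simple dataspace as a rank plus arrays of current and maximum dimensions. It is an illustration, not HDF5 code: dataspace_npoints and extent_fits are hypothetical helpers, unsigned long long stands in for hsize_t, and MY_UNLIMITED is a hypothetical stand-in for H5S_UNLIMITED. The element count (what H5Sget_simple_extent_npoints reports for a simple dataspace) is the product of the current dimensions, and a new extent is acceptable only if no dimension exceeds a finite maximum.

```c
#include <assert.h>

/* Hypothetical stand-in for H5S_UNLIMITED (all bits set). */
#define MY_UNLIMITED ((unsigned long long)-1)

/* Number of elements: the product of the current dimensions.
 * A rank-0 (scalar) dataspace holds exactly one element. */
unsigned long long dataspace_npoints(int rank, const unsigned long long *dims)
{
    unsigned long long n = 1;
    assert(rank >= 0);
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Returns 1 if every dimension of new_dims is within max_dims
 * (an unlimited maximum accepts any size), otherwise 0. */
int extent_fits(int rank, const unsigned long long *new_dims,
                const unsigned long long *max_dims)
{
    for (int i = 0; i < rank; i++)
        if (max_dims[i] != MY_UNLIMITED && new_dims[i] > max_dims[i])
            return 0;
    return 1;
}
```

For the simple dataspace created earlier (current 20 x 100, maximum 30 x unlimited) this gives 2000 elements, and an extent with 31 rows would be rejected.
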
Rearranging data

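The shape of the data in memory does not have to match its shape in the file. During a read, the selected elements are placed into memory in the shape described by the memory dataspace; during a write, they are stored in the file in the shape described by the file dataspace. What has to match is the number of selected elements in the two dataspaces.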
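
A plain-C sketch of the idea, with no HDF5 calls: a rectangle of elements is taken from one row-major array and placed at a different offset in another array of a different width. copy_rect_2d is a hypothetical helper; when reading, HDF5 performs this kind of mapping itself from the file and memory selections.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a rows x cols rectangle starting at (src_r, src_c) in a row-major
 * source array that is src_ncols wide into a row-major destination array
 * that is dst_ncols wide, starting at (dst_r, dst_c). */
void copy_rect_2d(const int *src, size_t src_ncols, size_t src_r, size_t src_c,
                  int *dst, size_t dst_ncols, size_t dst_r, size_t dst_c,
                  size_t rows, size_t cols)
{
    assert(src_c + cols <= src_ncols && dst_c + cols <= dst_ncols);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[(dst_r + i) * dst_ncols + dst_c + j] =
                src[(src_r + i) * src_ncols + src_c + j];
}
```

For example, copying a 2 x 3 rectangle that starts at (1, 2) of a 4 x 6 array into a 3 x 5 array at (0, 1) moves the same six values into a differently shaped buffer.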

Data Selection

Selections on dataspaces can be used to implement partial I/O, including:

  • Sub-setting - reading part of a large dataset
  • Sampling - reading selected elements (for example, every second element) of a dataset
  • Scatter-gather - read non-contiguous elements into contiguous locations (gather) or read contiguous elements into non-contiguous locations (scatter) or both

To use selections, follow these steps:

  1. Get or define the dataspace for the source and destination
  2. Specify one or more selections for source and destination dataspaces
  3. Transfer data using the dataspaces with selections

There are two forms of selection: hyperslab and point.

Hyperslab Selection

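A hyperslab is a selection of elements from a hyper-rectangle. It is described, per dimension, by a start offset, a stride (the distance between blocks), a count (the number of blocks), and a block size; passing NULL for the stride or the block in H5Sselect_hyperslab means 1 in every dimension.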
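
A plain-C sketch of how the four parameters combine in two dimensions: along each dimension the selected indices are start + c * stride + b for every c below count and every b below block. hyperslab_coords_2d is a hypothetical helper, not part of HDF5, unsigned long long stands in for hsize_t, and listing the coordinates in row-major order is a choice made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Enumerate the (row, col) coordinates selected by a 2-D hyperslab.
 * NULL stride or block means 1 in every dimension. rows/cols must have
 * room for count[0]*block[0]*count[1]*block[1] entries.
 * Returns the number of coordinates written. */
size_t hyperslab_coords_2d(const unsigned long long start[2],
                           const unsigned long long *stride,
                           const unsigned long long count[2],
                           const unsigned long long *block,
                           unsigned long long *rows, unsigned long long *cols)
{
    static const unsigned long long ones[2] = {1, 1};
    if (stride == NULL) stride = ones;
    if (block == NULL) block = ones;
    /* Blocks must not overlap when more than one block is selected. */
    assert(count[0] <= 1 || stride[0] >= block[0]);
    assert(count[1] <= 1 || stride[1] >= block[1]);

    size_t n = 0;
    for (unsigned long long c0 = 0; c0 < count[0]; c0++)
        for (unsigned long long b0 = 0; b0 < block[0]; b0++)
            for (unsigned long long c1 = 0; c1 < count[1]; c1++)
                for (unsigned long long b1 = 0; b1 < block[1]; b1++) {
                    rows[n] = start[0] + c0 * stride[0] + b0;
                    cols[n] = start[1] + c1 * stride[1] + b1;
                    n++;
                }
    return n;
}
```

With the parameters of example 2 below (start (0,1), stride (4,3), count (2,4), block (3,2)) it produces 6 rows x 8 columns = 48 coordinates, matching the 48 vector elements that example writes; the last selected element is (6, 11).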

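Point Selection

The second form of selection is a point selection: an explicit list of element coordinates. Of the examples below, examples 1 to 3 use hyperslab selections and example 4 uses a point selection.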
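
When a write pairs a point selection in the file with a contiguous memory buffer, the i-th buffer value goes to the i-th listed coordinate, because HDF5 visits the points in the order they were listed. The sketch below models that scatter in plain C; scatter_points_2d is a hypothetical helper, and the array shape used with it is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Write values[i] to the element at coords[i] = (row, col) of a row-major
 * nrows x ncols array, in the order the points are listed. */
void scatter_points_2d(int *array, size_t nrows, size_t ncols,
                       const unsigned long long (*coords)[2], size_t npoints,
                       const int *values)
{
    for (size_t i = 0; i < npoints; i++) {
        assert(coords[i][0] < nrows && coords[i][1] < ncols);
        array[coords[i][0] * ncols + coords[i][1]] = values[i];
    }
}
```

With example 4's coordinates and values, 53 lands at (0,0), 59 at (3,3), 61 at (3,5) and 67 at (5,6).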

  1. Example 1: read a 3 x 4 hyperslab from the file into a 7 x 7 x 3 memory array
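hid_t   dataset;                      /* an open 2-D dataset */
hid_t   dataspace, memspace;
hsize_t offset[2], count[2];          /* file hyperslab */
hsize_t dimsm[3];                     /* memory dataspace dimensions */
hsize_t offset_out[3], count_out[3];  /* memory hyperslab */
int     data[7][7][3];                /* output buffer */
herr_t  status;

/* Selecting a hyperslab: get the file dataspace. */
dataspace = H5Dget_space(dataset);    /* dataspace identifier */
/*
 * Define the hyperslab in the dataset: 3 x 4 elements starting at (1, 2).
 */
offset[0] = 1;
offset[1] = 2;
count[0] = 3;
count[1] = 4;
status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, offset, NULL, count, NULL);

/* Defining the destination memory */
/*
 * Define the memory dataspace.
 */
dimsm[0] = 7;
dimsm[1] = 7;
dimsm[2] = 3;
memspace = H5Screate_simple(3, dimsm, NULL);
/*
 * Define the memory hyperslab: 3 x 4 x 1 elements starting at (3, 0, 0),
 * the same 12 elements as the file selection.
 */
offset_out[0] = 3;
offset_out[1] = 0;
offset_out[2] = 0;
count_out[0] = 3;
count_out[1] = 4;
count_out[2] = 1;
status = H5Sselect_hyperslab(memspace, H5S_SELECT_SET, offset_out, NULL, count_out, NULL);

/* Read */
status = H5Dread(dataset, H5T_NATIVE_INT, memspace, dataspace, H5P_DEFAULT, data);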
  1. Example 2: write 48 vector elements to a strided block hyperslab in the file
/* Select hyperslab for the dataset in the file, using
* 3 x 2 blocks, (4,3) stride (2,4) count starting at
* the position (0,1). Here fid is the file dataspace,
* e.g. fid = H5Dget_space(dataset).
*/
offset[0] = 0; offset[1] = 1;
stride[0] = 4; stride[1] = 3;
count[0] = 2; count[1] = 4;
block[0] = 3; block[1] = 2;
ret = H5Sselect_hyperslab(fid, H5S_SELECT_SET, offset, stride, count, block);
/*
* Create dataspace for the first dataset.
*/
mid1 = H5Screate_simple(MSPACE1_RANK, dim1, NULL);
/*
* Select hyperslab.
* We will use 48 elements of the vector buffer starting
* at the second element. Selected elements are
* 1 2 3 . . . 48
*/
offset[0] = 1;
stride[0] = 1;
count[0] = 48;
block[0] = 1;
ret = H5Sselect_hyperslab(mid1, H5S_SELECT_SET, offset, stride, count, block);
/*
* Write selection from the vector buffer to the dataset
* in the file.
*/
ret = H5Dwrite(dataset, H5T_NATIVE_INT, mid1, fid, H5P_DEFAULT, vector);

  1. Example 3: read two combined (OR) hyperslabs from the file into two combined hyperslabs in memory
//  Select source hyperslabs
fid = H5Dget_space(dataset);
/*
* Select first hyperslab for the dataset in the file.
*/
offset[0] = 1; offset[1] = 2;
block[0] = 1; block[1] = 1;
stride[0] = 1; stride[1] = 1;
count[0] = 3; count[1] = 4;
ret = H5Sselect_hyperslab(fid, H5S_SELECT_SET, offset, stride, count, block);
/*
* Add second selected hyperslab to the selection.
*/
offset[0] = 2; offset[1] = 4;
block[0] = 1; block[1] = 1;
stride[0] = 1; stride[1] = 1;
count[0] = 6; count[1] = 5;
ret = H5Sselect_hyperslab(fid, H5S_SELECT_OR, offset, stride, count, block);

//  Select destination hyperslabs
/*
* Create memory dataspace.
*/
mid = H5Screate_simple(MSPACE_RANK, mdim, NULL);
/*
* Select two hyperslabs in memory. The hyperslabs have the
* same size and shape as the hyperslabs selected in
* the file dataspace.
*/
offset[0] = 0; offset[1] = 0;
block[0] = 1; block[1] = 1;
stride[0] = 1; stride[1] = 1;
count[0] = 3; count[1] = 4;
ret = H5Sselect_hyperslab(mid, H5S_SELECT_SET, offset, stride, count, block);
offset[0] = 1; offset[1] = 2;
block[0] = 1; block[1] = 1;
stride[0] = 1; stride[1] = 1;
count[0] = 6; count[1] = 5;
ret = H5Sselect_hyperslab(mid, H5S_SELECT_OR, offset, stride, count, block);
ret = H5Dread(dataset, H5T_NATIVE_INT, mid, fid, H5P_DEFAULT, matrix_out);
  1. Example 4: write four values to a point selection in the file
#define NPOINTS 4                  /* number of points to select */
hsize_t dim2[] = {NPOINTS};
int values[] = {53, 59, 61, 67};
/* Coordinates of the selected points in the file
* dataspace, one row per point
*/
hsize_t coord[NPOINTS][2];
/*
* Create dataspace for the second dataset.
*/
mid2 = H5Screate_simple(1, dim2, NULL);
/*
* Select a sequence of NPOINTS points in the file
* dataspace (fid).
*/
coord[0][0] = 0; coord[0][1] = 0;
coord[1][0] = 3; coord[1][1] = 3;
coord[2][0] = 3; coord[2][1] = 5;
coord[3][0] = 5; coord[3][1] = 6;
/* Since HDF5 1.8 the coordinates are passed as a flat const hsize_t array. */
ret = H5Sselect_elements(fid, H5S_SELECT_SET, NPOINTS, (const hsize_t *)coord);
ret = H5Dwrite(dataset, H5T_NATIVE_INT, mid2, fid, H5P_DEFAULT, values);
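
Example 3 relies on H5S_SELECT_OR forming a union: elements covered by both hyperslabs are selected only once. The plain-C sketch below models a 2-D selection as a boolean mask in order to count selected elements; the mask representation, the helper names, and the 8 x 12 extent used with it are assumptions for illustration (HDF5 stores selections differently).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROWS 16
#define MAX_COLS 16

/* A 2-D selection modelled as a boolean mask (illustration only). */
typedef struct {
    size_t nrows, ncols;
    unsigned char sel[MAX_ROWS][MAX_COLS];
} mask2d;

void mask_init(mask2d *m, size_t nrows, size_t ncols)
{
    assert(nrows <= MAX_ROWS && ncols <= MAX_COLS);
    memset(m, 0, sizeof *m);
    m->nrows = nrows;
    m->ncols = ncols;
}

/* Add a hyperslab with unit stride and unit block, like
 * H5Sselect_hyperslab with H5S_SELECT_OR. */
void mask_or_hyperslab(mask2d *m, const size_t start[2], const size_t count[2])
{
    assert(start[0] + count[0] <= m->nrows && start[1] + count[1] <= m->ncols);
    for (size_t i = 0; i < count[0]; i++)
        for (size_t j = 0; j < count[1]; j++)
            m->sel[start[0] + i][start[1] + j] = 1;
}

/* Number of selected elements. */
size_t mask_count(const mask2d *m)
{
    size_t n = 0;
    for (size_t i = 0; i < m->nrows; i++)
        for (size_t j = 0; j < m->ncols; j++)
            n += m->sel[i][j];
    return n;
}
```

With example 3's parameters both the file and the memory selection contain 12 + 30 - 4 = 38 elements (the two hyperslabs overlap in a 2 x 2 block), so the read pairs up equal numbers of elements.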

Property Lists

Property lists are a mechanism for modifying the default behavior when creating or accessing objects. Property lists are information relevant to the behavior of the library while attributes are relevant to the user’s data and application.

Programming Examples

  1. Create a file
hid_t  file;      /* file identifier */
herr_t status;
/*
* Create a new file using H5F_ACC_TRUNC to truncate and overwrite
* any file of the same name, default file creation properties, and
* default file access properties. Then close the file.
*/
file = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
status = H5Fclose(file);
  1. Create a file containing a 4 x 6 dataset (complete program)
#include "hdf5.h"
#define FILE "dset.h5"

int main(void) {

  hid_t       file_id, dataset_id, dataspace_id;  /* identifiers */
  hsize_t     dims[2];
  herr_t      status;

  /* Create a new file using default properties. */
  file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  /* Create the data space for the dataset. */
  dims[0] = 4;
  dims[1] = 6;
  dataspace_id = H5Screate_simple(2, dims, NULL);

  /* Create the dataset. */
  dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE, dataspace_id,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

  /* End access to the dataset and release resources used by it. */
  status = H5Dclose(dataset_id);

  /* Terminate access to the data space. */
  status = H5Sclose(dataspace_id);

  /* Close the file. */
  status = H5Fclose(file_id);
  return 0;
}
  1. Creating and Initializing a Dataset
/* file, NX, NY, RANK and DATASETNAME are defined elsewhere */
hid_t   dataset, datatype, dataspace;   /* declare identifiers */
hsize_t dimsf[2];                       /* dataset dimensions */
herr_t  status;
/*
* Create a dataspace: Describe the size of the array and
* create the dataspace for a fixed-size dataset.
*/
dimsf[0] = NX;
dimsf[1] = NY;
dataspace = H5Screate_simple(RANK, dimsf, NULL);
/*
* Define a datatype for the data in the dataset.
* We will store little endian integers.
*/
datatype = H5Tcopy(H5T_NATIVE_INT);
status = H5Tset_order(datatype, H5T_ORDER_LE);
/*
* Create a new dataset within the file using the defined
* dataspace and datatype and default dataset creation
* properties.
* NOTE: H5T_NATIVE_INT can be used as the datatype if
* conversion to little endian is not needed.
*/
dataset = H5Dcreate(file, DATASETNAME, datatype, dataspace,
H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  1. Closing objects
H5Tclose(datatype);
H5Dclose(dataset);
H5Sclose(dataspace);
  1. Writing a Dataset to a File and Reading It Back
/*
* Write the data to the dataset using default transfer properties.
*/
status = H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

status = H5Dread(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
  1. Declaring a dataspace with unlimited dimensions
/* dataset dimensions at creation time */
hsize_t dims[2] = {3, 3};
hsize_t maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
/*
* Create the data space with unlimited dimensions. A dataset that
* uses this dataspace must have chunked storage (set with
* H5Pset_chunk on its dataset creation property list).
*/
dataspace = H5Screate_simple(RANK, dims, maxdims);

HDF5 Files

File Access Modes

Access flag     | Resulting access mode
H5F_ACC_EXCL    | If the file already exists, H5Fcreate fails. If the file does not exist, it is created and opened with read-write access. (Default for H5Fcreate)
H5F_ACC_TRUNC   | If the file already exists, it is opened with read-write access and truncated, so any existing data is discarded. If the file does not exist, it is created and opened with read-write access.
H5F_ACC_RDONLY  | An existing file is opened with read-only access. If the file does not exist, H5Fopen fails. (Default for H5Fopen)
H5F_ACC_RDWR    | An existing file is opened with read-write access. If the file does not exist, H5Fopen fails.

  • By default, H5Fopen opens a file for read-only access; passing H5F_ACC_RDWR allows read-write access to the file.
  • By default, H5Fcreate fails if the file already exists; passing H5F_ACC_TRUNC instead truncates an existing file.
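
The table can be read as a small decision function. The sketch below is a hypothetical plain-C model of those rules (the enum names are invented for illustration and are not HDF5 identifiers); it encodes only what the table states.

```c
#include <assert.h>

typedef enum { MODE_EXCL, MODE_TRUNC, MODE_RDONLY, MODE_RDWR } access_mode;
typedef enum { RESULT_FAIL, RESULT_CREATED_RW, RESULT_TRUNCATED_RW,
               RESULT_OPENED_RO, RESULT_OPENED_RW } access_result;

/* EXCL and TRUNC (H5Fcreate flags) may create a file;
 * RDONLY and RDWR (H5Fopen flags) only open an existing one. */
access_result resolve_access(access_mode mode, int file_exists)
{
    switch (mode) {
    case MODE_EXCL:   return file_exists ? RESULT_FAIL : RESULT_CREATED_RW;
    case MODE_TRUNC:  return file_exists ? RESULT_TRUNCATED_RW : RESULT_CREATED_RW;
    case MODE_RDONLY: return file_exists ? RESULT_OPENED_RO : RESULT_FAIL;
    case MODE_RDWR:   return file_exists ? RESULT_OPENED_RW : RESULT_FAIL;
    }
    assert(0 && "unknown access mode");
    return RESULT_FAIL;
}
```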

Programming Model for Files

Creating a New File

The programming model for creating a new HDF5 file can be summarized as follows:

  • Define the file creation property list
  • Define the file access property list
  • Create the file
/* H5P_DEFAULT selects the default file creation and file access property lists. */
file_id = H5Fcreate("SampleFile.h5", H5F_ACC_EXCL, H5P_DEFAULT, H5P_DEFAULT);

Opening a File

faplist_id = H5Pcreate(H5P_FILE_ACCESS);
status = H5Pset_fapl_stdio(faplist_id);     /* use the stdio file driver */
file_id = H5Fopen("SampleFile.h5", H5F_ACC_RDONLY, faplist_id);
status = H5Pclose(faplist_id);
status = H5Fclose(file_id);

HDF5 Groups

HDF5 groups are analogous to directories or folders in a file system, and HDF5 datasets are analogous to files.
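
As with file-system paths, a name that starts with "/" is absolute, and any other name is resolved relative to the group passed as the location; this is how H5Gcreate(group, "Data_new2", ...) in the next section ends up creating /Data/Data_new2. The helper below is a hypothetical plain-C illustration of that rule, not an HDF5 function, and it ignores details such as "." components and soft links.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve an object name against the absolute path of a group.
 * Returns 0 on success, -1 if the result does not fit in out. */
int resolve_name(const char *group_path, const char *name, char *out, size_t outsz)
{
    int n;
    assert(group_path != NULL && name != NULL && out != NULL);
    if (name[0] == '/')                       /* absolute: ignore the group */
        n = snprintf(out, outsz, "%s", name);
    else if (strcmp(group_path, "/") == 0)    /* relative to the root group */
        n = snprintf(out, outsz, "/%s", name);
    else                                      /* relative to another group */
        n = snprintf(out, outsz, "%s/%s", group_path, name);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```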

Programming Model for Groups

Creating a Group

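hid_t file, group, group_new1, group_new2;
file = H5Fopen(....);
/* Absolute path: creates /Data under the root group. */
group = H5Gcreate(file, "/Data", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
/* Absolute path: creates /Data/Data_new1. */
group_new1 = H5Gcreate(file, "/Data/Data_new1", H5P_DEFAULT, H5P_DEFAULT,
H5P_DEFAULT);
/* Name relative to group: creates /Data/Data_new2. */
group_new2 = H5Gcreate(group, "Data_new2", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Gclose(group_new2);
H5Gclose(group_new1);
H5Gclose(group);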

Opening a Group and Accessing an Object in that Group

group = H5Gopen(file, "Data", H5P_DEFAULT);
dataset1 = H5Dopen(group, "CData", H5P_DEFAULT);
dataset2 = H5Dopen(file, "/Data/CData", H5P_DEFAULT);

Creating a Dataset in a Specific Group

dataspace = H5Screate_simple(RANK, dims, NULL);
dataset1 = H5Dcreate(file, "/Data/CData", H5T_NATIVE_INT,
dataspace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
group = H5Gopen(file, "Data", H5P_DEFAULT);
dataset2 = H5Dcreate(group, "Cdata2", H5T_NATIVE_INT,
dataspace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
herr_t status;
status = H5Gclose(group);

HDF5 Datasets

Programming Model for Datasets

Create Dataset

hid_t dataset, datatype, dataspace;
/*
* Create dataspace: Describe the size of the array and
* create the dataspace for fixed-size dataset.
*/
dimsf[0] = 7;
dimsf[1] = 8;
dataspace = H5Screate_simple(2, dimsf, NULL);
/*
* Define datatype for the data in the file.
* For this example, store little-endian integer numbers.
*/
datatype = H5Tcopy(H5T_NATIVE_INT);
status = H5Tset_order(datatype, H5T_ORDER_LE);
/*
* Create a new dataset within the file using defined
* dataspace and datatype. No properties are set.
*/
dataset = H5Dcreate(file, "/dset", datatype, dataspace,
H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Dclose(dataset);
H5Sclose(dataspace);
H5Tclose(datatype);

The same dataset, created with a dataset creation property list that sets the fill value to -1:

hid_t dataset, datatype, dataspace;
hid_t plist; /* property list */
int fillval = -1;
dimsf[0] = 7;
dimsf[1] = 8;
dataspace = H5Screate_simple(2, dimsf, NULL);
datatype = H5Tcopy(H5T_NATIVE_INT);
status = H5Tset_order(datatype, H5T_ORDER_LE);
/*
* Example of Dataset Creation property list: set fill value
* to '-1'
*/
plist = H5Pcreate(H5P_DATASET_CREATE);
status = H5Pset_fill_value(plist, datatype, &fillval);
/* Same as above, but use the property list */
dataset = H5Dcreate(file, "/dset", datatype, dataspace,
H5P_DEFAULT, plist, H5P_DEFAULT);
H5Dclose(dataset);
H5Sclose(dataspace);
H5Tclose(datatype);
H5Pclose(plist);

Data Transfer Operations on a Dataset

A data transfer has the following basic steps:

  1. Allocate and initialize memory space as needed
  2. Define the datatype of the memory elements
  3. Define the elements to be transferred (a selection, or all the elements)
  4. Set data transfer properties (including parameters for filters or file drivers) as needed
  5. Call H5Dwrite or H5Dread
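
Step 1 can be sketched in plain C: the memory buffer has to hold every element described by the memory dataspace, so its size is the product of the memory dimensions times the size of the memory element type (sizeof(int) for H5T_NATIVE_INT). alloc_transfer_buffer is a hypothetical helper, and unsigned long long stands in for hsize_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zeroed buffer for rank-dimensional data of elem_size-byte
 * elements and store its size in *nbytes. Returns NULL if the size
 * would overflow size_t or the allocation fails. */
void *alloc_transfer_buffer(int rank, const unsigned long long *dims,
                            size_t elem_size, size_t *nbytes)
{
    size_t nelem = 1;
    assert(rank >= 0 && elem_size > 0);
    for (int i = 0; i < rank; i++) {
        if (dims[i] != 0 && nelem > SIZE_MAX / dims[i])
            return NULL;                    /* element count overflows */
        nelem *= (size_t)dims[i];
    }
    if (nelem != 0 && elem_size > SIZE_MAX / nelem)
        return NULL;                        /* byte count overflows */
    *nbytes = nelem * elem_size;
    return calloc(nelem ? nelem : 1, elem_size);
}
```

For the 4 x 6 dset_data array used in the examples below, this is 24 * sizeof(int) bytes.
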
Examples:
  1. Write an array of integers
hid_t  file_id, dataset_id; /* identifiers */
herr_t  status;
int i, j, dset_data[4][6];
/* Initialize the dataset. */
for (i = 0; i < 4; i++)
    for (j = 0; j < 6; j++)
        dset_data[i][j] = i * 6 + j + 1;
/* Open an existing file. */
file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Open an existing dataset. */
dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
/* Write the entire dataset, using 'dset_data':
memory type is 'native int'
write the entire dataspace to the entire dataspace,
no transfer properties,
*/
status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL,
H5S_ALL, H5P_DEFAULT, dset_data);
status = H5Dclose(dataset_id);
status = H5Fclose(file_id);
  1. Write the same array with a data transfer property list (64 MB type conversion buffer)
hid_t file_id, dataset_id;
hid_t xferplist;
herr_t status;
int i, j, dset_data[4][6];   /* fill dset_data before writing, e.g. as in the previous example */
file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);
dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
/*
* Example: set type conversion buffer to 64MB
*/
xferplist = H5Pcreate(H5P_DATASET_XFER);
status = H5Pset_buffer( xferplist, 64 * 1024 *1024, NULL, NULL);
/* Write the entire dataset, using 'dset_data':
memory type is 'native int'
write the entire dataspace to the entire dataspace,
set the buffer size with the property list,
*/
status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL,
H5S_ALL, xferplist, dset_data);
status = H5Dclose(dataset_id);
status = H5Pclose(xferplist);
status = H5Fclose(file_id);

  1. Read an array of integers
hid_t file_id, dataset_id;
herr_t status;
int i, j, dset_data[4][6];
/* Open an existing file. */
file_id = H5Fopen("dset.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
/* Open an existing dataset. */
dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
/* read the entire dataset, into 'dset_data':
memory type is 'native int'
read the entire dataspace to the entire dataspace,
no transfer properties,
*/
status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL,
H5S_ALL, H5P_DEFAULT, dset_data);
status = H5Dclose(dataset_id);
status = H5Fclose(file_id);

Retrieve the Properties of a Dataset

hid_t file_id, dataset_id;
hid_t dspace_id, dtype_id, plist_id;
herr_t status;

/* Open an existing file. */
file_id = H5Fopen("dset.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
/* Open an existing dataset. */
dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);
dspace_id = H5Dget_space(dataset_id);
dtype_id = H5Dget_type(dataset_id);
plist_id = H5Dget_create_plist(dataset_id);
/* use the objects to discover the properties of the dataset */
H5Pclose(plist_id);
H5Tclose(dtype_id);
H5Sclose(dspace_id);
status = H5Dclose(dataset_id);
status = H5Fclose(file_id);

HDF5 Attributes

An HDF5 attribute is a small metadata object describing the nature and/or intended usage of a primary data object. A primary data object may be a dataset, group, or committed datatype.

Chunking in HDF5

Parallel IO Examples

example 1: parallel write

#include <iostream>
#include "hdf5.h"
#include "mpi.h"

using namespace std;

int main(int argc, char **argv) {
  int mpi_size, mpi_rank;
  MPI_Comm comm = MPI_COMM_WORLD;
  MPI_Info info = MPI_INFO_NULL;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(comm, &mpi_rank);
  MPI_Comm_size(comm, &mpi_size);

  hid_t plist_id, file_id;
  plist_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(plist_id, comm, info); // Stores MPI IO communicator information to the file access property list.

  file_id = H5Fcreate("parallel.h5", H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
  H5Pclose(plist_id);

  hsize_t dims[2];
  hid_t dataspace, dataset_id;
  dims[0] = 10;
  dims[1] = 12;
  int rank = 2;
  dataspace = H5Screate_simple(rank, dims, NULL); // create dataspace
  dataset_id = H5Dcreate(file_id, "X", H5T_NATIVE_INT, dataspace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Sclose(dataspace);

  hsize_t count[2], offset[2];
  hid_t mem_space, file_space;
  count[0] = dims[0] / mpi_size;  // assumes dims[0] (10) is divisible by mpi_size, e.g. 1, 2, 5 or 10 ranks
  count[1] = dims[1];
  offset[0] = mpi_rank * count[0];
  offset[1] = 0;
  mem_space = H5Screate_simple(rank, count, NULL);

  file_space = H5Dget_space(dataset_id);
  H5Sselect_hyperslab(file_space, H5S_SELECT_SET, offset, NULL, count, NULL);

  int* data = new int [static_cast<int>(count[0] * count[1])];
  for (int i = 0; i < static_cast<int>(count[0] * count[1]); i++)
    data[i] = mpi_rank;

  plist_id = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
  H5Dwrite(dataset_id, H5T_NATIVE_INT, mem_space, file_space, plist_id, data);

  H5Dclose(dataset_id);
  H5Sclose(mem_space);
  H5Sclose(file_space);
  H5Pclose(plist_id);
  delete[] data;
  H5Fclose(file_id);

  MPI_Finalize();
  return 0;
}
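
In the write example each rank takes dims[0] / mpi_size rows, so with 10 rows and 3 ranks the last row is never written. A common alternative is a block split in which the first (rows % ranks) ranks get one extra row; partition_rows below is a hypothetical plain-C helper showing that arithmetic. In collective mode every rank must still call H5Dwrite, so a rank that receives no rows would pass an empty selection (for example one made with H5Sselect_none).

```c
#include <assert.h>

/* Split nrows rows among nprocs ranks as evenly as possible: the first
 * (nrows % nprocs) ranks get one extra row. Stores this rank's first
 * row in *offset and its row count in *count. */
void partition_rows(unsigned long long nrows, int nprocs, int rank,
                    unsigned long long *offset, unsigned long long *count)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    unsigned long long base = nrows / (unsigned long long)nprocs;
    unsigned long long extra = nrows % (unsigned long long)nprocs;
    unsigned long long r = (unsigned long long)rank;
    *count = base + (r < extra ? 1 : 0);
    *offset = r * base + (r < extra ? r : extra);
}
```

With 10 rows and 3 ranks this yields rows 0-3, 4-6 and 7-9; the results would replace count[0] and offset[0] in the example.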

example 2: parallel read

#include <iostream>
#include "hdf5.h"
#include "mpi.h"

using namespace std;

int main(int argc, char **argv) {
  int mpi_size, mpi_rank;
  MPI_Comm comm = MPI_COMM_WORLD;
  MPI_Info info = MPI_INFO_NULL;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(comm, &mpi_rank);
  MPI_Comm_size(comm, &mpi_size);

  hid_t file_access, file_id;
  file_access = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(file_access, comm, info);
  char filename[] = "parallel.h5";
  file_id = H5Fopen(filename, H5F_ACC_RDWR, file_access);
  H5Pclose(file_access);

  hid_t dataset, dataset_space;
  dataset = H5Dopen2(file_id, "X", H5P_DEFAULT);
  dataset_space = H5Dget_space(dataset);
  int rank;
  hsize_t *current_dims;
  hsize_t *max_dims;
  rank = H5Sget_simple_extent_ndims(dataset_space);   // get the dataset information
  current_dims= (hsize_t*)malloc(rank*sizeof(hsize_t));
  max_dims=(hsize_t*)malloc(rank*sizeof(hsize_t));
  H5Sget_simple_extent_dims(dataset_space, current_dims, max_dims);

  hsize_t start[2], count[2], stride[2];
  start[0] = mpi_rank * current_dims[0] / mpi_size;
  start[1] = 0;
  count[0] = current_dims[0] / mpi_size;
  count[1] = current_dims[1];
  stride[0] = 1;
  stride[1] =1;
  H5Sselect_hyperslab(dataset_space, H5S_SELECT_SET, start, stride, count, NULL);

  hid_t mem_dataspace;
  mem_dataspace = H5Screate_simple(2, count, NULL);

  // Read into a buffer sized from the selection (count[0] x count[1]) rather
  // than a hard-coded 2 x 12 array, so the example does not depend on the
  // number of ranks (it still assumes the row count divides evenly).
  int *data = new int[count[0] * count[1]];
  H5Dread(dataset, H5T_NATIVE_INT, mem_dataspace, dataset_space, H5P_DEFAULT, data);

  // Print one rank at a time.
  for (int r = 0; r < mpi_size; r++) {
    if (mpi_rank == r) {
      cout << "rank " << mpi_rank << " :" << endl;
      for (hsize_t i = 0; i < count[0]; i++) {
        for (hsize_t j = 0; j < count[1]; j++) {
          cout << data[i * count[1] + j] << " ";
        }
        cout << endl;
      }
    }
    MPI_Barrier(comm);
  }
  delete[] data;

  H5Sclose(mem_dataspace);
  H5Sclose(dataset_space);
  H5Dclose(dataset);
  H5Fclose(file_id);
  free(current_dims);
  free(max_dims);

  MPI_Finalize();
  return 0;
}

Useful Examples

example 1: Write all

   hid_t       file_id, dataset_id;  /* identifiers */
   herr_t      status;
   int         i, j, dset_data[4][6];

   /* Initialize the dataset. */
   for (i = 0; i < 4; i++)
      for (j = 0; j < 6; j++)
         dset_data[i][j] = i * 6 + j + 1;

   /* Open an existing file. */
   file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);

   /* Open an existing dataset. */
   dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);

   /* Write the entire dataset, using 'dset_data':
         memory type is 'native int'
         write the entire dataspace to the entire dataspace,
         no transfer properties,
    */
   status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL,
           H5S_ALL, H5P_DEFAULT, dset_data);
   status = H5Dclose(dataset_id);
   status = H5Fclose(file_id);

example 2: read all

   hid_t       file_id, dataset_id;
   herr_t      status;
   int         i, j, dset_data[4][6];


   /* Open an existing file. */
   file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);

   /* Open an existing dataset. */
   dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);

   /* read the entire dataset, into 'dset_data':
         memory type is 'native int'
         read the entire dataspace to the entire dataspace,
         no transfer properties,
    */
   status = H5Dread(dataset_id, H5T_NATIVE_INT, H5S_ALL,
           H5S_ALL, H5P_DEFAULT, dset_data);

   status = H5Dclose(dataset_id);
   status = H5Fclose(file_id);

example 3: Extend dataset

   hid_t       file_id, dataset_id;
   herr_t      status;
   hsize_t     newdims[2];

   /* Open an existing file. */
   file_id = H5Fopen("dset.h5", H5F_ACC_RDWR, H5P_DEFAULT);

   /* Open an existing dataset. */
   dataset_id = H5Dopen(file_id, "/dset", H5P_DEFAULT);

   /* Example: dataset is 2 X 3, each dimension is UNLIMITED
      (so it must have been created with chunked storage) */
   /* extend to 2 X 7 */
   newdims[0] = 2;
   newdims[1] = 7;

   /* H5Dset_extent replaces the deprecated H5Dextend. */
   status = H5Dset_extent(dataset_id, newdims);

   /* dataset is now 2 X 7 */

   status = H5Dclose(dataset_id);
   status = H5Fclose(file_id);
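
Older code calls H5Dextend for this. The two calls differ in one respect: H5Dextend only grows a dataset (each dimension ends up at least the requested size), whereas H5Dset_extent sets the dimensions exactly and can also shrink them. The plain-C function below is a hypothetical illustration of the resulting extent under each rule; in both cases the result must still fit within the dataset's maximum dimensions.

```c
#include <assert.h>

/* Extent that results from a resize request.
 * grow_only != 0 models H5Dextend (never shrinks a dimension);
 * grow_only == 0 models H5Dset_extent (uses the requested size). */
void new_extent(int rank, const unsigned long long *cur,
                const unsigned long long *req, int grow_only,
                unsigned long long *out)
{
    assert(rank > 0);
    for (int i = 0; i < rank; i++)
        out[i] = (grow_only && req[i] < cur[i]) ? cur[i] : req[i];
}
```

Requesting 1 x 7 for the 2 x 3 dataset gives 2 x 7 under the grow-only rule and 1 x 7 with H5Dset_extent.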

example 4: create and extend an unlimited dataset

#include "hdf5.h"
#include <stdio.h>
#include <stdlib.h>

#define FILE            "h5ex_d_unlimadd.h5"
#define DATASET         "DS1"
#define DIM0            4
#define DIM1            7
#define EDIM0           6
#define EDIM1           10
#define CHUNK0          4
#define CHUNK1          4

int
main (void)
{
    hid_t           file, space, dset, dcpl;    /* Handles */
    herr_t          status;
    hsize_t         dims[2] = {DIM0, DIM1},
                    extdims[2] = {EDIM0, EDIM1},
                    maxdims[2],
                    chunk[2] = {CHUNK0, CHUNK1},
                    start[2],
                    count[2];
    int             wdata[DIM0][DIM1],          /* Write buffer */
                    wdata2[EDIM0][EDIM1],       /* Write buffer for
                                                   extension */
                    **rdata,                    /* Read buffer */
                    ndims,
                    i, j;

    /*
     * Initialize data.
     */
    for (i=0; i<DIM0; i++)
        for (j=0; j<DIM1; j++)
            wdata[i][j] = i * j - j;

    /*
     * Create a new file using the default properties.
     */
    file = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /*
     * Create dataspace with unlimited dimensions.
     */
    maxdims[0] = H5S_UNLIMITED;
    maxdims[1] = H5S_UNLIMITED;
    space = H5Screate_simple (2, dims, maxdims);

    /*
     * Create the dataset creation property list, and set the chunk
     * size.
     */
    dcpl = H5Pcreate (H5P_DATASET_CREATE);
    status = H5Pset_chunk (dcpl, 2, chunk);

    /*
     * Create the unlimited dataset.
     */
    dset = H5Dcreate (file, DATASET, H5T_STD_I32LE, space, H5P_DEFAULT, dcpl,
                H5P_DEFAULT);

    /*
     * Write the data to the dataset.
     */
    status = H5Dwrite (dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                wdata[0]);

    /*
     * Close and release resources.
     */
    status = H5Pclose (dcpl);
    status = H5Dclose (dset);
    status = H5Sclose (space);
    status = H5Fclose (file);


    /*
     * In this next section we read back the data, extend the dataset,
     * and write new data to the extended portions.
     */

    /*
     * Open file and dataset using the default properties.
     */
    file = H5Fopen (FILE, H5F_ACC_RDWR, H5P_DEFAULT);
    dset = H5Dopen (file, DATASET, H5P_DEFAULT);

    /*
     * Get dataspace and allocate memory for read buffer.  This is a
     * two dimensional dataset so the dynamic allocation must be done
     * in steps.
     */
    space = H5Dget_space (dset);
    ndims = H5Sget_simple_extent_dims (space, dims, NULL);

    /*
     * Allocate array of pointers to rows.
     */
    rdata = (int **) malloc (dims[0] * sizeof (int *));

    /*
     * Allocate space for integer data.
     */
    rdata[0] = (int *) malloc (dims[0] * dims[1] * sizeof (int));

    /*
     * Set the rest of the pointers to rows to the correct addresses.
     */
    for (i=1; i<dims[0]; i++)
        rdata[i] = rdata[0] + i * dims[1];

    /*
     * Read the data using the default properties.
     */
    status = H5Dread (dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                rdata[0]);

    /*
     * Output the data to the screen.
     */
    printf ("Dataset before extension:\n");
    for (i=0; i<dims[0]; i++) {
        printf (" [");
        for (j=0; j<dims[1]; j++)
            printf (" %3d", rdata[i][j]);
        printf ("]\n");
    }

    status = H5Sclose (space);

    /*
     * Extend the dataset.
     */
    status = H5Dset_extent (dset, extdims);

    /*
     * Retrieve the dataspace for the newly extended dataset.
     */
    space = H5Dget_space (dset);

    /*
     * Initialize data for writing to the extended dataset.
     */
    for (i=0; i<EDIM0; i++)
        for (j=0; j<EDIM1; j++)
            wdata2[i][j] = j;

    /*
     * Select the entire dataspace.
     */
    status = H5Sselect_all (space);

    /*
     * Subtract a hyperslab reflecting the original dimensions from the
     * selection.  The selection now contains only the newly extended
     * portions of the dataset.
     */
    start[0] = 0;
    start[1] = 0;
    count[0] = dims[0];
    count[1] = dims[1];
    status = H5Sselect_hyperslab (space, H5S_SELECT_NOTB, start, NULL, count,
                NULL);

    /*
     * Write the data to the selected portion of the dataset.
     */
    status = H5Dwrite (dset, H5T_NATIVE_INT, H5S_ALL, space, H5P_DEFAULT,
                wdata2[0]);

    /*
     * Close and release resources.
     */
    free (rdata[0]);
    free(rdata);
    status = H5Dclose (dset);
    status = H5Sclose (space);
    status = H5Fclose (file);


    /*
     * Now we simply read back the data and output it to the screen.
     */

    /*
     * Open file and dataset using the default properties.
     */
    file = H5Fopen (FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
    dset = H5Dopen (file, DATASET, H5P_DEFAULT);

    /*
     * Get dataspace and allocate memory for the read buffer as before.
     */
    space = H5Dget_space (dset);
    ndims = H5Sget_simple_extent_dims (space, dims, NULL);
    rdata = (int **) malloc (dims[0] * sizeof (int *));
    rdata[0] = (int *) malloc (dims[0] * dims[1] * sizeof (int));
    for (i=1; i<dims[0]; i++)
        rdata[i] = rdata[0] + i * dims[1];

    /*
     * Read the data using the default properties.
     */
    status = H5Dread (dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                rdata[0]);

    /*
     * Output the data to the screen.
     */
    printf ("\nDataset after extension:\n");
    for (i=0; i<dims[0]; i++) {
        printf (" [");
        for (j=0; j<dims[1]; j++)
            printf (" %3d", rdata[i][j]);
        printf ("]\n");
    }

    /*
     * Close and release resources.
     */
    free (rdata[0]);
    free(rdata);
    status = H5Dclose (dset);
    status = H5Sclose (space);
    status = H5Fclose (file);

    return 0;
}
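
Because the dataset above is chunked with 4 x 4 chunks, the number of chunks covering it follows from a ceiling division in each dimension. The sketch below is plain C with a hypothetical helper name: the original 4 x 7 extent spans 1 x 2 = 2 chunks, and the extended 6 x 10 extent spans 2 x 3 = 6 chunks. The H5S_SELECT_NOTB step in the example likewise leaves 6 * 10 - 4 * 7 = 32 newly added elements selected.

```c
#include <assert.h>

/* Number of chunks needed to cover a dataset: the product over all
 * dimensions of ceil(dims[i] / chunk[i]). Partly filled edge chunks
 * count as whole chunks. */
unsigned long long chunks_needed(int rank, const unsigned long long *dims,
                                 const unsigned long long *chunk)
{
    unsigned long long n = 1;
    for (int i = 0; i < rank; i++) {
        assert(chunk[i] > 0);
        n *= (dims[i] + chunk[i] - 1) / chunk[i];
    }
    return n;
}
```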

example 5: write while growing the dataset

#include "hdf5.h"

/* Fill array[0..len-1] with val, val+1, ..., val+len-1. */
void fill_array(int len, int val, int *array) {
  for (int i = 0; i < len; i++)
    array[i] = val + i;
}

int main(void) {
  hid_t file_id = H5Fcreate("parallel2.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  /* Start with 2 x 12 and let both dimensions grow without limit.
     Unlimited dimensions require chunked storage. */
  hsize_t dims[2] = {2, 12}, max_dims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
  hsize_t chunk[2] = {2, 12};
  int rank = 2;

  hid_t dataspace = H5Screate_simple(rank, dims, max_dims);
  hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_chunk(dcpl, rank, chunk);
  hid_t dataset_id = H5Dcreate2(file_id, "/X", H5T_NATIVE_INT, dataspace,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);
  H5Sclose(dataspace);
  H5Pclose(dcpl);

  int times = 6;
  hsize_t count[2] = {2, 12}, offset[2];
  int data[24];
  for (int i = 0; i < times; i++) {
    fill_array(24, i + 3, data);
    if (i == 0) {
      /* The first block fills the initial 2 x 12 extent exactly. */
      H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
    } else {
      /* Grow by two rows, then write into the two new rows only. */
      dims[0] += 2;
      H5Dset_extent(dataset_id, dims);

      /* Re-read the dataspace so it reflects the new extent. */
      hid_t file_space = H5Dget_space(dataset_id);
      offset[0] = dims[0] - 2;
      offset[1] = 0;
      H5Sselect_hyperslab(file_space, H5S_SELECT_SET, offset, NULL, count, NULL);
      hid_t mem_space = H5Screate_simple(rank, count, NULL);
      H5Dwrite(dataset_id, H5T_NATIVE_INT, mem_space, file_space, H5P_DEFAULT, data);

      /* Release the per-iteration handles so they do not leak. */
      H5Sclose(mem_space);
      H5Sclose(file_space);
    }
  }

  H5Dclose(dataset_id);
  H5Fclose(file_id);

  return 0;
}
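To see what `/X` should contain after the loop, the sequence of `H5Dset_extent` calls and hyperslab writes can be modelled with an ordinary growing buffer. This is a plain-C sketch, not HDF5 code. `realloc` stands in for `H5Dset_extent`, and `memcpy` into the last two rows stands in for the hyperslab `H5Dwrite`. `simulate_appends` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NCOLS 12
#define BLOCK_ROWS 2

/* Same helper as in example 5. */
static void fill_array(int len, int val, int *array) {
  for (int i = 0; i < len; i++)
    array[i] = val + i;
}

/* Hypothetical model of example 5: each step grows a row-major buffer by
   BLOCK_ROWS rows and copies the new block into those last rows.
   Returns the buffer (caller frees it); *out_rows receives the row count. */
static int *simulate_appends(int times, size_t *out_rows) {
  int *buf = NULL;
  size_t rows = 0;
  int block[BLOCK_ROWS * NCOLS];
  for (int i = 0; i < times; i++) {
    fill_array(BLOCK_ROWS * NCOLS, i + 3, block);
    size_t offset = rows; /* offset[0] = new dims[0] - 2 */
    rows += BLOCK_ROWS;   /* dims[0] += 2 */
    int *grown = realloc(buf, rows * NCOLS * sizeof *grown);
    if (grown == NULL) {
      free(buf);
      *out_rows = 0;
      return NULL;
    }
    buf = grown;
    memcpy(buf + offset * NCOLS, block, sizeof block);
  }
  *out_rows = rows;
  return buf;
}
```

With `times = 6`, the model ends with 12 rows. Row 2i holds i+3 ... i+14 and row 2i+1 holds i+15 ... i+26, which is what `h5dump parallel2.h5` should show for `/X`.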

example 6: append to a dataset in an existing file (serial and MPI)
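Every extendible dataset in these examples is chunked, and each chunk is allocated and cached as a unit. The size of one chunk is the product of its dimensions times the element size. Example 5 uses {2, 12} native ints, which is 96 bytes assuming a 4-byte int. `CreateFile` below uses {MAX, n_features} = {10000, 1000} doubles, about 80 MB per chunk. That is far above HDF5's default per-dataset chunk cache of 1 MiB, which can be changed with `H5Pset_chunk_cache`.

The helper below is a hypothetical plain-C sketch. It uses `uint64_t` in place of `hsize_t` so it compiles without `hdf5.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HDF5's default raw-data chunk cache per dataset (see H5Pset_chunk_cache). */
#define DEFAULT_CHUNK_CACHE_BYTES (1024u * 1024u)

/* Hypothetical helper: size in bytes of one chunk with the given shape. */
static uint64_t chunk_bytes(int rank, const uint64_t *chunk_dims, size_t elem_size) {
  assert(rank > 0);
  uint64_t bytes = elem_size;
  for (int d = 0; d < rank; d++)
    bytes *= chunk_dims[d];
  return bytes;
}

/* Nonzero if one whole chunk fits in the default chunk cache. */
static int fits_default_cache(uint64_t bytes) {
  return bytes <= DEFAULT_CHUNK_CACHE_BYTES;
}
```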

#include <string>
#include <mpi.h>
#include "hdf5.h"

using std::string;

// MatrixD, VectorD and SampleGenerator are not part of HDF5; they come from
// the author's own project (not shown). The H5Dwrite calls below require
// x.data() to point to sample_size x n_features doubles stored contiguously
// in row-major order.

// Create extend.h5 with an extendible 2-D dataset /X and write the first rows.
void CreateFile() {
  MatrixD x;
  SampleGenerator generator;
  int sample_size = 20, n_features = 1000;
  VectorD coef, y;
  generator.MakeRegression(x, y, coef, sample_size, n_features);

  const int MAX = 1e4;  // rows per chunk
  string hdf5_file_address = "extend.h5";
  hid_t file_id = H5Fcreate(hdf5_file_address.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  hsize_t x_dims[2] = {static_cast<hsize_t>(sample_size), static_cast<hsize_t>(n_features)};
  hsize_t max_dims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
  // Chunk dims may exceed the current extent because both dimensions are unlimited.
  hsize_t chunk[2] = {MAX, static_cast<hsize_t>(n_features)};
  int rank = 2;
  hid_t X_space = H5Screate_simple(rank, x_dims, max_dims);
  hid_t X_dcpl = H5Pcreate(H5P_DATASET_CREATE);
  H5Pset_chunk(X_dcpl, rank, chunk);
  hid_t X_id = H5Dcreate2(file_id, "/X", H5T_NATIVE_DOUBLE, X_space, H5P_DEFAULT, X_dcpl, H5P_DEFAULT);
  H5Pclose(X_dcpl);
  H5Dwrite(X_id, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, x.data());

  H5Sclose(X_space);
  H5Dclose(X_id);
  H5Fclose(file_id);
}

// Append sample_size new rows to /X in the existing extend.h5.
void ExtendFile() {
  MatrixD x;
  SampleGenerator generator;
  int sample_size = 1000, n_features = 1000;
  VectorD coef, y;
  generator.MakeRegression(x, y, coef, sample_size, n_features);

  string hdf5_file_address = "extend.h5";
  // Open with the stdio (buffered C I/O) file driver.
  hid_t faplist_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_stdio(faplist_id);
  hid_t file_id = H5Fopen(hdf5_file_address.c_str(), H5F_ACC_RDWR, faplist_id);
  H5Pclose(faplist_id);

  hid_t X_id = H5Dopen2(file_id, "/X", H5P_DEFAULT);

  // Current extent: the new rows start right after the last existing row.
  hid_t X_space = H5Dget_space(X_id);
  hsize_t X_dims[2], x_start[2], x_count[2];
  H5Sget_simple_extent_dims(X_space, X_dims, NULL);
  H5Sclose(X_space);

  x_start[0] = X_dims[0];
  x_start[1] = 0;
  x_count[0] = sample_size;
  x_count[1] = n_features;

  X_dims[0] += sample_size;
  H5Dset_extent(X_id, X_dims);

  // Re-read the dataspace so it reflects the new extent, then select the new rows.
  X_space = H5Dget_space(X_id);
  H5Sselect_hyperslab(X_space, H5S_SELECT_SET, x_start, NULL, x_count, NULL);
  hid_t X_mem_space = H5Screate_simple(2, x_count, NULL);
  H5Dwrite(X_id, H5T_NATIVE_DOUBLE, X_mem_space, X_space, H5P_DEFAULT, x.data());

  H5Sclose(X_space);
  H5Sclose(X_mem_space);
  H5Dclose(X_id);
  H5Fclose(file_id);
}

// Append sample_size rows `time` times, reusing one dataset handle.
void ExtendFile2() {
  MatrixD x;
  SampleGenerator generator;
  int sample_size = 1000, n_features = 1000;
  VectorD coef, y;

  string hdf5_file_address = "extend.h5";
  hid_t faplist_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_stdio(faplist_id);
  hid_t file_id = H5Fopen(hdf5_file_address.c_str(), H5F_ACC_RDWR, faplist_id);
  H5Pclose(faplist_id);

  // Open the dataset once, outside the loop.
  hid_t X_id = H5Dopen2(file_id, "/X", H5P_DEFAULT);
  hsize_t X_dims[2], x_start[2], x_count[2];

  int time = 13000;
  for (int i = 0; i < time; i++) {
    generator.MakeRegression(x, y, coef, sample_size, n_features);

    hid_t X_space = H5Dget_space(X_id);
    H5Sget_simple_extent_dims(X_space, X_dims, NULL);
    H5Sclose(X_space);

    x_start[0] = X_dims[0];
    x_start[1] = 0;
    x_count[0] = sample_size;
    x_count[1] = n_features;
    X_dims[0] += sample_size;
    H5Dset_extent(X_id, X_dims);

    X_space = H5Dget_space(X_id);
    H5Sselect_hyperslab(X_space, H5S_SELECT_SET, x_start, NULL, x_count, NULL);
    hid_t X_mem_space = H5Screate_simple(2, x_count, NULL);
    H5Dwrite(X_id, H5T_NATIVE_DOUBLE, X_mem_space, X_space, H5P_DEFAULT, x.data());

    // Close the per-iteration dataspaces; otherwise every iteration leaks handles.
    H5Sclose(X_mem_space);
    H5Sclose(X_space);
  }

  H5Dclose(X_id);
  H5Fclose(file_id);
}

// Parallel append: every rank adds sample_size rows. Requires an HDF5 build
// with parallel (MPI-IO) support, and MPI_Init() must already have been called.
void ExtendFile_mpi() {
  MatrixD x;
  SampleGenerator generator;
  int sample_size = 500, n_features = 1000;
  VectorD coef, y;
  generator.MakeRegression(x, y, coef, sample_size, n_features);

  int mpi_size, mpi_rank;
  MPI_Comm comm = MPI_COMM_WORLD;
  MPI_Comm_rank(comm, &mpi_rank);
  MPI_Comm_size(comm, &mpi_size);

  string file_address = "extend.h5";
  hid_t file_access = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(file_access, comm, MPI_INFO_NULL);
  hid_t file_id = H5Fopen(file_address.c_str(), H5F_ACC_RDWR, file_access);
  H5Pclose(file_access);

  hid_t X_id = H5Dopen2(file_id, "/X", H5P_DEFAULT);
  hid_t X_space = H5Dget_space(X_id);
  hsize_t X_dims[2], x_start[2], x_count[2];
  H5Sget_simple_extent_dims(X_space, X_dims, NULL);
  H5Sclose(X_space);

  // Rank r writes rows [old + r * sample_size, old + (r + 1) * sample_size).
  x_start[0] = X_dims[0] + static_cast<hsize_t>(mpi_rank) * sample_size;
  x_start[1] = 0;
  x_count[0] = sample_size;
  x_count[1] = n_features;

  // H5Dset_extent is collective: every rank calls it with the same new size.
  X_dims[0] += static_cast<hsize_t>(sample_size) * mpi_size;
  H5Dset_extent(X_id, X_dims);

  X_space = H5Dget_space(X_id);
  H5Sselect_hyperslab(X_space, H5S_SELECT_SET, x_start, NULL, x_count, NULL);
  hid_t mem_space = H5Screate_simple(2, x_count, NULL);

  // Collective write through MPI-IO.
  hid_t plist_id = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
  H5Dwrite(X_id, H5T_NATIVE_DOUBLE, mem_space, X_space, plist_id, x.data());

  H5Pclose(plist_id);
  H5Sclose(mem_space);
  H5Sclose(X_space);
  H5Dclose(X_id);
  H5Fclose(file_id);
}
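In `ExtendFile_mpi` every rank appends the same `sample_size` rows. Rank r therefore starts at old_rows + r * sample_size, and the new extent is old_rows + mpi_size * sample_size. If ranks appended different numbers of rows, each rank would instead need two values: the sum of the counts of the lower-numbered ranks, and the overall total. These could be gathered with `MPI_Allgather`, or computed with `MPI_Exscan` and `MPI_Allreduce`.

The plain-C sketch below does only the arithmetic on an already-gathered counts array. `append_slab` is a hypothetical name, and `uint64_t` stands in for `hsize_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. counts[r] is the number of rows rank r appends and must
   be identical on every rank (e.g. filled by MPI_Allgather). On return,
   *start is the first row this rank writes (x_start[0] in ExtendFile_mpi)
   and *new_rows is the extent every rank passes to H5Dset_extent. */
static void append_slab(uint64_t old_rows, const uint64_t *counts, int nranks,
                        int rank, uint64_t *start, uint64_t *new_rows) {
  assert(rank >= 0 && rank < nranks);
  uint64_t before = 0, total = 0;
  for (int r = 0; r < nranks; r++) {
    if (r < rank)
      before += counts[r]; /* rows appended by lower-numbered ranks */
    total += counts[r];
  }
  *start = old_rows + before;
  *new_rows = old_rows + total;
}
```

With equal counts this reduces to the formula used in `ExtendFile_mpi`.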

updated 2021-11-06