NetCDF4 C++ API
Parallel I/O requires this software stack:
Build HDF5 with the following configure options:
Build netCDF with the following configure options:
Build the NetCDF4 C++ API library with the following macros defined:
Build your code with -DUSE_PARALLEL, add -DHAS_NETCDF_PAR_H if the header file "netcdf_par.h" is available, and link with the NetCDF4.parallel library (a sketch follows this list).
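The parallel entry points live in the netCDF C interface, which the C++ API sits on top of. The following is a minimal sketch of how the USE_PARALLEL and HAS_NETCDF_PAR_H macros typically gate the parallel code path; the file name, flags, and error handling are illustrative, and the project's own wrapper classes are bypassed here.

    // Minimal sketch: guard the parallel code path with the build macros and
    // fall back to serial I/O when they are not defined.
    #include <mpi.h>
    #include <netcdf.h>
    #ifdef HAS_NETCDF_PAR_H
    #include <netcdf_par.h>      // declares nc_create_par, NC_COLLECTIVE, ...
    #endif
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int ncid = -1;
        int status;

    #if defined(USE_PARALLEL) && defined(HAS_NETCDF_PAR_H)
        // Every rank opens the same file through MPI-IO.
        status = nc_create_par("example.nc", NC_NETCDF4 | NC_CLOBBER,
                               MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
    #else
        // Serial fallback; in a real application only rank 0 would do this.
        status = nc_create("example.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
    #endif
        if (status != NC_NOERR)
            std::fprintf(stderr, "netCDF error: %s\n", nc_strerror(status));
        else
            nc_close(ncid);

        MPI_Finalize();
        return 0;
    }

Compile and link with the MPI compiler wrappers (for example mpicxx) so that mpi.h and the MPI-enabled HDF5 and netCDF libraries are found.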
NetCDF version 4 uses the HDF5 library to handle the I/O and the underlying file structure: the netCDF file structure you define is implemented in terms of HDF5. For parallel I/O, HDF5 uses MPI-IO, which is defined in the MPI-2 standard; both Open MPI and MPICH2 provide this capability.
There are two modes of writing in parallel, independent and collective. These are described in the HDF5 documentation:
Independent IO means that each process can do IO independently. It should not depend on or be affected by other processes. Collective IO is a way of doing IO defined in the MPI-IO standard; contrary to independent IO, all processes must participate in doing IO.
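In the C interface underlying the C++ API, the mode is chosen per variable with nc_var_par_access. A brief sketch, assuming ncid and varid were obtained from nc_create_par and nc_def_var:

    // Sketch: select independent or collective access for one variable.
    #include <netcdf.h>
    #include <netcdf_par.h>   // nc_var_par_access, NC_COLLECTIVE, NC_INDEPENDENT

    int select_access_mode(int ncid, int varid, bool collective) {
        // NC_INDEPENDENT: each rank reads/writes on its own.
        // NC_COLLECTIVE:  every rank must take part in each access.
        return nc_var_par_access(ncid, varid,
                                 collective ? NC_COLLECTIVE : NC_INDEPENDENT);
    }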
When using parallel I/O, if a variable uses an UNLIMITED dimension, appending records must be done in collective mode. If the write is attempted in independent mode, the operation fails with a message such as "HDF Error" or "Unknown error".
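For example, each rank can write its own slice of one new record along the unlimited dimension, provided the variable has been switched to collective access first. A hedged sketch, assuming a 2-D record variable (the names, layout, and offsets are illustrative):

    // Sketch: append one record to a 2-D variable whose first dimension is
    // UNLIMITED. Growing the unlimited dimension requires collective access.
    #include <netcdf.h>
    #include <netcdf_par.h>

    int append_record(int ncid, int varid, size_t record,
                      size_t rank_offset, size_t local_count,
                      const double* local_data) {
        // Switch to collective mode before extending the record dimension;
        // in independent mode this write fails with "HDF Error"/"Unknown error".
        nc_var_par_access(ncid, varid, NC_COLLECTIVE);

        // One new record, with each rank writing its own slice of it.
        size_t start[2] = { record, rank_offset };
        size_t count[2] = { 1,      local_count };
        return nc_put_vara_double(ncid, varid, start, count, local_data);
    }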
Another drawback of collective I/O is that the elapsed time tends to increase, since all processes have to synchronize their write operations.
The solutions for this issue include:
Compression should not be used for variables that will be written in parallel; this is an HDF5 restriction, which the HDF5 documentation explains:
Compression uses chunking. Since chunks are preallocated in the file before writing, chunks have to be of the same size. However, the size of the compressed chunk is not known in advance.
Compression and chunking are discussed further at http://www.unidata.ucar.edu/software/netcdf/workshops/2010/nc4chunking/index.html and http://www.hdfgroup.org/HDF5/Tutor/compress.html
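For reference, this is how chunking and deflate compression are typically requested through the C interface; given the restriction above, it applies only to files written serially (the chunk sizes and compression level are illustrative):

    // Sketch: deflate compression is layered on top of chunked storage.
    // Call in define mode (before nc_enddef); serial files only, given the
    // parallel-write restriction described above.
    #include <netcdf.h>

    int enable_deflate(int ncid, int varid) {
        // Chunk sizes are illustrative; tune them to the access pattern.
        size_t chunks[2] = { 1, 1024 };
        int status = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
        if (status != NC_NOERR) return status;

        // shuffle filter on, deflate on, compression level 4 (levels run 0-9).
        return nc_def_var_deflate(ncid, varid, /*shuffle=*/1,
                                  /*deflate=*/1, /*deflate_level=*/4);
    }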
The HDF5 documentation has a list of potential issues with parallel I/O performance: