An upcoming C++ library for HDF5 files

I recently started working on a new C++ library for reading and writing HDF5 files. I got the idea when I was working on a few files in Python and C++ at the same time. The Python library h5py is just way more comfortable than the HDF5 C++ API. But with the new features in C++11 and C++14, I figured it should be possible to make a C++ library that is just as easy to use as h5py. And I must admit that I believe I’m on the way to making it even easier.

One design goal I have set for this project is to make the library forgiving. This means that I assume you know what you are doing and try to make it possible, even though there might be some side effects.

One example of such a side effect is that of creating a dataset in a HDF5 file and later changing its size. Because of limitations in the HDF5 standard, the original dataset will still take up space in the file, although it’s no longer in use. I believe the HDF Group designed it this way because it is hard to free already allocated space in a file without truncating it first. So they optimized for performance rather than flexibility, which is a fair choice to make. However, I assume you want flexibility and that you’d rather see that extra space taken up than seeing your program crash for trying to reuse a dataset. So this code is perfectly fine within my library:

#include <armadillo>
#include <elegant/hdf5>

using namespace elegant::hdf5;
using namespace arma;

int main() {
    File myFile("myfile.h5");
    myFile["my_dataset"] = zeros(10, 15);
    myFile["my_dataset"] = zeros(20, 25);
    return 0;
}

This opens the myfile.h5 for reading and writing and sets my_dataset to a 10×15 matrix of zeros before resetting it to a 20×25 matrix. This will, however, leave space taken up by the 10×15 matrix used in the file, even though it’s no longer accessible. Oh, and did I mention that the Armadillo library is already supported?

On the contrary, if you try to do the same in h5py, you will get a RuntimeError:

from h5py import *
from pylab import *
my_file = File("myfile.h5")
my_file["my_dataset"] = zeros((10, 15))
my_file["my_dataset"] = zeros((20, 25))
# output:
...
RuntimeError: Unable to create link (Name already exists)

As you can see, the syntax is pretty much the same, but the C++ library will be slightly more forgiving than h5py. That, in addition to a range of other nice features, is what I hope will make this new HDF5 C++ library attractive.

An experimental version will hopefully be released soon.

Published by

Svenn-Arne Dragly

I'm a physicist and programmer, writing about the stuff I figure out as I go.

Leave a Reply