An upcoming C++ library for HDF5 files

I recently started working on a new C++ library for reading and writing HDF5 files. I got the idea when I was working on a few files in Python and C++ at the same time. The Python library h5py is just way more comfortable than the HDF5 C++ API. But with the new features in C++11 and C++14, I figured it should be possible to make a C++ library that is just as easy to use as h5py. And I must admit that I believe I’m on the way to making it even easier.

One design goal I have set for this project is to make the library forgiving. By that I mean that the library assumes you know what you are doing and tries to carry out your request, even when doing so has side effects.

One example of such a side effect is that of creating a dataset in an HDF5 file and later changing its size. Because of limitations in the HDF5 standard, the original dataset will still take up space in the file, although it's no longer in use. I believe the HDF Group designed it this way because it is hard to free already allocated space in a file without truncating it first. So they optimized for performance rather than flexibility, which is a fair choice to make. However, I assume you want flexibility and that you'd rather waste that extra space than see your program crash for trying to reuse a dataset name. So this code is perfectly fine within my library:

#include <armadillo>
#include <elegant/hdf5>

using namespace elegant::hdf5;
using namespace arma;

int main() {
    File myFile("myfile.h5");             // opens (or creates) the file for reading and writing
    myFile["my_dataset"] = zeros(10, 15); // creates the dataset
    myFile["my_dataset"] = zeros(20, 25); // replaces it with a matrix of a different size
    return 0;
}

This opens myfile.h5 for reading and writing and sets my_dataset to a 10×15 matrix of zeros, before replacing it with a 20×25 matrix. The space taken up by the 10×15 matrix will, however, remain used in the file, even though the data is no longer accessible. Oh, and did I mention that the Armadillo library is already supported?

In contrast, if you try to do the same in h5py, you get a RuntimeError:

from h5py import *
from pylab import *
my_file = File("myfile.h5")
my_file["my_dataset"] = zeros((10, 15))
my_file["my_dataset"] = zeros((20, 25))
# output:
# ...
# RuntimeError: Unable to create link (Name already exists)

As you can see, the syntax is pretty much the same, but the C++ library will be slightly more forgiving than h5py. That, in addition to a range of other nice features, is what I hope will make this new HDF5 C++ library attractive.
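To be fair, h5py can replace the dataset too if you unlink the old one first. A minimal sketch of that workaround, with explicit numpy imports instead of pylab:

from h5py import File
from numpy import zeros

my_file = File("myfile.h5")
my_file["my_dataset"] = zeros((10, 15))
del my_file["my_dataset"]  # unlink the old dataset before reusing the name
my_file["my_dataset"] = zeros((20, 25))

Note that, just as in the C++ example, the unlinked dataset still takes up space in the file; that is exactly the trade-off described above, and my library just saves you the explicit delete.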

An experimental version will hopefully be released soon.

Straight from the source: NEURON’s incredible backwards compatibility

NEURON is a neural simulator, but it's not just another neural network application. NEURON has been around much longer than your fancy deep learning software. It was first released in 1994, although some of its early code and concepts seem to stem from the 1970s and 80s. In fact, NEURON is not a machine learning library at all. It simulates real neurons. It uses real morphologies and ion channel densities to reproduce experiments. Some labs even use machine learning algorithms to fit the parameters of their NEURON models. NEURON is one of the workhorses of the computational branch of the Human Brain Project and the core engine in the Blue Brain Project.

But NEURON's age is starting to show in its source code. Because some things are a bit awkward when working with this simulator, I decided to peek at its sources, just to see what this old engine looks like on the inside. It turned out to be an interesting journey. Deep inside NEURON's plotting library, I found this:

#if VT125
case VT:
vtplot(mode, x, y);
break;
#endif

Now, I didn’t know what VT125 was when I first saw this, but a quick search on the web reminded me that I’m still a young software developer. I present to you, the VT100:

[Image: a VT100 terminal]
I couldn’t find a good picture of the VT125. I apologize if it looks way more modern than its older cousin.

Now that’s what I call backwards compatibility!

Some of you may want to lecture me about how modern terminals actually emulate VT100s and that this code might be in there to allow some fancy plotting inside xterm or its siblings. But I would argue that there are other parts of this code that give away its impressive attempt at supporting older platforms. Such as the rest of the above switch statement:

		switch (graphdev)
		{
		case SSUN:
#if SUNCORE
			hoc_sunplot(&text, mode, x, y);
			break;
#else
#if NRNOC_X11
			hoc_x11plot(mode,x,y);
			break;
#else
#if NeXTstep
			break;
		case NX:
			hoc_NeXTplot(mode,x,y);
			break;
#endif
#endif
#endif
#if TEK
		case ADM:
		case SEL:
		case TEK4014:
			tplot(mode, x, y);
			break;
#endif
#if VT125
		case VT:
			vtplot(mode, x, y);
			break;
#endif
		}
#endif

Now, we all love nested preprocessor if-statements inside switch blocks, but let's look past that and at what's actually supported here.

There’s the NRNOC_X11, which I believe introduces the only part of this block that might actually be executed nowadays. In addition we have SUNCORE, which might be Solaris, but I’d bet this supports something that existed long before Oracle acquired Sun back in 2010. There’s TEK, which may refer to something like the Tektronix 4010:

[Image: a Tektronix 4014 terminal]
This baby was made in 1972, but it still runs NEURON’s plotting functions. Just download the 5 MB zip-file of NEURON’s source code and … oh, wait.

And then there’s NeXTstep, which Apple acquired in 1997 and used to replace its own Mac OS with Mac OS X. Considering that NeXTstep made its last official release in 1995, I think its fair to say that this piece of code was written in the previous century. It could, of course, just be that no one has deared to search-replace NeXTstep with OS X, but I doubt it.

I should finish this with a final caveat: it could be that this code isn't in use by NEURON at all anymore. After all, I found all of the above in the plot.c file in the src/oc folder of NEURON's source code. It could be a remnant of NEURON's interpreter language, hoc, which was originally written as a demonstration of the parser generator Yacc in the book The Unix Programming Environment. As far as I know, NEURON is the only software out there still using hoc, but that's a story for another day.

Image Credits: Wikipedia User:ClickRick, Wikipedia User:Rees11.

New project structure for projects in Qt Creator with unit tests

Note: This is a new version of an earlier post, with a revised project structure.

Note 2: See this post for the same project structure using the even better Catch testing framework.

This post assumes that you are using a C++ testing framework such as UnitTest++. See this earlier post for some information on installing UnitTest++ on Ubuntu.

To get the most out of UnitTest++, it is a good idea to integrate its output into the Qt Creator IDE. The way I have set this up in Qt Creator is with subprojects: one for the main project, which is again split into the app itself and a library, and one for the tests. In addition, I have a helper project file named defaults.pri. The structure of the project looks like this:

MyProject
├─ MyProject.pro
├─ defaults.pri
├─ app/
│  ├─ app.pro
│  └─ main.cpp
├─ src/
│  ├─ src.pro
│  └─ myclass.cpp
└─ tests/
   ├─ tests.pro
   └─ main.cpp

An example project using this structure has been posted on GitHub by Filip Sund. (Thanks to Filip for doing this!)

The main project file, MyProject.pro will now be based on a subdirs template, and may look like this:

TEMPLATE = subdirs
CONFIG+=ordered
SUBDIRS = \
    src \
    app \
    tests
app.depends = src
tests.depends = src

The app.depends and tests.depends statements make sure that the src project is compiled before the application and the tests, because the src directory contains the library that will be used by both.

(Thanks to Will for noting that this works better for parallel builds with make -j 8 than my previous version only using CONFIG+=ordered. We should keep CONFIG+=ordered in there still, though, because depends doesn’t affect the order when using make install).

defaults.pri

Each of the other .pro files will include defaults.pri to have all the headers available. defaults.pri contains the following:

INCLUDEPATH += $$PWD/src
SRC_DIR = $$PWD

If the library, main program and tests use common libraries, it is very useful to have the defaults.pri define these dependencies too.
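For instance, if all three subprojects shared a third-party library, it could be declared once here rather than in each .pro file. A sketch, using Armadillo purely as a hypothetical example (it is not part of the project above):

# appended to defaults.pri: a hypothetical shared dependency
LIBS += -larmadillo
# INCLUDEPATH += /path/to/armadillo/include  # only if installed outside the default paths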

./src

In the src folder, I have myclass.cpp, which is the class that I want to use and test. The src.pro needs to compile to a library, so that it may be used both by app and tests, and could look something like this:

include(../defaults.pri)
CONFIG -= qt

TARGET = myapp
TEMPLATE = lib

SOURCES += myclass.cpp
HEADERS += myclass.h

What this class does is not so interesting; it could be anything. A simple example could be this header file:

#ifndef MYCLASS_H
#define MYCLASS_H

class MyClass {
public:
    double addition(double a, double b);
};

#endif // MYCLASS_H

With this accompanying source file:

#include "myclass.h"

double MyClass::addition(double a, double b) {
    return a * b;
}

./app

I only have a main.cpp file in app, because the app is basically just something that uses everything in the src folder. It will depend on the compiled shared library from src, and app.pro would look something like this:

include(../defaults.pri)

TEMPLATE = app
CONFIG += console
CONFIG -= app_bundle
CONFIG -= qt

SOURCES += main.cpp

LIBS += -L../src -lmyapp

The main.cpp file could be a simple program that uses MyClass:

#include <myclass.h>
#include <iostream>

using namespace std;

int main()
{
    MyClass adder;
    cout << adder.addition(10, 20) << endl;
    return 0;
}
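One caveat worth mentioning: because src is built as a shared library, the dynamic linker must be able to find libmyapp.so when you run the app outside Qt Creator. Assuming you run the binary from the app build directory (the relative path here is an assumption; adjust it to your build layout), something like this does the trick:

LD_LIBRARY_PATH=../src ./app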

./tests

In the tests folder I have simply added a main.cpp which will run the tests. Then tests.pro has the following contents:

include(../defaults.pri)
TEMPLATE = app

CONFIG   += console
CONFIG   -= app_bundle
CONFIG   -= qt

SOURCES += main.cpp

LIBS += -lunittest++ -L../src -lmyapp

This links in the myapp library in addition to UnitTest++.

The main.cpp file in tests could then contain the following, if we were to use UnitTest++ as our testing library:

#include <unittest++/UnitTest++.h>
#include <myclass.h>

TEST(MyMath) {
    MyClass my;
    CHECK(my.addition(3,4) == 7);
}

int main()
{
    return UnitTest::RunAllTests();
}

This test will fail because my implementation of MyClass::addition is completely wrong:

class MyClass {
public:
    double addition(double a, double b) {
        return a * b;
    }
};

Note that I'm including MyClass through <myclass.h>, which is possible because of the INCLUDEPATH variable in defaults.pri.

This is hopefully all you’ll need to define a project that compiles a library, as well as tests and an application using the library.

Monitoring your unit tests without lifting a finger

I love unit testing. First of all, I think it is a good idea to test separate units of the code, but after doing so for some time, I’ve come to realize that unit tests are great for managing the software development cycle too. It all boils down to the idea that you should write tests before you write your code.

Now, this is something that I and others apparently struggle a lot with. How do you write a test for code that doesn't even exist yet? Even worse, how do you write a test for a piece of software when you're not yet sure how it will be used?

In computational physics, this problem arises often because we are writing code at the same time as we are trying to understand the physics, mathematics and algorithms at hand. And this is a good thing. You might want to think that one should structure all code before it is written, but this is generally a bad approach in computational physics. Especially if you’re working on something new. The reason is that you will often understand the problem and algorithms better while developing, rather than just reading about them and trying to analyze them blindly.

Keeping the tests and code healthy

But enough with the talk; let's just assume that you are convinced that you should (or have to) implement some unit tests. At some point you are likely to find it tiresome to have to go into the folder where the tests are defined and run them manually. This is where Jenkins comes into play.


Speeding up compilation on Ubuntu with Qt Creator

Are you reading random stuff on the web while waiting for your C++ compilation to finish? Then you have come to the right place. In this post I will tell you about two really nice tweaks that speed up compilation, namely ccache and the make -j flag, and how to set these up in Qt Creator.

ccache

ccache is a clever tool that wraps your compiler (g++ or mpicxx) and detects whether the file you are compiling has been compiled before with exactly the same contents and settings. If it has, it just returns the cached result.

Unlike regular make, which relies on timestamps, ccache inspects the actual preprocessed contents and compiler options, so it is extremely good at detecting the true state of what you are compiling. I have never had any trouble with ccache.

This really speeds up recompilation after a make clean, especially if you are switching git branches. In other words, it is a much simpler way to get fast builds across git branches than creating a separate build folder for each branch.

To enable ccache, install it with

sudo apt-get install ccache

and add the following to your .pro file:

QMAKE_CXX = ccache g++

Replace g++ with mpicxx if you are using MPI.
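To verify that the cache is actually being hit, ccache can print its statistics:

ccache -s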

The make -j flag

I realized when compiling the Qt source that make has a -j flag that enables parallel compilation across all available processors on the machine. This also speeds up compilation significantly; I measured a 3.55x speed-up on a 4-core CPU. To enable this flag, go to the Projects view in Qt Creator and add the following argument to the make build step:

-j

This should look something like this afterwards:

[Image: Qt Creator project settings after adding the -j option to make.]

If you prefer not to use all available processors for compilation, you may add a number after -j to set the number of parallel jobs. For instance, make -j 3 compiles with 3 parallel jobs.
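If you want the job count to follow whatever machine you are on, the nproc command from GNU coreutils prints the number of available processing units, so you can pass that along:

make -j$(nproc)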

Fixing “undefined reference to `vtable for …”

These annoying errors have been haunting me for the last couple of days, so I figured I should share the most common reason for their occurrence, in my projects at least.

This error occurs because the linker is unable to find the definitions of the virtual functions you have declared in your headers. gcc emits a class's vtable in the translation unit that defines its first non-inline virtual member function; if that function is declared but never defined, the vtable is never emitted, and anything that uses the class fails to link. So if you have a header which looks like this:

#ifndef MESH_H
#define MESH_H

class Mesh
{
public:
    Mesh();
    ~Mesh();
    virtual void draw();
};

#endif // MESH_H
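In this class, the first (and only) non-inline virtual member function is draw(), so the vtable is emitted in whichever .cpp file defines it. A minimal sketch of the failing case, where draw() is accidentally left out:

#include "mesh.h"

Mesh::Mesh() {
}

Mesh::~Mesh() {
}

// Mesh::draw() is never defined. No translation unit defines the first
// non-inline virtual function, so gcc never emits the vtable, and the
// linker reports: undefined reference to `vtable for Mesh'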

You must at least have these functions defined in your .cpp file:

#include "mesh.h"

Mesh::Mesh() {
}

Mesh::~Mesh() {
}

void Mesh::draw() {
}

After this, clean your build environment to ensure that no stale object files confuse the linker. If you are using Qt or a project with a Makefile, you can just run these three commands (the first only applies to Qt projects):

qmake
make clean
make

Should you still have trouble, make sure that qmake is actually generating the moc files for any classes that use the Q_OBJECT macro; a missing moc file is a classic cause of this error in Qt projects. Sometimes it might even be necessary to empty the build directory completely yourself, to make sure there are no files left behind that are not cleaned by make clean.