Choosing the right license for your code

I was pointed to John Hunter’s “Why we should be using BSD”, which made me think about how I tend to advocate the GPL instead. Richard Stallman makes some very convincing arguments for why the GPL is the better choice, but even though I like his reasoning, I do see why a different license may be the right choice.

In this post, I will explain some of the differences between BSD and GPL and hopefully help you choose for yourself. The target audience for this post is people in the scientific community, but it may also be useful to others.

So how do BSD and GPL differ? Below I have listed what I think is most important, but please remember that I am not a lawyer, and that there are other effects of licensing that may apply to you:

  • BSD is a free-for-all license that only requires attribution. Anyone using your code must give credit to you as the original author. Apart from that, they can do pretty much everything. They can change the code and sell programs based on it without asking you.
  • GPL too requires attribution. However, it also requires anyone distributing software based on your source code to make available any changes they make to it. They can still sell programs based on your code to others without asking, but if they were to do something really smart with your code, you can demand that they share the changes with you. And you too can sell software based on their changes.

Before moving on, I would like to clarify one important thing: Even though you are licensing your code as BSD or GPL, you can still sell software based on your code. You can even sell the rights to use your code with a different license. The licenses only apply to what other people can do with your code. You still keep all other rights.

In the table below, I have summarized the differences between the licenses by listing a few scenarios. “They” refers to someone using your code in their own software projects:

                                                                                   BSD   GPL
You can sell software based on your own code.                                       ✔     ✔
They can sell software based on your code.                                          ✔     ✔
They can make changes to your code.                                                 ✔     ✔
They have to give you credit for writing the original code.                         ✔     ✔
They have to use the same license as you.                                           ✘     ✔
You can demand access to the changes they made to your code.                        ✘     ✔
They can sell software based on your code without sharing their changes with you.   ✔     ✘
You can sell software based on the changes they made to your code.                  ✘*    ✔

*Except if they also use a BSD license for their changes.

Some people call GPL “viral” because it “infects” any software using GPL licensed code. Any project using code with a GPL license will basically have to be GPL licensed too. This may be a showstopper for some companies that want to keep their own changes proprietary (secret/hidden) while still using your code. Note that this only applies if they distribute the software to others: If someone downloads your code and plays around with it on their own computer, they don’t have to care about the GPL license.

The viral effect is why Richard Stallman advocates the GPL. He thinks that it benefits the open source community (and perhaps also the scientific community) that software using GPL code must be GPL too. In his opinion, companies have money to spend on developing new products or on buying access to other people’s source code. The open source community, on the other hand, does not necessarily have the same monetary resources, but has access to all the open source code out there. If an open source project is fine with adhering to the GPL, it can also use all other GPL-licensed code.

Further, I think you should choose GPL for any code you think might be commercially viable for yourself in the future. Some people think the best option is to keep source code proprietary if you want to make money from it in the future. I think the exact opposite. If you are working on a project that you believe has some commercial value, why not let others find that commercial value for you? If someone starts selling software based on an improved version of your code, you can demand that they share the changes with you and start selling the software yourself.

However, there are many good arguments for using the BSD license too. Many open source projects work with companies that require code not to be GPL licensed. One such project is Matplotlib, of which John Hunter is the original author. It has received contributions from companies like Enthought, which want to make sure they don’t have to license all their software as GPL because some of the code they use comes from GPL projects. Such contributions are often significant for open source projects, and I can see why choosing BSD is a good middle way for them: it keeps the project open while still getting as much benefit as possible from cooperation with companies.

Some also argue that BSD attracts more developers than GPL because some developers may be put off by a GPL license. However, I believe this works both ways: some developers are also put off by non-GPL projects. I don’t think your licensing choice will change the number of developers attracted to your project by much. If anything, I would do some research, contact relevant companies and developers, and ask whether they would be willing to contribute to your project. You can always change the license as you go, but you should know that all code that has already been released as GPL or BSD will forever be available under that license. This cannot be retracted. Any future changes you make to your own code, however, can be put under a different license. You may even choose not to license your new code at all.

To conclude, when choosing between BSD and GPL, you should consider whether your main goal is to contribute to the open source community or to everyone in general. If you target only the open source community, GPL may be your best choice. If you want to contribute to everyone without further restrictions on the use of your code, BSD may be your best bet.

Personally, I will continue using GPL as long as I think my code may turn into a commercially viable project or if I just don’t have a long-term plan yet. However, I will happily use BSD if I think the code can be contributed to a BSD-licensed project such as Matplotlib.

There are also many other licenses out there. Have a look at the list made by the Open Source Initiative if you want to know more about other licenses.

Writing molecular dynamics data to binary LAMMPS format

In this post I will explain how to write to the binary LAMMPS file format from C++, using data stored in Armadillo vectors and matrices. After running the example in this post you should be able to open the resulting file in Ovito or any other program capable of reading binary LAMMPS files. The example should also be fairly easy to port to other data structure types, if needed.

For the impatient: You’ll find a working main.cpp file and a qmake project file on GitHub.

The result, if rendered in Ovito, is two silicon atoms (in red) and one oxygen atom:

[Figure: lammps-in-ovito]

About the LAMMPS format

LAMMPS is an extremely versatile molecular dynamics simulation package with plenty of interaction potentials and features implemented. However, you may have written some other code involving atoms and found yourself looking for a standard file format to write atom data to. In this case, the XYZ format has likely crossed your mind, but because it is an ASCII-based text format, it is slow to read and write. It also lacks standardized headers for information such as the system boundaries, which forces visualizers like Ovito to guess the right boundaries for your system.

Continue reading Writing molecular dynamics data to binary LAMMPS format

Monitoring your unit tests without lifting a finger

I love unit testing. First of all, I think it is a good idea to test separate units of the code, but after doing so for some time, I’ve come to realize that unit tests are great for managing the software development cycle too. It all boils down to the idea that you should write tests before you write your code.

Now, this is something that I and others apparently struggle a lot with. How do you write a test for some code that doesn’t even exist yet? Even worse, how do you write a test for a piece of software when you’re not yet sure how it will be used?

In computational physics, this problem arises often because we are writing code at the same time as we are trying to understand the physics, mathematics and algorithms at hand. And this is a good thing. You might think that one should structure all code before it is written, but this is generally a bad approach in computational physics, especially if you’re working on something new. The reason is that you will often understand the problem and algorithms better while developing than by just reading about them and trying to analyze them blindly.

Keeping the tests and code healthy

But enough talk, let’s just assume that you are convinced that you should (or have to) implement some unit tests. At some point you are likely to find it tiresome to go into the folder where the tests are defined and run them manually. This is where Jenkins comes into play.

Continue reading Monitoring your unit tests without lifting a finger

Speeding up compilation on Ubuntu with Qt Creator

Are you reading random stuff on the web while waiting for your C++ compilation to finish? Then you have come to the right place. In this post I will tell you about two really nice tweaks you may do to speed up your compilations, namely ccache and the make -j flag, and how you may set these up in Qt Creator.

ccache

ccache is a clever tool that wraps your compiler (g++ or mpicxx) and detects whether the file you are compiling has been compiled before with exactly the same contents and settings. If it has, it just returns the cached result.

Unlike regular make, ccache is extremely good at detecting the true state of what you are compiling. I have never had any trouble with ccache.

This really speeds up the compilation when you are using make clean, especially if you are switching git branches. In other words, it is a much simpler solution to achieve fast compilation with git branches than to create separate build folders for each branch.
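To illustrate the idea of caching on "same contents and settings", here is a minimal Python sketch. It is a toy model and not how ccache is actually implemented: the class ToyCompilerCache and the stand-in compiler fake_compile are hypothetical names made up for this example, and the cache lives in memory rather than on disk.

```python
import hashlib


class ToyCompilerCache:
    """Toy model of a compiler cache; not how ccache is actually implemented."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # the "real" compiler being wrapped
        self._store = {}               # cache key -> compiled result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(source, flags):
        # Hash both the settings and the file contents, so that a change
        # in either one produces a different key.
        digest = hashlib.sha256()
        for part in (*flags, source):
            data = part.encode("utf-8")
            digest.update(len(data).to_bytes(8, "little"))  # length prefix
            digest.update(data)
        return digest.hexdigest()

    def compile(self, source, flags):
        key = self._key(source, flags)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._compile_fn(source, flags)
        self._store[key] = result
        return result


def fake_compile(source, flags):
    # Hypothetical stand-in for an expensive call to g++.
    return "object code for {!r} built with {}".format(source, " ".join(flags))


if __name__ == "__main__":
    cache = ToyCompilerCache(fake_compile)
    cache.compile("int main() { return 0; }", ["-O2"])  # miss: first build
    cache.compile("int main() { return 0; }", ["-O2"])  # hit: same contents and flags
    cache.compile("int main() { return 0; }", ["-O3"])  # miss: the flags changed
    print(cache.hits, cache.misses)  # prints "1 2"
```

Because the key depends only on contents and settings, a rebuild after make clean or after switching back to an earlier git branch would hit the cache in this model, which is the behaviour described above.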

To enable ccache, install it with

sudo apt-get install ccache

and add the following to your .pro file:

QMAKE_CXX = ccache g++

Replace g++ with mpicxx if you are using MPI.

The make -j flag

While compiling the Qt source, I realized that make has a -j flag that runs several compilation jobs in parallel, so that all available processors on the machine are put to work. This also speeds up compilation significantly: I measured a 3.55x speedup on a 4-core CPU. To enable this flag, go to the Projects view in Qt Creator and add the following argument to the make build step:

-j

This should look something like this afterwards:

[Screenshot: the project settings after adding the -j option to make.]

A bare -j puts no limit on the number of simultaneous jobs. If you prefer to limit how many processors are used for compilation, add a number after -j. For instance, make -j 3 runs at most 3 jobs at a time.
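As a small worked example of the numbers involved, the Python sketch below computes how close the 3.55x gain on 4 cores is to ideal scaling, and builds a -j argument that leaves some cores free. Both helper functions are hypothetical names introduced here for illustration; only os.cpu_count() comes from the standard library.

```python
import os


def parallel_efficiency(speedup, cores):
    """Fraction of the ideal (linear) speedup that was actually achieved."""
    return speedup / cores


def make_jobs_argument(cores=None, reserve=0):
    """Build a '-j N' argument that leaves `reserve` cores free (hypothetical helper)."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return "-j {}".format(max(1, cores - reserve))


if __name__ == "__main__":
    # 3.55x on 4 cores is roughly 89% of the ideal 4x.
    print(parallel_efficiency(3.55, 4))
    # On a 4-core machine, keeping one core free gives "-j 3".
    print(make_jobs_argument(cores=4, reserve=1))
```

Leaving one core free is only one possible choice, for instance to keep the desktop responsive while a large build is running.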

Setting up UnitTest++ with Qt Creator in a nice project structure

Note: I’ve found a better way to visually verify that all tests are running. Check out this post on Jenkins to see how I’m now working with my tests. The below post is still useful as a reference on how to set up UnitTest++ in Qt Creator, also when using Jenkins.

Note 2: A new and, in my opinion, better project structure is shown in this new post.

Note 3: See this post for the same project structure using the even better Catch testing framework.

When you want to make sure that your code is working properly, it is a good idea to divide it into smaller, independent pieces that may be tested individually. A unit test is a short piece of code that tests the smallest possible portion of your application. It is a good idea to write tests as you go, and it can even be useful to write a test before you implement the function that will be tested.
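To show what "a short piece of code that tests the smallest possible portion" can look like, here is a minimal sketch in Python using plain assert statements. It does not use UnitTest++, which is the framework this post is about, and the function kinetic_energy is a hypothetical example chosen only for illustration. In a test-first workflow, the three test functions would be written before the function body.

```python
def kinetic_energy(masses, velocities):
    """Total kinetic energy: the sum of 0.5 * m * |v|^2 over all atoms."""
    total = 0.0
    for m, (vx, vy, vz) in zip(masses, velocities):
        total += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    return total


def test_atom_at_rest_has_no_kinetic_energy():
    assert kinetic_energy([1.0], [(0.0, 0.0, 0.0)]) == 0.0


def test_single_moving_atom():
    # 0.5 * 2.0 * (3^2 + 4^2 + 0^2) = 25.0
    assert kinetic_energy([2.0], [(3.0, 4.0, 0.0)]) == 25.0


def test_energies_of_several_atoms_add_up():
    masses = [1.0, 2.0]
    velocities = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    # 0.5 * 1.0 * 1.0 + 0.5 * 2.0 * 1.0 = 1.5
    assert kinetic_energy(masses, velocities) == 1.5


if __name__ == "__main__":
    for test in (
        test_atom_at_rest_has_no_kinetic_energy,
        test_single_moving_atom,
        test_energies_of_several_atoms_add_up,
    ):
        test()
    print("all tests passed")
```

Each test checks one small, independent property, so a failure points directly at what broke.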

With the combination of UnitTest++ and Qt Creator, I’m now able to write unit tests while working on the code and also have some nice visual indication about failed tests during the build step of my project:

[Figure: qt-creator-unit-tests1]

During my search for a good setup for testing my applications I realized there were a few needs that I wanted to satisfy:

  1. Creating new unit tests should be dead-easy to do. I want to spend as little time as possible on reading documentation about the testing framework.
  2. Tests should be run immediately after a new build of the source code, automatically and without the extra hassle of remembering to run them.
  3. Tests should provide visual feedback with an easy way to get to where the tests fail. This should show up in the IDE or some other useful GUI tool.
  4. The testing framework should be fairly easy to install, especially on Ubuntu. This is because I want to be able to promote it to my fellow students.

Continue reading Setting up UnitTest++ with Qt Creator in a nice project structure

Working with percolation clusters in Python

We’re working on a new project in FYS4460 about percolation. In the introduction of this project, we are given a few commands to help us demonstrate a few properties of percolation clusters using MATLAB.

Being the Python fan that I am, I of course had to see if I could find equivalent commands in Python, and thankfully that was quite easy. Below I will summarize the commands that generate a random matrix of filled and unfilled areas, label each cluster in this matrix and calculate the area of each cluster. Finally, we’ll draw a bounding box around the largest cluster.
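The steps listed above can be sketched as follows. This is not the set of commands from the full post: it is a dependency-free illustration that uses only the Python standard library and a plain flood fill, and it assumes that sites are connected through their four nearest neighbours. All function names are made up for this example, and the bounding box is returned as coordinates rather than drawn.

```python
import random
from collections import deque


def random_grid(rows, cols, p, seed=None):
    """Return a rows x cols grid where each site is filled (1) with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(cols)] for _ in range(rows)]


def label_clusters(grid):
    """Label 4-connected clusters of filled sites 1, 2, ...; empty sites stay 0."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one cluster
                    i, j = queue.popleft()
                    for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                        inside = 0 <= ni < rows and 0 <= nj < cols
                        if inside and grid[ni][nj] and not labels[ni][nj]:
                            labels[ni][nj] = count
                            queue.append((ni, nj))
    return labels, count


def cluster_areas(labels, count):
    """Return {label: number of sites} for every cluster."""
    areas = {k: 0 for k in range(1, count + 1)}
    for row in labels:
        for k in row:
            if k:
                areas[k] += 1
    return areas


def bounding_box(labels, label):
    """Return (min_row, min_col, max_row, max_col) of the given cluster."""
    sites = [(r, c) for r, row in enumerate(labels) for c, k in enumerate(row) if k == label]
    rs = [r for r, _ in sites]
    cs = [c for _, c in sites]
    return min(rs), min(cs), max(rs), max(cs)


if __name__ == "__main__":
    grid = random_grid(10, 10, p=0.6, seed=1)
    labels, count = label_clusters(grid)
    areas = cluster_areas(labels, count)
    largest = max(areas, key=areas.get)
    print(count, "clusters; the largest has area", areas[largest])
    print("bounding box:", bounding_box(labels, largest))
```

For larger matrices, array-based library routines would be a more natural fit than nested lists; the pure-Python version is used here only to keep the example self-contained.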

Continue reading Working with percolation clusters in Python

Optimizing your C++ code for molecular dynamics

While working with the molecular dynamics project in FYS4460 I decided to learn more about how to optimize my C++ code for performance. As always, I follow Donald Knuth’s famous quote as a guideline to optimization:

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”[2]

And this has proved to be as true as ever in my efforts to optimize my code. There are a bunch of things I have tried that didn’t turn out to be as effective as I had thought, and some others that I would never have thought could be so important. I’ve listed most of these in this post so you too may learn from my experience. They are listed in order from most useful to most wasteful:

Continue reading Optimizing your C++ code for molecular dynamics