Speeding up loading OpenEXR files in Python with Rust and PyO3
We have lately had to read many large OpenEXR image files in Python at work. Unfortunately, the Python package OpenEXR loads these files quite slowly. A single one of our files can easily take more than a minute to open. Other software, such as the image viewer tev, loads the same files in a few seconds.
I therefore started looking into alternative ways to open these files. There are a few other Python packages out there, but they either failed to open the files, loaded just as slowly, or required manually building and linking a C++ library. And after doing said building and linking, I of course started hitting segfaults.
I then noticed that the exr crate in Rust loaded the same files really fast. A file that took over a minute to load with OpenEXR takes only a few seconds to load using this crate. The speed seems to be due to exr using multithreading to load the files and otherwise being well-written and performant.
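Multithreading helps here because an EXR file stores its pixels in independently compressed chunks (blocks of scanlines or tiles), so no chunk depends on another. The sketch below illustrates that idea with only Python's standard library, using zlib-compressed byte strings as a stand-in for EXR chunks. The function names are hypothetical, and this shows the general pattern rather than how exr is actually implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split raw bytes into fixed-size chunks and compress each chunk on its own."""
    return [
        zlib.compress(data[i : i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decode_chunks_parallel(chunks: list[bytes], workers: int = 4) -> bytes:
    """Decompress independent chunks on a thread pool and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the bytes stay in sequence.
        return b"".join(pool.map(zlib.decompress, chunks))


# Usage: 1 MiB of stand-in pixel data split into 64 KiB chunks.
raw = bytes(range(256)) * 4096
chunks = compress_chunks(raw, 64 * 1024)
restored = decode_chunks_parallel(chunks)
```

Because each chunk decodes without looking at its neighbours, the work spreads naturally over as many threads as there are cores.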
So I decided to look into how you can expose Rust code to Python these days. It turns out to be incredibly easy thanks to the PyO3 project, which provides such bindings. The same project also develops the numpy Rust crate, which makes it possible to pass NumPy arrays to and from Rust.
On top of it all, there is a great little tool called maturin that lets you easily develop, build and even publish your Rust-based Python packages to PyPI, the package index that pip installs packages from. It even supports cross-compilation from Linux to Windows, as long as you have added a target such as x86_64-pc-windows-msvc with rustup (rustup target add x86_64-pc-windows-msvc).
Publishing a cross-compiled package to PyPI is as easy as running something along the lines of:
maturin publish --target x86_64-pc-windows-msvc --interpreter python3.10
In just a few hours, I was able to wrap the functionality we needed from the exr crate and make a Python package based on it, which is now on PyPI.
Naming is, as always, the hardest part, but I landed on “pyroexr”, a play on Python, Rust and OpenEXR.
You can install it using pip:
python -m pip install pyroexr
You can then open an EXR file and list its channels with a script such as this:
import pyroexr
import matplotlib.pyplot as plt
image = pyroexr.load("Ocean.exr")
print("Channels", image.channels())
# Prints: Channels ['B', 'G', 'R']
To view one of the channels, pass it to Matplotlib's imshow:
plt.imshow(image.channel("B"))
plt.show()
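For a single channel, imshow applies a colormap and by default rescales it to the data's minimum and maximum. EXR data, however, is usually linear and can exceed 1.0, so showing the R, G and B channels together as a color image needs a conversion to display values first. Here is a minimal sketch of the standard sRGB transfer function in plain Python; linear_to_srgb is a hypothetical helper, and it assumes the file stores linear values with sRGB primaries and that simply clipping bright highlights is acceptable.

```python
def linear_to_srgb(value: float) -> float:
    """Encode a linear-light value with the sRGB transfer function, clipped to [0, 1]."""
    # HDR values above 1.0 (and any negative values) cannot be displayed directly.
    v = min(max(value, 0.0), 1.0)
    if v <= 0.0031308:
        # Linear segment near black.
        return 12.92 * v
    # Power-law segment for the rest of the range.
    return 1.055 * v ** (1 / 2.4) - 0.055


# Usage: convert one pixel's linear (R, G, B) values to display values.
pixel = [0.0, 0.18, 4.0]
encoded = [linear_to_srgb(c) for c in pixel]
```

With NumPy you would apply the same formula element-wise (for example with np.where) to an array built by stacking image.channel("R"), image.channel("G") and image.channel("B") along the last axis, and pass the result to imshow.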
Note that pyroexr is minimal and only supports the functionality we currently need ourselves. For instance, the package assumes that you want to load the entire file into memory and that the file contains only one layer.
I have no current plans to extend it further, but contributions are of course welcome.
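Since whole files are loaded, a quick memory estimate can be useful. Each array returned by channel holds 32-bit floats, so 4 bytes per sample even when the file itself stores 16-bit half floats, and the Rust side keeps its own copy of the samples on top of that. The hypothetical helper below just spells out the arithmetic for the NumPy arrays.

```python
def float32_channels_bytes(width: int, height: int, channels: int) -> int:
    """Rough size in bytes of one float32 array per channel."""
    bytes_per_sample = 4  # size of a 32-bit float
    return width * height * channels * bytes_per_sample


# Usage: a 3840 x 2160 RGB image.
size = float32_channels_bytes(3840, 2160, 3)
print(f"{size / 2**20:.1f} MiB")  # prints "94.9 MiB"
```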
Implementation
Below, I have listed the main steps I had to perform to expose the functionality we needed from Rust to Python. You can find the full source code for the package on GitHub.
First of all, we expose the module using the #[pymodule] attribute:
use pyo3::prelude::*;
// Defines the Python module: `import pyroexr` exposes the `load` function.
#[pymodule]
fn pyroexr(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(load, m)?)?;
    Ok(())
}
The load function registered above is annotated with #[pyfunction] and loads a file, returning an image with all of its layers, channels and attributes:
use pyo3::exceptions::PyRuntimeError;
#[pyfunction]
fn load(filename: &str) -> PyResult<ImageWrapper> {
    // Read all channels, layers and attributes at full resolution (no deep data).
    let image = exr::prelude::read::read()
        .no_deep_data()
        .largest_resolution_level()
        .all_channels()
        .all_layers()
        .all_attributes()
        .from_file(filename)
        // Surface any exr error to Python as a RuntimeError.
        .map_err(|err| {
            PyRuntimeError::new_err(format!(
                "Could not load file '{filename}' due to error: '{err}'"
            ))
        })?;
    Ok(ImageWrapper { image })
}
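Because load turns any exr error into a Python RuntimeError, a script that processes many files can catch that exception and skip unreadable files instead of aborting. A minimal sketch follows; load_many is a hypothetical helper, and the loader is passed in as a parameter (defaulting to pyroexr.load) so it can be exercised without real EXR files.

```python
from typing import Any, Callable, Iterable, Optional


def load_many(
    paths: Iterable[str],
    loader: Optional[Callable[[str], Any]] = None,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Load every path, returning (images by path, error messages by path)."""
    if loader is None:
        # Imported lazily so that a custom loader works without pyroexr installed.
        import pyroexr

        loader = pyroexr.load
    images: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for path in paths:
        try:
            images[path] = loader(path)
        except RuntimeError as err:
            # The Rust load function reports read failures as RuntimeError.
            errors[path] = str(err)
    return images, errors
```

In a real script you would call images, errors = load_many(paths) and report the collected errors at the end.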
The ImageWrapper is a simple class that wraps the underlying exr::Image:
#[pyclass]
struct ImageWrapper {
    // Every layer of the file, each holding its channels as flat sample arrays.
    image: Image<SmallVec<[Layer<AnyChannels<FlatSamples>>; 2]>>,
}
Thanks to the numpy crate, we can then expose a method that reads out the data from a specific channel as a two-dimensional NumPy array of 32-bit floats:
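use numpy::{PyArray, PyArray2};
use pyo3::exceptions::PyKeyError;
#[pymethods]
impl ImageWrapper {
    // [...]
    /// Returns the named channel of the first layer as a 2D float32 NumPy array.
    fn channel<'a>(&self, py: Python<'a>, name: &str) -> PyResult<&'a PyArray2<f32>> {
        // pyroexr assumes a single layer, so only the first one is searched.
        let layer = self
            .image
            .layer_data
            .first()
            .ok_or_else(|| PyRuntimeError::new_err("Image contains no layers"))?;
        // A missing channel raises KeyError, like a failed dict lookup in Python.
        let channel = layer
            .channel_data
            .list
            .iter()
            .find(|channel| channel.name.eq(name))
            .ok_or_else(|| PyKeyError::new_err(format!("Channel '{name}' not found in image")))?;
        // exr stores the size as (width, height); NumPy shapes are (rows, columns).
        let size = [layer.size.1, layer.size.0];
        // Convert every sample to f32 and reshape the flat array into 2D.
        PyArray::from_iter(py, channel.sample_data.values_as_f32()).reshape(size)
    }
}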
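The reshape relies on two conventions: exr reports the layer size as (width, height) while NumPy shapes are (rows, columns), and the flat samples are stored row by row, which matches NumPy's default C order. The plain-Python sketch below mimics that reshape on a list; reshape_rows is a hypothetical name, and the code only illustrates the indexing, not the numpy crate's implementation.

```python
def reshape_rows(flat: list[float], size: tuple[int, int]) -> list[list[float]]:
    """Turn a flat, row-major sample list into rows, given an exr-style (width, height)."""
    width, height = size
    if len(flat) != width * height:
        raise ValueError(f"expected {width * height} samples, got {len(flat)}")
    # Row y holds the samples from index y * width up to (y + 1) * width.
    return [flat[y * width : (y + 1) * width] for y in range(height)]


# Usage: an image 3 pixels wide and 2 pixels tall.
rows = reshape_rows([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], (3, 2))
print(rows)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```

Swapping the two size components would still produce an array with the right number of elements but a transposed-looking, scrambled image, which is why the Rust code builds the shape as [height, width].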
The code is then compiled with maturin develop, which builds the package and installs it into the current virtual environment, so it can be tested in Python as shown above.
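To check the speedup on your own files, you can time both packages on the same file. Below is a minimal sketch using the standard library's time.perf_counter; best_time is a hypothetical helper that reports the fastest of several runs.

```python
import time
from typing import Callable


def best_time(load: Callable[[str], object], path: str, repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls of load(path)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load(path)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

For example, best_time(pyroexr.load, "Ocean.exr") times pyroexr, and passing a function that wraps your existing OpenEXR-based reading code gives the number to compare against. Reporting the minimum follows the advice in the timeit documentation, since slower runs mostly reflect interference from other processes; keep in mind that after the first run the file is likely in the operating system's cache.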
I am really impressed with how straightforward all of this was and how quickly I could get our files to load faster in Python. Kudos to the PyO3 and exr developers!