Writing C extensions for Python with CFFI
Posted in programming
CFFI is a library that makes it easy to integrate C extensions into Python packages; in this post I'll outline the main steps needed.
I like to keep my code neat. That is why I use Python instead of R whenever I can, and why I first wrap the fundamental logic of whatever I'm doing in a properly built package, and only then use that package to prototype my ideas. Even though prototyping usually involves messy scripting, this workflow lets me keep the main logic clean, tested and documented.
However, over the course of my PhD I have often found myself needing to do some numerical heavy lifting, and have often found that Python just won't cut it, at least not at the level I would like. To be fair, I can't complain about NumPy and the like; most of the time the problem is the for loops that my code inevitably has to have, and pure-Python loops are really slow. Parallelizing the number-crunching bits is a possibility, but the speedup is bounded by the number of cores and, in practice, is rarely perfectly linear. Probably the best idea is simply to make those bits run in C or some other low-level language, if they are not too difficult to code: for tight numerical loops, a compiled, statically typed language like C is hard to beat, and the speedup factor can be in the hundreds. That is why I find it so convenient to add C extensions to my packages.
There are a few well-known options for doing this, but admittedly the first time I tried I spent an embarrassingly long time figuring out which one was best. Using ctypes is easy, but it seems to require an already compiled C binary, so it doesn't allow much portability. Cython seems like a thorough alternative, but the learning curve looks quite steep, and in my case it was definitely not worth it, because I needed only a relatively shallow integration with C in which I just delegate the number-crunching logic.
A third alternative, and the one that worked best for me, is CFFI. It has barely any learning curve (at least for the basic functionality I needed), as it requires minimal setup while still allowing complete portability. I'll try to explain briefly the 3 steps I needed to make it work; for this, assume we have some C source code my_file.c and that we have installed the cffi package.
The FFI builder file
This file tells Python which C files to compile when building your package, in which order, and which of their functions should be visible from Python. The structure is something like the following:
from cffi import FFI

ffibuilder = FFI()

ffibuilder.cdef("""
    double my_c_func(double a, double b);
""")

ffibuilder.set_source(
    "my_c_ext",  # name of the output C extension
    """
    #include "path/to/my_file.h"
    """,
    sources=["path/to/my_file.c", "path/to/my_file_dependency.c"],
    libraries=["m"],  # on Unix, link with the math library
)

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)
The functions declared in the ffibuilder.cdef call are the ones that will be visible to Python code; these declarations should be a subset of the .h file. The arguments of the set_source function are:
- The name you want to give to the C extension in Python (more on this later)
- The .h file of your C code relative to the FFI builder file
- The path to the C source file, followed by the source of its dependencies, if any, relative to the project's root
- Other libraries that should be used at compilation time, like the math library in this example
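To try the builder without separate .c and .h files, the C code can also be passed inline to set_source. The following is a minimal sketch under that assumption: the module name my_c_ext_demo, the body of my_c_func and the helper names make_builder and build_and_import are made up for illustration and are not part of the setup described in this post.

```python
import importlib
import sys

from cffi import FFI


def make_builder():
    """Build an FFI object for a tiny demo extension with inline C code."""
    ffibuilder = FFI()
    # Declarations that Python is allowed to see.
    ffibuilder.cdef("double my_c_func(double a, double b);")
    # Hypothetical body for my_c_func, inlined instead of living in my_file.c,
    # so no `sources` or `libraries` arguments are needed here.
    ffibuilder.set_source(
        "my_c_ext_demo",
        "double my_c_func(double a, double b) { return a + b; }",
    )
    return ffibuilder


def build_and_import(build_dir):
    """Compile the demo into build_dir and import it (needs a C compiler)."""
    make_builder().compile(tmpdir=build_dir, verbose=False)
    sys.path.insert(0, build_dir)
    return importlib.import_module("my_c_ext_demo")
```

On a machine with a working C compiler, calling build_and_import(tempfile.mkdtemp()) should return a module whose lib.my_c_func(2.0, 3.0) gives 5.0.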
The setup.py file
The only argument you need to add to your setup.py file is cffi_modules, a list with one "path/to/cffibuilder.py:ffibuilder" entry per builder, where cffibuilder.py is the file described above and ffibuilder is the FFI object defined in it. When you build your library, Python will take care of compiling the C files you specified in the builder file, dependencies and all, and will make the resulting C extension visible to your Python code under the name you gave it.
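As a rough sketch of what such a setup.py could look like: the package name, version and the cffi version pins below are placeholders, and the cffi_modules entry assumes the builder lives at path/to/cffibuilder.py and exposes an object called ffibuilder. The extra check on sys.argv is only there so the file can be read or imported without triggering a build.

```python
import sys

# Hypothetical package metadata; only cffi_modules and the cffi
# requirements are specific to CFFI.
SETUP_KWARGS = dict(
    name="my_package",
    version="0.1.0",
    packages=["my_package"],
    setup_requires=["cffi>=1.0.0"],    # cffi must be present at build time
    install_requires=["cffi>=1.0.0"],  # and at run time
    # "<path to the builder file>:<name of the FFI object inside it>"
    cffi_modules=["path/to/cffibuilder.py:ffibuilder"],
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only run when a command such as `build` or `install` is given.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```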
Using your C program from Python
To use the C routines, import your extension as
from my_c_ext import ffi, lib
where my_c_ext is the name you passed as the first argument to ffibuilder.set_source in the FFI builder file. The C functions are then available as attributes of the lib object you just imported, for example lib.my_c_func(a, b). You have to make sure the arguments can be converted to C types; this is especially important if they are NumPy arrays. If, say, x is an array of doubles, then you would have to pass it as a double * pointer to its data, not as the array object itself.
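One common way to get a C pointer to the data of a NumPy array is to cast its memory address with ffi.cast. The sketch below is an illustration rather than the exact snippet used in this post: it uses a plain FFI() object as a stand-in for the ffi imported from my_c_ext, and the helper as_double_ptr and the C function my_sum mentioned in the comments are hypothetical.

```python
import numpy as np
from cffi import FFI

# Stand-in for the `ffi` object imported from my_c_ext; a plain FFI() is
# enough to build pointers, so this sketch runs without compiling anything.
ffi = FFI()


def as_double_ptr(arr):
    """Return a `double *` that points at the data of a float64 NumPy array."""
    if arr.dtype != np.float64 or not arr.flags["C_CONTIGUOUS"]:
        raise TypeError("expected a C-contiguous array of float64 (C double)")
    # The pointer shares memory with `arr` and does not keep it alive,
    # so `arr` must outlive every use of the pointer.
    return ffi.cast("double *", arr.ctypes.data)


x = np.array([1.0, 2.0, 3.0])
x_ptr = as_double_ptr(x)

# With a hypothetical C function `double my_sum(double *v, int n);` declared
# in cdef, the call would be: lib.my_sum(x_ptr, len(x))
x_ptr[0] = 10.0  # writes through to x, since no copy was made
```

Checking the dtype and memory layout up front avoids silently handing C a pointer to integers or to a strided view, which would be read as garbage on the C side.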
That's it. Following those 3 simple steps allowed me to run, in a matter of hours, computations that would probably have taken more than a week in pure Python, which increased my productivity quite a bit. Hopefully you'll find it useful too.