Python is a wonderful “very high level” language with an elegant design. It is a superb tool for developing applications rapidly. However, when it comes to performance (speed and memory), Python sucks. It is not meant for performance. So what do you do after building a quick prototype in Python if you want it to be lean and mean?
From the beginning, Python’s designers understood this use-case and exposed a “C” API. Using the Python C-API, one can write “modules” in C which can then be imported into Python. This solution is not bad and, provided you have enough patience, works great. There is a tool called SWIG which can generate the “glue” code around C code. It automates writing code against the C-API and makes the C “module” easier to maintain. However, since SWIG generates code, debugging through the wrapper code is quite painful when a problem occurs. For the lazy developers out there (like me :) ), this can be quite a deterrent.
I’ve been working on a project that uses Wikipedia data to assign “topics” to arbitrary pieces of English text. The code written in pure Python takes about 3 minutes to run. When I profiled this code, I found that most of the time was being spent in disk reads (through the db). After examining the data I had, I decided to load all of it into memory. I tried doing this in pure Python and watched the memory usage creep past my machine’s 4GB capacity, at which point I saved the poor thing from thrashing by killing the hog!
On doing some tests, I found that Python consumes about 32 bytes to store an integer! Hmmm… time to move the data structures to C for more efficient memory usage. I started looking around for tools to interface C code with Python, came across “ctypes”, and immediately fell in love with it.
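To get a feel for the gap, here is a minimal sketch that compares the size of one Python integer object with the size of one element in a ctypes array. The constant `N` and the variable names are my own for illustration, and the exact figures depend on your Python version and platform, so treat the numbers it prints as indicative only.

```python
import sys
from ctypes import c_int, sizeof

N = 1000  # illustrative array length, not from the project described above

# Each Python int is a full object; sys.getsizeof reports its size in bytes.
bytes_per_python_int = sys.getsizeof(12345)

# A ctypes array stores raw C ints back to back, with no per-element object.
c_array = (c_int * N)()
bytes_per_c_int = sizeof(c_array) // N

print("Python int object: %d bytes" % bytes_per_python_int)
print("C int in a ctypes array: %d bytes" % bytes_per_c_int)
```

Note that a Python list of integers also pays for one pointer per element on top of the objects themselves, which this sketch does not count.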
ctypes lets you load any “shared object” or “dynamic link library” in a Python program and unobtrusively call functions. You can construct C datatypes required by C functions (pointers, longs, ints, chars, arrays, structs) and even specify callback functions to be passed to C code! I will give you a basic introduction to ctypes to get you started with Python - C interfacing.
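Before loading any library, it may help to see those datatypes on their own. The following is a small sketch that needs no shared object at all; the names `Point`, `CompareFunc` and `py_compare` are made up for this example.

```python
import ctypes

# A C struct: struct Point { int x; int y; };
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

p = Point(3, 4)

# A pointer to the struct; writing through it changes the original.
ptr = ctypes.pointer(p)
ptr.contents.x = 10
print(p.x)  # 10

# A fixed-size C array type (int[3]) and an instance of it.
IntArray3 = ctypes.c_int * 3
arr = IntArray3(1, 2, 3)
print(list(arr))  # [1, 2, 3]

# A callback type: int (*)(int, int). Wrapping a Python function in it
# yields an object that C code could call as a function pointer.
CompareFunc = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

def py_compare(a, b):
    # Return -1, 0 or 1, in the style of a C comparison function.
    return (a > b) - (a < b)

c_compare = CompareFunc(py_compare)
print(c_compare(2, 5))  # -1
```

When you pass a callback like `c_compare` to real C code, keep a reference to it for as long as the C side may call it, otherwise it can be garbage collected.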
Code
test.c
#include <stdio.h>

// you can initialize stuff in this function
// it is called when the shared object is loaded
// (_init/_fini are obsolete hooks; they work here because we link with ld directly)
void _init()
{
    printf("Initialization of shared object\n");
}

// you can do final clean-up in this function
// it is called when the shared object is getting unloaded
void _fini()
{
    printf("Clean-up of shared object\n");
}

// return the sum of two ints
int add(int a, int b)
{
    return a + b;
}

// return the sum of the first n_values elements of values
int sum_values(int *values, int n_values)
{
    int i;
    int sum = 0;
    for (i = 0; i < n_values; i++) {
        sum += values[i];
    }
    return sum;
}
We have to compile test.c to create the shared object.
gcc -fPIC -c test.c
ld -shared -soname libtest.so.1 -o libtest.so.1.0 -lc test.o
test.py
from ctypes import cdll, c_int, byref
# load the shared object
libtest = cdll.LoadLibrary('./libtest.so.1.0')
# call the function, yes it is as simple as that!
print(libtest.add(10, 20))
# call the sum_values() function
# we have to create a C int array for this
array_of_5_ints = c_int * 5
nums = array_of_5_ints()
# fill up array with values
for i in range(5):
    nums[i] = i
# since the function expects an array pointer, we pass it byref (provided by ctypes)
print(libtest.sum_values(byref(nums), 5))
python test.py
Output
Initialization of shared object
30
10
Clean-up of shared object
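As a possible refinement of test.py, here is a sketch that builds the C array straight from a Python list and declares the signature of sum_values() so that ctypes can check the arguments. The helper `to_c_int_array` and the constant `LIB_PATH` are my own names, and the library call is skipped when libtest.so.1.0 is not in the current directory.

```python
import os
from ctypes import CDLL, POINTER, c_int

def to_c_int_array(values):
    """Copy a Python sequence of ints into a freshly allocated C int array."""
    return (c_int * len(values))(*values)

nums = to_c_int_array([0, 1, 2, 3, 4])
print(list(nums))  # [0, 1, 2, 3, 4]

LIB_PATH = "./libtest.so.1.0"  # the file built earlier; skipped if missing
if os.path.exists(LIB_PATH):
    libtest = CDLL(LIB_PATH)
    # Declaring the signature lets ctypes validate and convert arguments.
    libtest.sum_values.argtypes = [POINTER(c_int), c_int]
    libtest.sum_values.restype = c_int
    # A c_int array is accepted where POINTER(c_int) is declared,
    # so no explicit byref() is needed here.
    print(libtest.sum_values(nums, len(nums)))
```

With argtypes declared, passing something that is not an int array (a string, say) raises a Python exception instead of handing garbage to the C function.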
How much simpler can Python - C interfacing become? ctypes has been part of the standard library since Python 2.5.