I still have not found a completely satisfactory solution, but you can nevertheless get the pointer with much less overhead in CPython. First, the reason both of the approaches mentioned above are so slow is that .ctypes and .__array_interface__ are on-demand attributes, computed on every access by array_ctypes_get() and array_interface_get() in numpy/numpy/core/src/multiarray/getset.c. The first imports ctypes and creates an instance of numpy.core._internal._ctypes, and the second creates a new dictionary and populates it with a lot of unnecessary entries in addition to the data pointer.
Nothing can be done about this overhead at the Python level, but at the C level you can write a micro-module that bypasses most of it:
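
    #include <Python.h>
    #include <numpy/arrayobject.h>

    /* Return the address of the array's data buffer as a Python integer.
       No type check is done: obj must be a NumPy array. */
    static PyObject *_get_ptr(PyObject *self, PyObject *obj)
    {
        return PyLong_FromVoidPtr(PyArray_DATA((PyArrayObject *)obj));
    }

    static PyMethodDef methods[] = {
        {"_get_ptr", _get_ptr, METH_O, "Wrapper to PyArray_DATA()"},
        {NULL, NULL, 0, NULL}
    };

    /* Python 2 style module initialization */
    PyMODINIT_FUNC initaccel(void)
    {
        Py_InitModule("accel", methods);
    }
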
Compile the extension in setup.py as usual and import it as follows:

    try:
        from accel import _get_ptr
        def get_ptr(x):
            return C.cast(_get_ptr(x), p_t)
    except ImportError:
        get_ptr = get_ptr_array

On PyPy, from accel import _get_ptr will fail and get_ptr will fall back to get_ptr_array, which works with Numpypy.
In terms of performance, for lightweight C function calls, ctypes + accel._get_ptr() is still noticeably slower than a native CPython extension, which has essentially no overhead. It is, of course, much faster than get_ptr_ctypes() and get_ptr_array() above, so the overhead may not be significant for medium-weight C function calls.
With the fallback, the code stays compatible with PyPy, although I have to say that, having spent quite a bit of time evaluating PyPy for my scientific computing applications, I do not see a future for it as long as they (rather stubbornly) refuse to support the full CPython API.
Update
I found that ctypes.cast() became the bottleneck after introducing accel._get_ptr(). You can get rid of the casts by declaring all pointers in the interface as ctypes.c_void_p. This is what I ended up with:

    def get_ptr_ctypes2(x):
        return x.ctypes._data

    def get_ptr_array(x):
        return x.__array_interface__['data'][0]

    try:
        from accel import _get_ptr as get_ptr
    except ImportError:
        get_ptr = get_ptr_array

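As a rough illustration of declaring pointers as ctypes.c_void_p, here is a minimal sketch. It is not the interface used for the measurements in this answer: memcpy from the C library stands in for your own C function, ctypes buffers stand in for NumPy arrays, and ctypes.addressof() stands in for the integer that _get_ptr() would return. It also assumes a POSIX system where the C library can be loaded this way.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system. memcpy is only a stand-in for your own C function.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declaring the pointer parameters as c_void_p lets ctypes accept plain
# Python integers (raw addresses), so no ctypes.cast() is needed per call.
libc.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
libc.memcpy.restype = ctypes.c_void_p

src = ctypes.create_string_buffer(b"hello", 6)
dst = ctypes.create_string_buffer(6)

# addressof() returns an int, playing the role of _get_ptr(x) here.
src_addr = ctypes.addressof(src)
dst_addr = ctypes.addressof(dst)

libc.memcpy(dst_addr, src_addr, 6)
result = dst.value
```

The trade-off is that c_void_p gives up the type checking a typed POINTER(...) declaration would provide, so passing the wrong address is not caught by ctypes.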
Note that get_ptr_ctypes2() avoids the cast by directly accessing the hidden attribute ndarray.ctypes._data. Below are some timing results for calling heavy and lightweight C functions from Python:

                                  heavy C (few calls)   light C (many calls)
    ctypes + get_ptr_ctypes():         0.71 s                15.40 s
    ctypes + get_ptr_ctypes2():        0.68 s                13.30 s
    ctypes + get_ptr_array():          0.65 s                11.50 s
    ctypes + accel._get_ptr():         0.63 s                 9.47 s
    native CPython:                    0.62 s                 8.54 s
    Cython (no decorators):            0.64 s                 9.96 s

So, with accel._get_ptr() and no ctypes.cast() calls, the speed of ctypes is really competitive with a native CPython extension. Now I just need to wait for someone to rewrite h5py, matplotlib and scipy with ctypes before trying PyPy for anything serious...
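If you want to check how much the cast costs per call on your own machine, something like the following sketch could be used. It is not the benchmark behind the numbers above: the buffer, the helper names with_cast() and without_cast(), and the iteration count are all made up for illustration, and only the Python-side cost of producing the pointer argument is measured.

```python
import ctypes
import timeit

# Hypothetical stand-in for an array buffer; addr plays the role of _get_ptr(x).
buf = (ctypes.c_double * 4)()
addr = ctypes.addressof(buf)
p_t = ctypes.POINTER(ctypes.c_double)

def with_cast():
    # What the first version of get_ptr() does on every call.
    return ctypes.cast(addr, p_t)

def without_cast():
    # With c_void_p argtypes the raw integer can be passed as is.
    return addr

n = 100000  # arbitrary iteration count
t_cast = timeit.timeit(with_cast, number=n)
t_plain = timeit.timeit(without_cast, number=n)
print("with cast:    %.3f s" % t_cast)
print("without cast: %.3f s" % t_plain)
```

The absolute numbers will depend on the interpreter and machine, so they are only meaningful relative to each other.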