Pass file descriptor to cython function

I want to compile a Python function with Cython that reads a binary file while skipping some bytes between reads (without reading the whole file and then slicing, since I would run out of memory). I came up with something like this:

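    import numpy

    def FromFileSkip(fid, count=1, skip=0):
        # dtype is assumed to be defined elsewhere, e.g. dtype = numpy.double
        if skip >= 0:
            data = numpy.zeros(count)
            k = 0
            while k < count:
                try:
                    data[k] = numpy.fromfile(fid, count=1, dtype=dtype)
                    fid.seek(skip, 1)  # move forward relative to the current position
                    k += 1
                except ValueError:
                    # an empty read at end of file makes the assignment fail
                    data = data[:k]
                    break
            return data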

and then I can use a function like this:

    f = open(filename, 'rb')
    data = FromFileSkip(f,...

However, to compile the FromFileSkip function with Cython, I would like to declare the types of everything involved in the function, including fid, which is a file handle. How can I declare its type in Cython, given that it is not a "standard" type such as an integer? Thanks.

1 answer

Declaring a type for fid will not help, because calling Python functions is still expensive. Try compiling your example with the -a flag to see what I mean. However, you can use the low-level C file functions to avoid Python overhead in your loop. For example, assuming that the data starts at the very beginning of the file and that its type is double:

    from libc.stdio cimport FILE, fread, fseek, SEEK_CUR

    cdef extern from "stdio.h":
        FILE *fdopen(int, const char *)

    import numpy as np
    cimport numpy as np

    DTYPE = np.double             # or whatever your type is
    ctypedef np.double_t DTYPE_t  # or whatever your type is

    def FromFileSkip(fid, int count=1, int skip=0):
        cdef int k
        cdef int n_read = 0
        cdef FILE* cfile
        cdef np.ndarray[DTYPE_t, ndim=1] data
        cdef DTYPE_t* data_ptr

        # attach a C stream to the descriptor; it is not fclose()d here,
        # because that would also close the descriptor owned by fid
        cfile = fdopen(fid.fileno(), 'rb')
        data = np.zeros(count, dtype=DTYPE)
        data_ptr = <DTYPE_t*>data.data

        # maybe skip some header bytes here
        # ...

        for k in range(count):
            # fread returns the number of items read; anything but 1 means EOF or error
            if fread(<void*>(data_ptr + k), sizeof(DTYPE_t), 1, cfile) != 1:
                break
            n_read += 1
            if fseek(cfile, skip, SEEK_CUR):
                break
        return data[:n_read]  # drop the slots that were never filled

Note that the output of cython -a example.pyx shows no Python overhead inside the loop (the loop lines are not highlighted in yellow).
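To check what a compiled function like this returns, it may help to compare it against a slow pure-Python reference with the same read-then-skip behaviour. The sketch below is only an illustration, not part of the answer above: the name from_file_skip_reference and the fmt parameter are hypothetical, it uses only the standard library (struct instead of NumPy), and it assumes the values are native doubles starting at the beginning of the file.

```python
import os
import struct
import tempfile


def from_file_skip_reference(fid, count=1, skip=0, fmt="d"):
    """Read up to `count` values, seeking `skip` bytes forward after each one."""
    size = struct.calcsize(fmt)
    values = []
    for _ in range(count):
        chunk = fid.read(size)
        if len(chunk) < size:  # end of file (or a truncated value): stop early
            break
        values.append(struct.unpack(fmt, chunk)[0])
        fid.seek(skip, os.SEEK_CUR)  # same role as fseek(cfile, skip, SEEK_CUR)
    return values


# Write ten doubles 0.0 .. 9.0 to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("10d", *[float(i) for i in range(10)]))
    path = tmp.name

# Ask for ten values but skip 8 bytes (one double) after each read,
# so only every other value is returned before the file ends.
with open(path, "rb") as f:
    every_other = from_file_skip_reference(f, count=10, skip=8)
os.remove(path)

print(every_other)  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

The file is opened in binary mode and the result is shorter than count when the end of the file is reached, which is the behaviour the question's version asks for.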
