Best practices for reading and working with fortran ordered numpy arrays

I read ascii and binaries that specify 3-dimensional arrays in fortran order. I want to perform some manipulations with these arrays and then export them to the same ascii or binary format.

I am confused about the best ways to work with these arrays in my library. My current design seems to be error prone, because I have to keep redoing things from the default C order if a new array is created.

Current construction:

I have several functions that read these files and return numpy arrays. The read functions all behave similarly and essentially read the data and return something like:

return array.reshape((i, j, k), order='F')

As I understand it, I am returning the view for fortran order to the original array.

My code assumes all arrays are in fortran order. This means that any new operations that can create a new array, I will definitely use reshape to convert it to fortran mode.

This seems very error prone, because I have to pay close attention to any operation that creates a new array, and be sure to reformat it to fortran order, since the default order is usually C.

Later, I may have to export these arrays to binary or ascii again, and you will need to maintain the order of fortran. So, I use numpy.nditer to write each item in fortran order.

Problems:

  • The current approach seems very error-prone, as I usually think in the order of C. I am afraid that I will always be bitten by the missed reshape calls that force things in the order of C.

    • I would not worry about ordering the elements of the array, except when reading the input files or writing data to the output files.
  • The current approach seems messy, because indexes can be interpreted differently and things can get confused.

    • When working with fortran arrays, tuple ordering for indexes happens backward, right?
    • So, x[(1, 2, 3)] for the fortran array means k = 1, j = 2 and i = 3, while x[(1, 2, 3)] for the array of order C means k = 3, j = 2, i = 1 right?
    • This means that I and my library users should always think about indexes in the order of (k, j, i), and not that we C / Python programmers tend to think (i, j, k).

Question:

Is there any best practice for this? In an ideal world that I would like to read in the fortran requested by arrays, then forget about ordering until I export the file. However, I'm afraid I will continue to misinterpret indexes, etc.

I read the only documentation on it that can find, http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues . However, the concept still seems clear as dirt to me. Maybe I just need another explanation of the numpy document, http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues .

+8
python numpy
source share
1 answer

Numpy abstracts the difference between Fortran ordering and python-level C ordering. (In fact, you can even have different orders for> 2d arrays with numpy. They are all treated the same at the python level.)

The only time you need to worry about ordering C vs F is when you read / write to disk or pass an array to lower-level functions.

Simple example

As an example, let's create a simple three-dimensional array in both order and Fortran order:

 In [1]: import numpy as np In [2]: c_order = np.arange(27).reshape(3,3,3) In [3]: f_order = c_order.copy(order='F') In [4]: c_order Out[4]: array([[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8]], [[ 9, 10, 11], [12, 13, 14], [15, 16, 17]], [[18, 19, 20], [21, 22, 23], [24, 25, 26]]]) In [5]: f_order Out[5]: array([[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8]], [[ 9, 10, 11], [12, 13, 14], [15, 16, 17]], [[18, 19, 20], [21, 22, 23], [24, 25, 26]]]) 

Note that both of them look the same (they are at the level with which we interact with them). How can you say that they are in different orders? First of all, let's take a look at the flags (pay attention to C_CONTIGUOUS vs F_CONTIGUOUS ):

 In [6]: c_order.flags Out[6]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [7]: f_order.flags Out[7]: C_CONTIGUOUS : False F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False 

And if you don't trust the flags, you can efficiently view the memory order by looking at arr.ravel(order='K') . order='K' important. Otherwise, when you call arr.ravel() , the output will be in C-order, regardless of the layout layout of the array. order='K' uses the memory layout.

 In [8]: c_order.ravel(order='K') Out[8]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]) In [9]: f_order.ravel(order='K') Out[9]: array([ 0, 9, 18, 3, 12, 21, 6, 15, 24, 1, 10, 19, 4, 13, 22, 7, 16, 25, 2, 11, 20, 5, 14, 23, 8, 17, 26]) 

The difference is actually represented (and saved) in the strides array. Note that c_order strides (72, 24, 8) , and f_order are steps (8, 24, 72) .

To prove that indexing works the same way:

 In [10]: c_order[0,1,2] Out[10]: 5 In [11]: f_order[0,1,2] Out[11]: 5 

Read and write

The main place you run into problems is when you read or write to a disc. Many file formats are awaiting a specific order. I assume that you are working with seismic data formats, and most of them (e.g. Geoprobe.vol, and I think the Petrel volume format) essentially writes a binary header, and then a Fortran-enabled 3D array to disk.

With this in mind, I use an example of a small seismic cube (a fragment of some data from my dissertation).

Both of them are uint8 binary arrays with the form 50x100x198. One is in C-order, and the other is in Fortran-order. c_order.dat f_order.dat

To read them in:

 import numpy as np shape = (50, 100, 198) c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape(shape) f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape, order='F') assert np.all(c_order == f_order) 

Note that the only difference is that the memory layout matches reshape . The memory layout of the two arrays is still different (reformatting does not create a copy), but they are processed the same at the python level.

Just to prove that the files are actually written in a different order:

 In [1]: np.fromfile('c_order.dat', dtype=np.uint8)[:10] Out[1]: array([132, 142, 107, 204, 37, 37, 217, 37, 82, 60], dtype=uint8) In [2]: np.fromfile('f_order.dat', dtype=np.uint8)[:10] Out[2]: array([132, 129, 140, 138, 110, 88, 110, 124, 142, 139], dtype=uint8) 

Imagine the result:

 def plot(data): fig, axes = plt.subplots(ncols=3) for i, ax in enumerate(axes): slices = [slice(None), slice(None), slice(None)] slices[i] = data.shape[i] // 2 ax.imshow(data[tuple(slices)].T, cmap='gray_r') return fig plot(c_order).suptitle('C-ordered array') plot(f_order).suptitle('F-ordered array') plt.show() 

enter image description hereenter image description here

Please note that we indexed them the same way and they are displayed the same way.

Common errors with IO

First of all, try reading in a file ordered by Fortran as if it were C-ordered, and then look at the result (using the plot function described above):

 wrong_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape) plot(wrong_order) 

enter image description here

Not so good!

You mentioned that you need to use "reverse" signs. This is probably because you recorded what happened in the figure above by doing something like (note the reverse form!):

 c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape([50,100,198]) rev_f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape([198,100,50]) 

Imagine what happens:

 plot(c_order).suptitle('C-ordered array') plot(rev_f_order).suptitle('Incorrectly read Fortran-ordered array') 

enter image description hereenter image description here

Please note that the image in the right corner (time-list) of the first graph corresponds to the transposed version of the image in the lower left corner.

Similarly, print rev_f_order[1,2,3] and print c_order[3,2,1] give 140 , and indexing them equally gives a different result.

Basically, this is where your reverse indexes occur. Numpy thinks this is a C-ordered array with a different shape. Note that if we look at the flags, they are both C-adjacent in memory:

 In [24]: rev_f_order.flags Out[24]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False In [25]: c_order.flags Out[25]: C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False 

This is because a fortran-ordered array is equivalent to a reverse-form C-ordered array.

Burn to Fortran-Order

It adds extra wrinkle when writing a numpy array to disk in Fortran order.

Unless you specify otherwise, the array will be written in C-order, regardless of its memory layout! (There is a clear note about this in the documentation for ndarray.tofile , but this is the normal version. The opposite behavior would be wrong, though, imo)

Therefore, regardless of the memory location of the array, in order to write it to disk in Fortran order, you need to do:

 arr.ravel(order='F').tofile('output.dat') 

If you write it as ascii, then the same applies. Use ravel(order='F') and then write out a 1-dimensional result.

+12
source share

All Articles