Python: numpy: concatenation of named arrays

Question

Consider the following simple example:

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
y = numpy.array([(1, 2), (3, 4)], dtype=[('c', '<f4'), ('d', '<f4')])
numpy.hstack((x, y))
```

You will get the following error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\site-packages\numpy\core\shape_base.py", line 226, in vstack
    return _nx.concatenate(list(map(atleast_2d,tup)),0)
TypeError: invalid type promotion
```

If the arrays have no field names, it works:

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype='<f4')
y = numpy.array([(1, 2), (3, 4)], dtype='<f4')
numpy.hstack((x, y))
```

If I remove the names from x and y, this also works.

Question: how do I concatenate, vstack, or hstack named (structured) numpy arrays? Note: `numpy.lib.recfunctions.stack_arrays` does not work well.

python numpy

Hanan shteingart Sep 2 '13 at 13:10

1 answer

senderle · Answer 1 · 2013-09-02T13:26:40+0000

The problem is that the types are different. The field names are part of the dtype, and `y` uses different names from `x`, so the two dtypes are incompatible. If you use compatible types, everything works fine:

```python
>>> x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
>>> y = numpy.array([(5, 6), (7, 8)], dtype=[('a', '<f4'), ('b', '<f4')])
>>> numpy.vstack((x, y))
array([[(1.0, 2.0), (3.0, 4.0)],
       [(5.0, 6.0), (7.0, 8.0)]],
      dtype=[('a', '<f4'), ('b', '<f4')])
>>> numpy.hstack((x, y))
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)],
      dtype=[('a', '<f4'), ('b', '<f4')])
>>> numpy.dstack((x, y))
array([[[(1.0, 2.0), (5.0, 6.0)],
        [(3.0, 4.0), (7.0, 8.0)]]],
      dtype=[('a', '<f4'), ('b', '<f4')])
```

Sometimes `dstack` and friends are smart enough to coerce types sensibly, but NumPy has no way of knowing how to combine record arrays with different user-defined field names.
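A minimal sketch of the point above, assuming the `x` and `y` from the question: the two dtypes have the same memory layout but compare unequal because of the names, so stacking fails. The exception class varies by NumPy version (older releases report `TypeError: invalid type promotion`; newer ones, as far as I know, raise a `TypeError` subclass with a more specific message), so the sketch catches `TypeError`. The last step is my own addition, not from the answer: if the `c`/`d` fields genuinely mean the same thing as `a`/`b`, `ndarray.view` can relabel `y` with `x`'s dtype. That is only valid because both dtypes have identical field types, offsets and itemsize.

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
y = numpy.array([(1, 2), (3, 4)], dtype=[('c', '<f4'), ('d', '<f4')])

# Same layout (two little-endian float32s per record), different field names,
# so NumPy treats these as different dtypes.
same_dtype = x.dtype == y.dtype

# Stacking arrays whose dtypes cannot be promoted raises a TypeError
# (or a TypeError subclass, depending on the NumPy version).
try:
    numpy.hstack((x, y))
    stack_failed = False
except TypeError:
    stack_failed = True

# Assumption: c/d mean the same as a/b. Reinterpret y's records as (a, b)
# records; no data is copied, only the dtype labels change.
y_as_x = y.view(x.dtype)
stacked = numpy.hstack((x, y_as_x))
```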

If you want to combine the datatypes, you have to create a new datatype. Don't make the mistake of thinking that the sequence of names (`x['a']`, `x['b']`, ...) constitutes a true dimension of the array; `x` and `y` above are 1-d arrays of memory blocks, each of which contains two 32-bit floats that can be accessed using the names `'a'` and `'b'`. But as you can see, if you access a single item of the array, you don't get another array, as you would if the fields really formed a second dimension. You can see the difference here:

```python
>>> x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
>>> x[0]
(1.0, 2.0)
>>> type(x[0])
<type 'numpy.void'>
>>> z = numpy.array([(1, 2), (3, 4)])
>>> z[0]
array([1, 2])
>>> type(z[0])
<type 'numpy.ndarray'>
```
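To make the "not a real dimension" point concrete, here is a small sketch using the same `x` and `z`: it checks the shape and per-record size, and shows that a field name selects a 1-d column rather than indexing a second axis. The final `view`/`reshape` step is an extra illustration I am adding; it only works because both fields are float32 and packed with no padding, so each record is literally two floats in a row.

```python
import numpy

x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
z = numpy.array([(1, 2), (3, 4)])

record_shape = x.shape          # x is 1-d: just two records
record_size = x.dtype.itemsize  # each record is 8 bytes (two 4-byte floats)

first_record = x[0]  # a numpy.void scalar, not an array
first_row = z[0]     # z is genuinely 2-d, so this is an ndarray row

# A field name selects the same field from every record: a 1-d column.
column_a = x['a']

# Reinterpret the packed float32 fields as a plain 2-d float32 array.
# Valid only because every field has the same type and there is no padding.
plain = x.view('<f4').reshape(x.shape[0], -1)
```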

This is what allows record arrays to contain heterogeneous data: a record can hold both strings and ints. The trade-off is that you don't get the full power of an ndarray at the level of individual records.
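A minimal sketch of a heterogeneous record array; the field names `name` and `count` and the data are made up for illustration. Each record stores a fixed-width Unicode string next to a 32-bit int, a single record comes back as a `numpy.void` scalar, and each field taken across all records is an ordinary homogeneous ndarray that supports vectorised operations.

```python
import numpy

# Hypothetical example data: each record mixes a string and an integer.
people = numpy.array(
    [('alice', 3), ('bob', 5)],
    dtype=[('name', 'U10'), ('count', '<i4')],
)

# One record is a numpy.void scalar holding both values side by side.
first = people[0]

# Each field across all records is a regular homogeneous ndarray,
# so array operations work at the field level rather than the record level.
total = people['count'].sum()
names = people['name']

# Records are packed: 10 UCS-4 characters (40 bytes) plus a 4-byte int.
record_bytes = people.dtype.itemsize
```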

The upshot is that to join individual blocks of memory, you really have to change the array's dtype. There are several ways to do this, but the simplest one I could find uses the little-known `numpy.lib.recfunctions` module (which, I see, you have already found!). Using the `x` and `y` from the question:

```python
>>> import numpy.lib.recfunctions
>>> numpy.lib.recfunctions.rec_append_fields(x, y.dtype.names,
...                                          [y[n] for n in y.dtype.names])
rec.array([(1.0, 2.0, 1.0, 2.0), (3.0, 4.0, 3.0, 4.0)],
      dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4')])
```
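If you would rather not depend on `numpy.lib.recfunctions`, the "create a new datatype" idea can also be done by hand with core NumPy. This is a minimal sketch, not a library function: the helper name `join_fields` is hypothetical, and it assumes both arrays have the same shape and no field names in common.

```python
import numpy


def join_fields(left, right):
    """Return a new structured array with all fields of left, then right.

    Hypothetical helper: assumes equal shapes and disjoint field names.
    """
    if left.shape != right.shape:
        raise ValueError("arrays must have the same shape")
    shared = set(left.dtype.names) & set(right.dtype.names)
    if shared:
        raise ValueError("duplicate field names: %s" % sorted(shared))

    # Build the combined dtype from the (name, field dtype) pairs of both inputs.
    fields = [(name, left.dtype[name]) for name in left.dtype.names]
    fields += [(name, right.dtype[name]) for name in right.dtype.names]
    out = numpy.empty(left.shape, dtype=fields)

    # Copy the data field by field into the newly allocated records.
    for src in (left, right):
        for name in src.dtype.names:
            out[name] = src[name]
    return out


x = numpy.array([(1, 2), (3, 4)], dtype=[('a', '<f4'), ('b', '<f4')])
y = numpy.array([(1, 2), (3, 4)], dtype=[('c', '<f4'), ('d', '<f4')])
joined = join_fields(x, y)
```

Unlike `rec_append_fields`, this returns a plain structured ndarray rather than a `numpy.recarray`; calling `.view(numpy.recarray)` on the result gives attribute-style field access if you want it.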
