Saving record arrays in object arrays

I would like to convert a list of record arrays (dtype is (uint32, float32)) into a numpy array of dtype np.object:

 X = np.array(instances, dtype = np.object) 

where instances is a list of arrays with the data type np.dtype([('f0', '<u4'), ('f1', '<f4')]). However, the above statement leads to an array whose elements are also of type np.object:

 X[0]
 array([(67111L, 1.0), (104242L, 1.0)], dtype=object)

Does anyone know why?

The following statement should be equivalent to the above, but gives the desired result:

 X = np.empty((len(instances),), dtype = np.object)
 X[:] = instances
 X[0]
 array([(67111L, 1.0), (104242L, 1.0)], dtype=[('f0', '<u4'), ('f1', '<f4')])

Thanks and best regards, Peter

1 answer

Stefan van der Walt (a NumPy developer) explains:

The ndarray constructor does its best to guess what kind of data you are feeding it, but sometimes it needs a bit of help ...

I prefer to build arrays explicitly, so there is no doubt about what is happening under the hood:

When you say something like

 instance1 = np.array([(67111L, 1.0), (104242L, 1.0)], dtype=np.dtype([('f0', '<u4'), ('f1', '<f4')]))
 instance2 = np.array([(67112L, 2.0), (104243L, 2.0)], dtype=np.dtype([('f0', '<u4'), ('f1', '<f4')]))
 instances = [instance1, instance2]
 Y = np.array(instances, dtype = np.object)

np.array is forced to guess the shape of the array you want. instances is a list of two objects, each of length 2. Thus, quite reasonably, np.array assumes that Y should have shape (2, 2):

 print(Y.shape) # (2, 2) 
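
The shape inference described here can be checked on Python 3 with a current NumPy. The following is a minimal sketch and not part of the original answer. It assumes two changes to the code above: the Python 2 `L` suffix on the integers is dropped, and the builtin `object` replaces `np.object`, since that alias was deprecated in NumPy 1.20 and removed in 1.24. The name `rec_dtype` is introduced only for this illustration.

```python
import numpy as np

# Record dtype from the question: one uint32 field and one float32 field.
rec_dtype = np.dtype([('f0', '<u4'), ('f1', '<f4')])

instance1 = np.array([(67111, 1.0), (104242, 1.0)], dtype=rec_dtype)
instance2 = np.array([(67112, 2.0), (104243, 2.0)], dtype=rec_dtype)
instances = [instance1, instance2]

# np.array sees two sequences of length 2 and infers a 2-D result.
Y = np.array(instances, dtype=object)

print(Y.shape)     # (2, 2)
print(Y.dtype)     # object
print(Y[0].dtype)  # object: the row no longer carries the record dtype
```

Each row of `Y` is an object array in its own right, which is the behaviour reported in the question.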

In most cases, I think this is what would be desirable. However, in your case, since this is not what you want, you must construct the array explicitly:

 X = np.empty((len(instances),), dtype = np.object)
 print(X.shape) # (2,)

Now there is no question about the shape of X: it is (2,). Therefore, when you load the data

 X[:] = instances 

numpy is smart enough to treat instances as a sequence of two objects.
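
The explicit construction can be wrapped in a small helper. This is a sketch under the same assumptions as before: Python 3, a current NumPy, and the builtin `object` in place of `np.object`. The helper name `to_object_array` and the name `rec_dtype` are hypothetical and are not NumPy API.

```python
import numpy as np


def to_object_array(arrays):
    """Store each item of `arrays` in its own slot of a 1-D object array."""
    # Fix the shape up front so NumPy does not have to guess it.
    out = np.empty((len(arrays),), dtype=object)
    # With a 1-D object target, each list item fills exactly one slot.
    out[:] = arrays
    return out


rec_dtype = np.dtype([('f0', '<u4'), ('f1', '<f4')])
instances = [
    np.array([(67111, 1.0), (104242, 1.0)], dtype=rec_dtype),
    np.array([(67112, 2.0), (104243, 2.0)], dtype=rec_dtype),
]

X = to_object_array(instances)
print(X.shape)                  # (2,)
print(X[0].dtype == rec_dtype)  # True: the record dtype is preserved
```

Because the target is already one-dimensional, the assignment has no room to expand the inner arrays into a second axis, so each element of `X` stays a record array.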
