Stefan van der Walt (a NumPy developer) explains:
The ndarray constructor does its best to guess what kind of data you are feeding it, but sometimes it needs a bit of help...
I prefer to build arrays explicitly, so there is no doubt about what is happening under the hood:
When you say something like
import numpy as np
# two structured arrays, each holding two (f0, f1) records
dt = np.dtype([('f0', '<u4'), ('f1', '<f4')])
instance1 = np.array([(67111, 1.0), (104242, 1.0)], dtype=dt)
instance2 = np.array([(67112, 2.0), (104243, 2.0)], dtype=dt)
instances = [instance1, instance2]
Y = np.array(instances, dtype=object)
np.array is forced to guess the dimensions of the array you want. instances is a list of two objects, each of which has a length of 2. Thus, quite reasonably, np.array assumes that Y should have shape (2, 2):
print(Y.shape)  # (2, 2)
In most cases, I think that is what one would want. However, since it is not what you want here, you must construct the array explicitly:
X = np.empty((len(instances),), dtype=object)
print(X.shape)  # (2,)
Now there is no question about the shape of X: it is (2,). Therefore, when you load the data
X[:] = instances
NumPy is smart enough to treat instances as a sequence of two objects.
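For reference, the two steps can be wrapped in a small helper. This is only a sketch: the name to_object_array is hypothetical (it is not part of NumPy), and it assumes a reasonably recent NumPy in which the builtin object is used as the dtype in place of the old np.object alias.

```python
import numpy as np


def to_object_array(items):
    """Hypothetical helper: store each element of `items` in one slot
    of a 1-D object array, without letting NumPy guess a deeper shape."""
    out = np.empty((len(items),), dtype=object)  # shape is fixed up front
    out[:] = items                               # one item per slot
    return out


dt = np.dtype([('f0', '<u4'), ('f1', '<f4')])
instance1 = np.array([(67111, 1.0), (104242, 1.0)], dtype=dt)
instance2 = np.array([(67112, 2.0), (104243, 2.0)], dtype=dt)
instances = [instance1, instance2]

# Letting np.array guess yields a 2-D array of individual records.
print(np.array(instances, dtype=object).shape)  # (2, 2)

# Building the container first keeps each structured array whole.
X = to_object_array(instances)
print(X.shape)     # (2,)
print(X[0].shape)  # (2,)
```

Each element of X is then still a structured array, so field access such as X[1]['f0'] works as it does on instance2.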