Numpy nested structured arrays by reference

I have the following data structure:

    import numpy as np

    N = 100
    TB = {'names': ('n', 'D'), 'formats': (int, int)}
    TA = {'names': ('id', 'B'), 'formats': (int, np.dtype((TB, (N,))))}
    a = np.empty(1000, dtype=TA)
    b = np.empty(N, dtype=TB)

where a is a structured array with two fields, 'id' and 'B'. 'B' in turn holds another structured array with the fields 'n' and 'D', filled for example like this:

    for i in range(0, 1000):
        a['B'][i] = b

When performing the above assignment, the data from b is copied into a. Is there a way to store only a reference to b, so that when I change b, the change is reflected in a['B'][i]? I want to store pointers to b in a because I do not need copies: the data in b is the same for every row.

I tried

 TA = {'names':('id', 'B'),'formats':(int, object)} 

and it works, but it breaks the nested array structure. Is there a way to keep the structured array functionality, for example a['B']['D']?

thanks

2 answers

The short answer is no. Although the syntax for numpy arrays looks the same as standard Python syntax, what happens behind the scenes is very different. Complex numpy dtypes such as TA use large blocks of contiguous memory to store each record; the memory must be laid out regularly, or everything falls apart.

Therefore, when you create an array of 1000 elements with a nested dtype such as TA, you actually allocate 1000 blocks of memory, each large enough to hold N distinct TB records (plus the id). That is why you can do things like a['B']['D'] or, to put a finer point on it, things like this:

    >>> (a['B'][1]['D'] == a['B']['D'][1]).all()
    True
    >>> a['B'][1]['D'][0] = 123456789
    >>> (a['B'][1]['D'] == a['B']['D'][1]).all()
    True

With regular Python objects, the above would not work, because the order in which items are accessed matters. It is actually quite strange that this is possible in numpy, and the only reason it is possible is that numpy uses uniformly structured contiguous memory.
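To make the point about contiguous memory concrete, here is a minimal sketch. It assumes fixed-width np.int64 fields instead of the plain int used in the question, so that the byte counts are the same on every platform; the variable d is only an illustrative name.

```python
import numpy as np

N = 100
# Assumption: fixed-width integers, so the byte counts below hold everywhere
# (the question uses plain int, whose size is platform dependent).
TB = np.dtype([('n', np.int64), ('D', np.int64)])
TA = np.dtype([('id', np.int64), ('B', TB, (N,))])

a = np.zeros(1000, dtype=TA)

# Each record is one fixed-size block: 8 bytes for 'id' plus N * 16 bytes for 'B'.
print(TB.itemsize)   # 16
print(TA.itemsize)   # 1608
print(a.nbytes)      # 1608000

# Field access returns strided views into that single buffer, not copies.
d = a['B']['D']
print(d.shape)                  # (1000, 100)
print(np.shares_memory(d, a))   # True

# Writing through the view changes the underlying record.
d[1, 0] = 123456789
print(a[1]['B'][0]['D'])        # 123456789
```

Because every record has to occupy the same fixed number of bytes, there is no room in this layout for a per-row pointer to some other array.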

As far as I know, numpy provides no way to do what you ask (someone correct me if I'm wrong!), and the indirection it would need would probably require significant changes to the numpy API.
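For comparison, the object-field variant mentioned in the question can be sketched roughly as follows. It does store references, but the 'B' column becomes an opaque array of Python objects, which is why nested field access stops working. The name TA_obj and the small row count are assumptions made for illustration only.

```python
import numpy as np

N = 100
TB = np.dtype([('n', int), ('D', int)])
TA_obj = np.dtype([('id', int), ('B', object)])   # 'B' holds Python references

b = np.zeros(N, dtype=TB)
a = np.empty(3, dtype=TA_obj)
for i in range(len(a)):
    a['B'][i] = b          # stores a reference to b, no copy

b['D'][0] = 42
print(a['B'][2]['D'][0])   # 42, every row sees the change
print(a['B'][0] is b)      # True

# The price: 'B' is now a plain object column, so nested field
# access across all rows no longer works.
try:
    a['B']['D']
except IndexError as exc:
    print('nested access failed:', type(exc).__name__)
```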

I will add that I do not think it makes sense to do this anyway. If only one copy of the array is required, why not just store it outside the array? You can even pass it around together with the numpy array, as part of a tuple or namedtuple.
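A minimal sketch of that suggestion, assuming a hypothetical namedtuple called Table that carries the per-row ids together with a single shared TB array. The np.broadcast_to call is an optional extra, not something from the question: it gives a read-only (rows, N) view without allocating a copy per row.

```python
from collections import namedtuple
import numpy as np

N = 100
TB = np.dtype([('n', int), ('D', int)])

# Hypothetical container: per-row ids plus ONE shared TB array.
Table = namedtuple('Table', ['ids', 'B'])
table = Table(ids=np.arange(1000), B=np.zeros(N, dtype=TB))

# Changing the shared array is visible wherever `table` is passed.
table.B['D'][0] = 7
print(table.B['D'][0])                      # 7

# If a (rows, N) shape is needed, broadcast_to returns a read-only view
# that shares memory with the single TB array.
d_rows = np.broadcast_to(table.B['D'], (len(table.ids), N))
print(d_rows.shape)                         # (1000, 100)
print(np.shares_memory(d_rows, table.B))    # True
```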


Yes, you can simply create a view. But it works differently from what you described:

    >>> a = np.array([1,2,3,4,5,6])
    >>> b = a[2:4].view()
    >>> b[0] = 0
    >>> b[1] = 0
    >>> a
    array([1, 2, 0, 0, 5, 6])
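To illustrate how this differs from the question, here is a small sketch that applies a view to the nested dtype. It assumes a tiny N and row count for readability, and v is just an illustrative name. The view shares memory with b, but assigning it into a field of a still copies the values.

```python
import numpy as np

N = 4
TB = np.dtype([('n', int), ('D', int)])
TA = np.dtype([('id', int), ('B', TB, (N,))])

a = np.zeros(3, dtype=TA)
b = np.zeros(N, dtype=TB)

# A view shares memory with the array it was taken from ...
v = b.view()
v['D'][0] = 5
print(b['D'][0])                 # 5

# ... but assigning it into a field of `a` copies the values.
a['B'][0] = v
b['D'][0] = 99
print(a['B'][0]['D'][0])         # 5, not 99
print(np.shares_memory(a, b))    # False
```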