Python: Split a NumPy array based on values in an array

Question


I have one big array:

[(1.0, 3.0, 1, 427338.4297000002, 4848489.4332) (1.0, 3.0, 2, 427344.7937000003, 4848482.0692) (1.0, 3.0, 3, 427346.4297000002, 4848472.7469) ..., (1.0, 1.0, 7084, 427345.2709999997, 4848796.592) (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351) (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)]

I want to split this array into several arrays based on the second value in each row (3.0, 3.0, 3.0, ..., 1.0, 1.0, 1.0).

Every time the second value changes, I want a new array, so that all rows in each resulting array share the same second value. I searched Stack Overflow and found the command

 np.split(array, number)

but I'm not trying to split the array into a specific number of arrays; I want to split it wherever that value changes. How can I split the array as described above? Any help would be appreciated!
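For reference, `np.split` accepts either an integer (the number of equal-sized pieces) or a sequence of indices at which to cut, so splitting "by value" reduces to finding those indices. A minimal sketch on a toy array (the names `a`, `equal_parts` and `by_index` are illustrative, not from the question):

```python
import numpy as np

a = np.arange(6)

# An integer N splits into N equal-sized pieces (N must divide len(a)).
equal_parts = np.split(a, 3)
print([p.tolist() for p in equal_parts])  # [[0, 1], [2, 3], [4, 5]]

# A sequence of indices cuts the array just before each listed index.
by_index = np.split(a, [2, 5])
print([p.tolist() for p in by_index])  # [[0, 1], [2, 3, 4], [5]]
```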


python arrays split numpy

whent1991 Aug 6 '15 at 18:20


1 answer

Ashwini chaudhary · Accepted Answer · 2015-08-06T18:25:03+0000

You can find the indices where the values change using numpy.where and numpy.diff on the second column, and then pass those indices to numpy.split:

 >>> import numpy as np
 >>> arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
 ...                 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
 ...                 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
 ...                 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
 ...                 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
 ...                 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])
 >>> np.split(arr, np.where(np.diff(arr[:,1]))[0]+1)
 [array([[ 1.00000000e+00, 3.00000000e+00, 1.00000000e+00, 4.27338430e+05, 4.84848943e+06],
        [ 1.00000000e+00, 3.00000000e+00, 2.00000000e+00, 4.27344794e+05, 4.84848207e+06],
        [ 1.00000000e+00, 3.00000000e+00, 3.00000000e+00, 4.27346430e+05, 4.84847275e+06]]),
  array([[ 1.00000000e+00, 1.00000000e+00, 7.08400000e+03, 4.27345271e+05, 4.84879659e+06],
        [ 1.00000000e+00, 1.00000000e+00, 7.08500000e+03, 4.27352928e+05, 4.84879094e+06],
        [ 1.00000000e+00, 1.00000000e+00, 7.08600000e+03, 4.27359161e+05, 4.84878743e+06]])]

Explanation:

First, select the elements of the second column:

 >>> arr[:,1]
 array([ 3., 3., 3., 1., 1., 1.])
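The rows in the question are printed as parenthesized tuples without commas between them, which suggests the original data may be a NumPy structured (record) array rather than a plain 2-D array. In that case `arr[:,1]` raises an IndexError, because a structured array is 1-D, and the second field has to be selected by name. A sketch assuming NumPy's default field names `f0` to `f4`; the dtype below is a guess, not taken from the question:

```python
import numpy as np

# Hypothetical structured array mirroring a few of the question's rows.
# The field names and dtypes are assumptions for illustration only.
rec = np.array(
    [(1.0, 3.0, 1, 427338.4297, 4848489.4332),
     (1.0, 3.0, 2, 427344.7937, 4848482.0692),
     (1.0, 1.0, 7084, 427345.2710, 4848796.5920)],
    dtype=[("f0", "f8"), ("f1", "f8"), ("f2", "i8"), ("f3", "f8"), ("f4", "f8")],
)

# Select the second field by name instead of by column index.
second = rec[rec.dtype.names[1]]
print(second.tolist())  # [3.0, 3.0, 1.0]

# The same diff/where/split recipe then splits the records themselves.
groups = np.split(rec, np.where(np.diff(second))[0] + 1)
print([len(g) for g in groups])  # [2, 1]
```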

Next, to find where the elements actually change, we can use numpy.diff:

 >>> np.diff(arr[:,1])
 array([ 0., 0., -2., 0., 0.])
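Because np.diff only compares neighbouring elements, this approach cuts at every change, not once per distinct value: if a value reappears later, it starts a new piece. A small sketch with a made-up column (`col` and `values` are illustrative names) shows this:

```python
import numpy as np

# A made-up second column where 3.0 comes back after a run of 1.0s.
col = np.array([3.0, 3.0, 1.0, 1.0, 3.0])
values = np.arange(len(col))  # stand-in payload: just the row numbers

# np.diff is nonzero at every change, including the change back to 3.0.
print(np.diff(col).tolist())  # [0.0, -2.0, 0.0, 2.0]

# So the split yields one piece per consecutive run, not per distinct value.
pieces = np.split(values, np.where(np.diff(col))[0] + 1)
print([p.tolist() for p in pieces])  # [[0, 1], [2, 3], [4]]
```

This matches the question's "every time the second value changes" wording.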

Anything other than zero means that the next element differs from the current one. We can use numpy.where to find the indices of the nonzero elements and then add 1, because np.diff(x)[i] is x[i+1] - x[i], so the first element of each new group sits one position after the returned index:

 >>> np.where(np.diff(arr[:,1]))[0]+1
 array([3])
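The three steps can be wrapped in a small reusable function. The name `split_on_column_change` is hypothetical, and `np.flatnonzero(x)` is used here as an equivalent of `np.where(x)[0]` for 1-D input; this is a sketch assuming `arr` is a plain 2-D numeric array like the one in the answer:

```python
import numpy as np


def split_on_column_change(arr, col):
    """Hypothetical helper: split the rows of a 2-D array into consecutive
    runs that share the same value in column `col`."""
    keys = arr[:, col]
    # Index of the first row of each new run (same as np.where(...)[0] + 1).
    boundaries = np.flatnonzero(np.diff(keys)) + 1
    return np.split(arr, boundaries)


arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
                (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
                (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
                (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
                (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
                (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])

groups = split_on_column_change(arr, 1)
for g in groups:
    # Each group's key value and its shape (rows, columns).
    print(g[0, 1], g.shape)
```

Each returned piece is a view into `arr` produced by `np.split`, so no rows are copied until you modify or copy them.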
