Numpy.searchsorted with multiple sources

Question

Numpy.searchsorted with multiple sources

Let's say that I have two arrays in the form

a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6] b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]

As you can see, the above arrays are sorted if they are considered a and b as columns of a super array.

Now I want to search in this array. For example, if I search for (3, 7) (a = 3 and b = 7), I should get 6.

If a has duplicate values, the search should continue with the values in b .

Is there a built-in numpy method for this? Or what could be an effective way to do this, assuming I have a million records in my array.

I tried with numpy.recarray to create one re-entry using a and b and tried to search in it, but I get the following error.

 TypeError: expected a readable buffer object

Any help is greatly appreciated.

+4

python numpy

Senthil babu Aug 08 '12 at 15:29

source share

6 answers

You can use duplicate searchsorted left and right:

 left, right = np.searchsorted(a, 3, side='left'), np.searchsorted(a, 3, side='right') index = left + np.searchsorted(b[left:right], 7)

+4

ecatmur Aug 08 '12 at 16:06

source share

This works for me:

 >>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6] >>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1] >>> Z = numpy.array(zip(a, b), dtype=[('a','int'), ('b','int')]) >>> Z.searchsorted(numpy.asarray((3,7), dtype=Z.dtype)) 6

I think the trick might be to have the argument for searchsorted have the same dtype type as the array. When I try Z.searchsorted((3, 7)) , I get segfault.

+1

Bi rico Aug 08 '12 at 17:29

source share

Expansion

n arrays:

 import numpy as np def searchsorted_multi(*args): v = args[-1] if len(v) != len(args[:-1]): raise ValueError l, r = 0, len(args[0]) ind = 0 for vi, ai in zip(v, args[:-1]): l, r = [np.searchsorted(ai[l:r], vi, side) for side in ('left', 'right')] ind += l return ind if __name__ == "__main__": a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6] b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1] c = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 2] assert(searchsorted_multi(a, b, (3, 7)) == 6) assert(searchsorted_multi(a, b, (3, 0)) == 5) assert(searchsorted_multi(a, b, c, (6, 1, 2)) == 12)

0

Nicolas barbey Aug 08 '12 at 16:10

source share

Here's an interesting way to do this (although this is not the most efficient way, since I believe that O (n), and not O (log (n)) as ecatmur's answer, however, will be more compact)

 np.searchsorted(a + 1j*b, a_val + 1j*b_val)

Example:

 >>> a = np.array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]) >>> b = np.array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1]) >>> np.searchsorted(a + 1j*b, 4 + 1j*8) 9

0

Jab Aug 08 '12 at 16:27

source share

Or without numpy:

 >>> import bisect >>> a = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6] >>> b = [1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1] >>> bisect.bisect_left(zip(a,b), (3,7)) 6

0

Charles Beattie Aug 08 '12 at 16:42

source share

senderle · Accepted Answer · 2012-08-08T16:28:42+0000

You are almost there. It's just that numpy.record (this is what I suppose you used, given the error message you received) is actually not what you want; just create an array of records from one element:

 >>> a_b = numpy.rec.fromarrays((a, b)) >>> a_b rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9), (4, 4), (4, 8), (5, 1), (6, 1)], dtype=[('f0', '<i8'), ('f1', '<i8')]) >>> numpy.searchsorted(a_b, numpy.array((3, 7), dtype=a_b.dtype)) 6

It may also be useful to know that sort and argsort sort arrays of records lexically as well as lexsort . Example using lexsort :

 >>> random_idx = numpy.random.permutation(range(12)) >>> a = numpy.array(a)[random_idx] >>> b = numpy.array(b)[random_idx] >>> sorted_idx = numpy.lexsort((b, a)) >>> a[sorted_idx] array([0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 6]) >>> b[sorted_idx] array([1, 2, 1, 2, 1, 4, 7, 9, 4, 8, 1, 1])

Sorting record arrays:

 >>> a_b = numpy.rec.fromarrays((a, b)) >>> a_b[a_b.argsort()] rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9), (4, 4), (4, 8), (5, 1), (6, 1)], dtype=[('f0', '<i8'), ('f1', '<i8')]) >>> a_b.sort() >>> a_b rec.array([(0, 1), (0, 2), (1, 1), (1, 2), (2, 1), (3, 4), (3, 7), (3, 9), (4, 4), (4, 8), (5, 1), (6, 1)], dtype=[('f0', '<i8'), ('f1', '<i8')])

Numpy.searchsorted with multiple sources

More articles: