Using np.where is fast:
    >>> a
    array([[0, 0, 0, 1, 1, 1, 1],
           [0, 0, 0, 1, 1, 1, 1],
           [0, 0, 0, 0, 1, 1, 1],
           [0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 0]])
    >>> np.where(a>0)
    (array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 5]),
     array([3, 4, 5, 6, 3, 4, 5, 6, 4, 5, 6, 6, 6, 6]))
This returns a tuple of arrays holding the row and column coordinates of the values greater than 0.
You can also apply np.where to each individual row:
    def first_true1(a):
        """Return a dict of row: index of first value in row > 0."""
        di = {}
        for i in range(len(a)):
            idx = np.where(a[i] > 0)
            try:
                di[i] = idx[0][0]
            except IndexError:
                di[i] = None
        return di
Prints:

    {0: 3, 1: 3, 2: 4, 3: 6, 4: 6, 5: 6, 6: None}
That is: in row 0, index 3 holds the first value > 0; in row 2, it is index 4; in row 6, no value is > 0.
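For a self-contained check, running first_true1 on the example array reproduces that dict (the int() conversion is added here only so the dict prints plainly):

```python
import numpy as np

# the example array from above
a = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 0, 0, 1, 1, 1, 1],
              [0, 0, 0, 0, 1, 1, 1],
              [0, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0]])

def first_true1(a):
    """Return a dict of row: index of first value in row > 0."""
    di = {}
    for i in range(len(a)):
        idx = np.where(a[i] > 0)
        try:
            di[i] = int(idx[0][0])  # int() so the dict prints plainly
        except IndexError:
            di[i] = None  # row has no value > 0
    return di

print(first_true1(a))  # {0: 3, 1: 3, 2: 4, 3: 6, 4: 6, 5: 6, 6: None}
```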
As you suspect, argmax might be faster:
    def first_true2():
        # uses the global array a; note that argmax returns 0 both when
        # the first element is the maximum and when the row is all zeros,
        # so a row starting with a value > 0 would also map to None here
        di = {}
        for i in range(len(a)):
            idx = np.argmax(a[i])
            if idx > 0:
                di[i] = idx
            else:
                di[i] = None
        return di
If you can live without the None entries for all-zero rows, this is faster:
    def first_true3():
        di = {}
        for i, j in zip(*np.where(a > 0)):
            if i not in di:
                di[i] = j
        return di
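To make the trade-off concrete, here is a small sketch (the array and the parameterized variant are mine) showing that this version simply leaves all-zero rows out of the dict instead of mapping them to None:

```python
import numpy as np

b = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])  # last row is all zeros

def first_true3_demo(b):  # parameterized variant for this sketch
    di = {}
    for i, j in zip(*np.where(b > 0)):
        if int(i) not in di:
            di[int(i)] = int(j)  # int() so the dict prints plainly
    return di

print(first_true3_demo(b))  # {0: 1, 1: 2} -- row 2 is missing, not None
```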
And here is a version that uses the axis argument of argmax (as suggested in your comments):
    def first_true4():
        di = {}
        for i, ele in enumerate(np.argmax(a, axis=1)):
            if ele == 0 and a[i][0] == 0:
                di[i] = None
            else:
                di[i] = ele
        return di
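The extra a[i][0] == 0 test is needed because argmax cannot distinguish "the maximum is at index 0" from "the row is all zeros"; a minimal illustration (the array here is mine):

```python
import numpy as np

rows = np.array([[0, 0, 0, 1],   # first value > 0 at index 3
                 [1, 0, 0, 0],   # first value > 0 at index 0
                 [0, 0, 0, 0]])  # no value > 0 at all

# argmax returns the index of the first occurrence of the maximum,
# so the last two rows are indistinguishable by argmax alone
print(np.argmax(rows, axis=1))  # [3 0 0]
```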
For a speed comparison (on your example array) I get the following:
                 rate/sec  usec/pass  first_true1  first_true2  first_true3  first_true4
    first_true1    23,818     41.986           --       -34.5%       -63.1%       -70.0%
    first_true2    36,377     27.490        52.7%           --       -43.6%       -54.1%
    first_true3    64,528     15.497       170.9%        77.4%           --       -18.6%
    first_true4    79,287     12.612       232.9%       118.0%        22.9%           --
If I scale it up to a 2000 x 2000 array, here is what I get:
                 rate/sec   usec/pass  first_true3  first_true1  first_true2  first_true4
    first_true3         3  354380.107           --        -0.3%       -74.7%       -87.8%
    first_true1         3  353327.084         0.3%           --       -74.6%       -87.7%
    first_true2        11   89754.200       294.8%       293.7%           --       -51.7%
    first_true4        23   43306.494       718.3%       715.9%       107.3%           --
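The benchmark harness isn't shown above; a rough equivalent using the standard-library timeit (the array contents, repetition count, and output formatting are my assumptions, so the numbers will differ) could look like:

```python
import timeit
import numpy as np

# a 2000 x 2000 array of 0s and 1s (random here, unlike the original test)
a = (np.random.rand(2000, 2000) > 0.5).astype(int)

def first_true2():
    di = {}
    for i in range(len(a)):
        idx = np.argmax(a[i])
        di[i] = idx if idx > 0 else None
    return di

n = 5  # arbitrary number of passes
t = timeit.timeit(first_true2, number=n)
print(f"first_true2: {n / t:,.0f}/sec, {t / n * 1e6:.3f} usec/pass")
```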