Efficiently check whether a value falls in any of the given ranges

I have two pandas DataFrame objects:

  • A has 'start' and 'finish' columns

  • B has a column 'date'

The goal is to efficiently create a boolean mask indicating whether each date falls in any [start, finish] range.

A naive iteration takes too much time; I think there is a way to make it faster.

UPDATE: A and B have a different number of rows

UPDATE2: Example:

A
    | start     | finish    |
    |-------    |--------   |
    | 1         | 3         |
    | 50        | 83        |
    | 30        | 42        |

B
    | date      | 
    |-------    |
    | 31        | 
    | 20        | 
    | 2.5       |
    | 84        |
    | 1000      |

Output:
    | in_interval |
    |-------------|
    | True        |
    | False       |
    | True        |
    | False       |
    | False       |

PS My data is in datetime format, but I think the solution will not differ from the one for numbers.

+4
2 answers

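This is doable in O(n). The idea is to change the representation. A stores one row per interval. I would instead build a frame that stores one row per transition (i.e. entering an interval, leaving an interval).

import pandas as pd

A = pd.DataFrame(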
    data={
        'start': [1, 50, 30],
        'finish': [3, 83, 42]    
    }
)

starts = pd.DataFrame(data={'start': 1}, index=A.start.tolist())
finishs = pd.DataFrame(data={'finish': -1}, index=A.finish.tolist())
transitions = pd.merge(starts, finishs, how='outer', left_index=True, right_index=True).fillna(0)
transitions

    start  finish
1       1       0
3       0      -1
30      1       0
42      0      -1
50      1       0
83      0      -1

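This frame records, for each boundary value, whether an interval opens (+1) or closes (-1). Summing the two columns and taking the cumulative sum gives the number of open intervals from each boundary onwards: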

transitions['transition'] = (transitions.pop('finish') + transitions.pop('start')).cumsum()
transitions

    transition
1            1
3            0
30           1
42           0
50           1
83           0

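How to read this frame:

  • at 1, we enter an interval,
  • at 3, we leave it,
  • in general, a value greater than 0 means we are inside an interval,
  • overlapping intervals are handled too, since the count then simply rises above 1.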
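To illustrate how the running count behaves when intervals overlap, here is a minimal sketch. The two intervals [1, 5] and [3, 8] and the names `events` and `open_count` are made up for this illustration and are not part of the answer's code.

```python
import pandas as pd

# Hypothetical overlapping intervals: [1, 5] and [3, 8]
starts = [1, 3]
finishes = [5, 8]

# +1 at every start, -1 at every finish, indexed by the boundary value
events = pd.Series([1] * len(starts) + [-1] * len(finishes),
                   index=starts + finishes)

# Combine events that share a boundary (groupby sorts the boundaries),
# then accumulate to get the number of open intervals
open_count = events.groupby(level=0).sum().cumsum()
print(open_count)
```

Between 3 and 5 the count is 2, so converting it to bool still marks that stretch as inside an interval.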

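Now merge with B and forward-fill, so that each date picks up the count of the last boundary at or before it (dates before the first start get 0):

B = pd.DataFrame(
    index=[31, 20, 2.5, 84, 1000]
)

pd.merge(transitions, B, how='outer', left_index=True, right_index=True).ffill().fillna(0).loc[B.index].astype(bool)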

       transition
31.0         True
20.0        False
2.5          True
84.0        False
1000.0      False
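
The same counting idea can also be sketched directly with NumPy, without building the merged frame. This is not the answerer's code: the function name `in_any_interval` is hypothetical, and it assumes start <= finish in every row. It counts the starts at or before each date and subtracts the finishes strictly before it, so both ends of [start, finish] are inclusive.

```python
import numpy as np

def in_any_interval(dates, starts, finishes):
    """Return a boolean array: is each date inside any [start, finish]?

    Hypothetical helper; assumes start <= finish for every interval.
    """
    dates = np.asarray(dates)
    sorted_starts = np.sort(np.asarray(starts))
    sorted_finishes = np.sort(np.asarray(finishes))
    # number of intervals that have started at or before each date
    opened = np.searchsorted(sorted_starts, dates, side='right')
    # number of intervals that have finished strictly before each date
    closed = np.searchsorted(sorted_finishes, dates, side='left')
    return (opened - closed) > 0

mask = in_any_interval([31, 20, 2.5, 84, 1000], [1, 50, 30], [3, 83, 42])
print(mask)

# The same call on datetime64 values (made-up dates for illustration)
day_starts = np.array(['2020-01-01', '2020-03-01'], dtype='datetime64[D]')
day_finishes = np.array(['2020-01-10', '2020-03-05'], dtype='datetime64[D]')
days = np.array(['2020-01-10', '2020-02-01'], dtype='datetime64[D]')
day_mask = in_any_interval(days, day_starts, day_finishes)
print(day_mask)
```

Sorting dominates the cost here, and overlapping intervals need no special handling because only the counts matter.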
+4

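IIUC, you want the output to be True if the date falls in at least one interval, right?

Is apply(lambda) efficient enough for you? (It may be slow for big frames, since it iterates over the rows of B.) If it is, you can try this: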

def in_range(date, start, finish):
    # True if date lies in at least one closed interval [start, finish]
    return True in ((start <= date) & (date <= finish)).unique()

B.date.apply(lambda x: in_range(x, A.start, A.finish))

Output:

0     True
1    False
2     True
3    False
4    False

EDIT: MaxU's suggestion is faster. Timings with 10 000 rows in each of A and B:

%timeit B2.date.apply(lambda x: in_range(x,A2.start,A2.finish))
1 loop, best of 3: 9.82 s per loop

%timeit B2.date.apply(lambda x: ((x >= A2.start) & (x <= A2.finish)).any())
1 loop, best of 3: 7.31 s per loop
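
As a possible alternative to the row-wise apply, the same comparison can be written once with NumPy broadcasting. This is only a sketch, not something timed in this answer: it builds a len(B) x len(A) boolean array, so memory grows with the product of the two sizes. The frames below reuse the example data from the question.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({'start': [1, 50, 30], 'finish': [3, 83, 42]})
B = pd.DataFrame({'date': [31, 20, 2.5, 84, 1000]})

dates = B['date'].to_numpy()[:, None]   # shape (len(B), 1)
starts = A['start'].to_numpy()          # shape (len(A),)
finishes = A['finish'].to_numpy()       # shape (len(A),)

# Compare every date with every interval, then reduce across intervals
B['in_interval'] = ((dates >= starts) & (dates <= finishes)).any(axis=1)
print(B)
```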
+1
