Pandas: checking whether a Series contains any of several OR'ed conditions

I have a DataFrame df that has columns type and subtype and about 100k rows. I'm trying to classify what kind of data df contains by checking the type/subtype combinations. Although df may contain many different combinations, certain combinations appear only in certain kinds of data. To check whether my object contains any of these combinations, I am currently doing:

typeA = (((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) |
         (df.subtype == 5) | (df.subtype == 6))) |
         ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) |
         (df.subtype == 8))))
A = typeA.sum()

Here typeA is a long Series of Falses that may contain some Trues; if A > 0, then I know it contains at least one True. The problem with this scheme is that even if the first row of df yields True, it still has to check everything else. Testing the entire DataFrame is faster than using a for loop with a break, but I am wondering if there is a better way to do this.

Thanks for any suggestions.

+4
2 answers

Use pd.crosstab:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 10, size=(100, 2)), columns=["type", "subtype"])
counts = pd.crosstab(df.type, df.subtype)

print(counts.loc[0, [2, 3, 5, 6]].sum() + counts.loc[5, [3, 4, 7, 8]].sum())

The result will be the same as:

a = (((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) | 
         (df.subtype == 5) | (df.subtype == 6))) | 
         ((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) | 
         (df.subtype ==  8))))
a.sum()
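
The equivalence claimed above can be checked directly. The following is a minimal sketch, not part of the original answer: the seed, the `combos` dictionary, and the variable names are assumptions made for illustration. It also reindexes the crosstab so that a type or subtype missing from the data counts as 0 rather than raising a KeyError.

```python
import numpy as np
import pandas as pd

# Fixed seed so the comparison is reproducible (the seed value is arbitrary).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 10, size=(1000, 2)), columns=["type", "subtype"])

# Hypothetical lookup table: type -> subtypes that mark the data as "type A".
combos = {0: [2, 3, 5, 6], 5: [3, 4, 7, 8]}

# Count via crosstab; reindex fills absent labels with 0 instead of failing.
counts = pd.crosstab(df["type"], df["subtype"])
all_subtypes = sorted({s for subs in combos.values() for s in subs})
counts = counts.reindex(index=list(combos), columns=all_subtypes, fill_value=0)
crosstab_total = int(sum(counts.loc[t, subs].sum() for t, subs in combos.items()))

# Count via a boolean mask, using isin in place of the chained == comparisons.
mask = pd.Series(False, index=df.index)
for t, subs in combos.items():
    mask |= (df["type"] == t) & df["subtype"].isin(subs)
mask_total = int(mask.sum())

print(crosstab_total, mask_total)
```

Both totals should agree, since each one counts the rows that fall into one of the listed (type, subtype) cells.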
+5

In pandas 0.13 you can use query, which uses numexpr:

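# Written against pandas 0.13; current versions do not resolve df inside the
# query string, so prefer the bare column names used in the next snippet.
df.query("((df.type == 0) & ((df.subtype == 2) | (df.subtype == 3) | "
         "(df.subtype == 5) | (df.subtype == 6))) | "
         "((df.type == 5) & ((df.subtype == 3) | (df.subtype == 4) | (df.subtype == 7) | "
         "(df.subtype == 8)))")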

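You can clean this up considerably: inside the query string, columns are referenced by bare name, so the df. prefix (as in df.type) is not needed: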

df.query("((type == 0) & ((subtype == 2)"
                        "|(subtype == 3)"
                        "|(subtype == 5)"
                        "|(subtype == 6)))"
        "|((type == 5) & ((subtype == 3)"
                        "|(subtype == 4)"
                        "|(subtype == 7)"
                        "|(subtype ==  8)))")

You can simplify this further by using "in":

df.query("(type == 0) & (subtype in [2, 3, 5, 6])"
        "|(type == 5) & (subtype in [3, 4, 7, 8])")
+1
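
To connect the "in" form of the query back to the original question (does df contain any such row?), here is a small sketch. The sample data and the names `expr`, `matches`, `has_type_a`, and `n_matches` are made up for illustration; only DataFrame.query, DataFrame.empty, and len are relied on.

```python
import pandas as pd

# Tiny hand-made frame: rows 0 and 2 are the only ones matching a listed combination.
df = pd.DataFrame({"type": [0, 0, 5, 1, 5], "subtype": [2, 9, 4, 3, 1]})

# Adjacent string literals are concatenated, so the query stays a single line.
expr = ("((type == 0) & (subtype in [2, 3, 5, 6]))"
        " | ((type == 5) & (subtype in [3, 4, 7, 8]))")

matches = df.query(expr)           # rows that satisfy the expression
has_type_a = not matches.empty     # True if at least one row matched
n_matches = len(matches)           # same number the boolean-mask .sum() would give

print(has_type_a, n_matches)
```

query returns the matching rows rather than a boolean Series, so checking `.empty` (or `len`) replaces the `A > 0` test from the question. Note that the whole frame is still evaluated; this is not an early-exit scan.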
