Pandas DataFrame filter regex

Question

I do not understand pandas DataFrame.filter.

 import pandas as pd

 df = pd.DataFrame(
     [
         ['Hello', 'World'],
         ['Just', 'Wanted'],
         ['To', 'Say'],
         ['I\'m', 'Tired']
     ]
 )

 df.filter([0], regex=r'(Hel|Just)', axis=0)

I expected [0] to indicate the first column as the one to look at, and axis=0 to indicate that rows should be filtered. I get the following:

        0      1
 0  Hello  World

I expected

        0       1
 0  Hello   World
 1   Just  Wanted

piRSquared May 06 '16 at 20:07

2 answers

This should work:

df[df[0].str.contains('(Hel|Just)', regex=True)]

Max May 06 '16 at 20:24

unutbu · Accepted Answer · 2016-05-06T20:21:13+0000

Per the docs, the items, like, and regex arguments are mutually exclusive, but this is not checked for.

So it appears that the first optional argument, items=[0], takes precedence over the third optional argument, regex=r'(Hel|Just)'.

 In [194]: df.filter([0], regex=r'(Hel|Just)', axis=0)
 Out[194]:
        0      1
 0  Hello  World

is equivalent to

 In [201]: df.filter([0], axis=0)
 Out[201]:
        0      1
 0  Hello  World

which simply selects the rows whose index labels are in [0] along axis 0.
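To make the point concrete, here is a small sketch showing that filter works on labels rather than on cell values. The names by_items, labelled, and by_regex are only illustrative, and moving column 0 into the index with set_index is my own addition to demonstrate label matching, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame(
    [['Hello', 'World'], ['Just', 'Wanted'], ['To', 'Say'], ["I'm", 'Tired']]
)

# items=[0] with axis=0 keeps the row whose index label is 0.
# The cell contents are never inspected.
by_items = df.filter(items=[0], axis=0)

# regex is also matched against labels. If the strings of column 0
# become the index, the pattern can match them.
labelled = df.set_index(0)
by_regex = labelled.filter(regex=r'(Hel|Just)', axis=0)

print(by_items)
print(by_regex)
```

Passing only one of items, like, or regex per call keeps the behaviour unambiguous.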

To get the desired result, you can use str.contains to create a boolean mask, and use df.loc to select the rows:

 In [210]: df.loc[df.iloc[:,0].str.contains(r'(Hel|Just)')]
 Out[210]:
        0       1
 0  Hello   World
 1   Just  Wanted
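The same approach can be written as a standalone script, with the mask kept in its own variable so it can be inspected. This is a sketch, not the only way to do it: the names mask and result are illustrative, and I have swapped the capturing group for a non-capturing one, (?:...), because str.contains may warn about unused match groups otherwise.

```python
import pandas as pd

df = pd.DataFrame(
    [['Hello', 'World'], ['Just', 'Wanted'], ['To', 'Say'], ["I'm", 'Tired']]
)

# Boolean Series: True where the value in column 0 contains 'Hel' or 'Just'.
# na=False treats missing values as non-matches.
mask = df[0].str.contains(r'(?:Hel|Just)', na=False)

# Select only the rows where the mask is True.
result = df.loc[mask]

print(mask.tolist())
print(result)
```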