Selecting items based on padding entries in Python pandas

Question

I have a question about pandas DataFrames. There are two DataFrames containing records, df1 and df2, with the following values:

df1:
   pkid  start   end
0     0   2005  2005
1     1   2006  2006
2     2   2007  2007
3     3   2008  2008
4     4   2009  2009

df2:
   pkid  start   end
0     3   2008  2008
1   NaN   2009  2009
2   NaN   2010  2010

I want to isolate the entry with index = 2 from df2. In other words, I am looking for all df2 entries that have no matching entry in df1, where only the values of the start and end columns are compared. Thanks!

python numpy pandas

scagnetti Oct 25 '13 at 19:04

2 answers

One way is to build a key column and use isin:

df1['key'] = df1.apply(lambda r: str(r['start']) + str(r['end']), axis=1)
df2['key'] = df2.apply(lambda r: str(int(r['start'])) + str(int(r['end'])), axis=1)

df2.key.isin(df1.key.tolist())
0    True
1    True
2    False


df2[~df2.key.isin(df1.key.tolist())]
   pkid  start   end
2   NaN   2010  2010
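
Concatenating the two values as strings can collide in general (start=1, end=23 and start=12, end=3 both give "123"). The following is a minimal sketch of the same isin idea that compares (start, end) pairs through a MultiIndex instead. It assumes the sample frames from the question; the names `pairs1`, `pairs2` and `unmatched` are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "pkid": [0, 1, 2, 3, 4],
    "start": [2005, 2006, 2007, 2008, 2009],
    "end": [2005, 2006, 2007, 2008, 2009],
})
df2 = pd.DataFrame({
    "pkid": [3, np.nan, np.nan],
    "start": [2008, 2009, 2010],
    "end": [2008, 2009, 2010],
})

# Represent each row as a (start, end) pair; no helper column is added.
pairs1 = pd.MultiIndex.from_frame(df1[["start", "end"]])
pairs2 = pd.MultiIndex.from_frame(df2[["start", "end"]])

# Keep the df2 rows whose pair does not occur anywhere in df1.
unmatched = df2[~pairs2.isin(pairs1)]
print(unmatched)
```

Because the pairs are compared as tuples, the column order matters and no separator character has to be chosen.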

user1827356 Oct 25 '13 at 20:18

Roman Pekar · Accepted Answer · 2013-10-25T20:45:04+0000

This is what SQL calls an antijoin (▷). I don't think pandas has a simple built-in way to do it, but you can do it like this :)

>>> t1 = df1[["start", "end"]]
>>> t2 = df2[["start", "end"]]
>>> f = t2.apply(lambda x2: t1.apply(lambda x1: (x1 == x2).all(), axis=1).any(), axis=1)
>>> df2[~f]
    end  pkid  start
2  2010   NaN   2010

Update: in SQL you could write this with not exists:

select *
from df2
where not exists (select * from df1 where df1.start = df2.start and df1.end = df2.end)

or with a left outer join and a where clause:

select *
from df2
    left outer join df1 on df1.start = df2.start and df1.end = df2.end
where df1.<key> is null

The second form can be written in pandas with merge:

>>> m = pd.merge(df2, df1, how='left', on=['end','start'], suffixes=['','_r'])
>>> df2[m['pkid_r'].isnull()]
    end  pkid  start
2  2010   NaN   2010
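
As a variation on the merge approach, the `indicator=True` option of `merge` marks where each row came from, so the filter does not have to rely on a suffixed column such as `pkid_r`. This is a sketch under the assumption that the frames look like the sample in the question; `keys`, `m` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df1 = pd.DataFrame({
    "pkid": [0, 1, 2, 3, 4],
    "start": [2005, 2006, 2007, 2008, 2009],
    "end": [2005, 2006, 2007, 2008, 2009],
})
df2 = pd.DataFrame({
    "pkid": [3, np.nan, np.nan],
    "start": [2008, 2009, 2010],
    "end": [2008, 2009, 2010],
})

# Only the distinct (start, end) pairs of df1 are needed; dropping
# duplicates keeps the left join from multiplying df2 rows.
keys = df1[["start", "end"]].drop_duplicates()

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both".
m = df2.merge(keys, how="left", on=["start", "end"], indicator=True)

# Rows marked "left_only" found no partner in df1: the antijoin.
result = m.loc[m["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

Note that `merge` returns a fresh integer index, so the row labels in `result` match those of df2 only when df2 itself has the default index, as it does here.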
