Compare values in two columns of a data frame

Question

I have the following two columns in a pandas data frame

     256   Z
0     2    2
1     2    3
2     4    4
3     4    9

There are about 1594 rows. '256' and 'Z' are the column headers, and 0, 1, 2, 3 are the row numbers (the first column above). I want to print the row numbers where the value in column "256" is not equal to the value in column "Z". Thus, the output in the above case would be 1, 3. How can this comparison be done in pandas? I would be very grateful for any help. Thanks.

+4

python pandas

user3282777 Jan 24 '15 at 9:45

4 answers

cel · Answer 1 · 2015-01-24T10:22:15+0000

Create a data frame:

import pandas as pd
df = pd.DataFrame({"256":[2,2,4,4], "Z": [2,3,4,9]})

Output:

   256  Z
0    2  2
1    2  3
2    4  4
3    4  9

After subsetting the data frame, use the index to get the row labels of the subset:

row_ids = df[df["256"] != df.Z].index

gives

Int64Index([1, 3], dtype='int64')
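As a minimal sketch of the same idea, the boolean mask can also be applied to the index directly, without building the intermediate subset first. The names `mask` and `mismatched` are illustrative only and do not come from the answer above.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

# Boolean Series: True where the two columns differ.
# df["256"].ne(df["Z"]) is an equivalent spelling.
mask = df["256"] != df["Z"]

# Select the row labels with the mask and convert them to a plain list.
mismatched = df.index[mask.to_numpy()].tolist()
print(mismatched)  # [1, 3]
```

Newer pandas versions display the un-listed index as `Index([1, 3], dtype='int64')` rather than `Int64Index`, but the labels are the same.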

aus_lacy · Answer 2 · 2015-01-24T12:16:31+0000

You can use the .loc indexer of pandas.DataFrame with a boolean condition, and then take the index of the result:

df.loc[(df['256'] != df['Z'])].index

This gives:

Int64Index([1, 3], dtype='int64')

For comparison, here are timings of the suggested approaches in an IPython notebook:

import pandas as pd
import numpy as np

df = pd.DataFrame({"256":np.random.randint(0,10,1594), "Z": np.random.randint(0,10,1594)})

%timeit df.loc[(df['256'] != df['Z'])].index
%timeit row_ids = df[df["256"] != df.Z].index
%timeit rows = list(df[df['256'] != df.Z].index)
%timeit df[df['256'] != df['Z']].index

Results:

1000 loops, best of 3: 352 µs per loop
1000 loops, best of 3: 358 µs per loop
1000 loops, best of 3: 611 µs per loop
1000 loops, best of 3: 355 µs per loop

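The three variants that return the index directly are within about 5-10 µs of each other, which is negligible, while wrapping the result in list() is noticeably slower. With only 1594 rows, any of them is fast enough.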
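Outside a notebook, where the `%timeit` magic is not available, a similar comparison can be sketched with the standard-library `timeit` module. The dictionary keys and the repeat counts below are arbitrary choices for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Random test data of the same size as in the question (1594 rows).
rng = np.random.default_rng(0)
df = pd.DataFrame({"256": rng.integers(0, 10, 1594),
                   "Z": rng.integers(0, 10, 1594)})

# The approaches being compared, wrapped as zero-argument callables.
candidates = {
    "loc": lambda: df.loc[df["256"] != df["Z"]].index,
    "getitem": lambda: df[df["256"] != df["Z"]].index,
    "list": lambda: list(df[df["256"] != df["Z"]].index),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 200 calls each, converted to microseconds per call.
    best = min(timeit.repeat(func, number=200, repeat=3)) / 200
    timings[name] = best * 1e6
    print(f"{name}: {timings[name]:.0f} µs per loop")
```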

rchang · Answer 3 · 2015-01-24T10:22:08+0000

You can try this:

# Assuming your DataFrame is named "frame"
rows = list(frame[frame['256'] != frame.Z].index)

rows will be a list of the index labels of the rows where the two columns differ. For example:

>>> frame
   256  Z
0    2  2
1    2  3
2    4  4
3    4  9

[4 rows x 2 columns]
>>> rows = list(frame[frame['256'] != frame.Z].index)
>>> print(rows)
[1, 3]
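One caveat worth illustrating: `.index` returns row labels, which coincide with line numbers only when the frame has the default 0..n-1 index. The sketch below assumes a hypothetical frame with string labels and uses `numpy.flatnonzero` to get positional row numbers as well; the names `labels` and `positions` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the example with string labels
# instead of the default 0..n-1 index.
frame = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]},
                     index=list("abcd"))

# Boolean NumPy array: True where the two columns differ.
mask = (frame["256"] != frame["Z"]).to_numpy()

labels = frame.index[mask].tolist()        # index labels of mismatched rows
positions = np.flatnonzero(mask).tolist()  # zero-based row positions

print(labels)     # ['b', 'd']
print(positions)  # [1, 3]
```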

Primer · Answer 4 · 2015-01-24T10:21:00+0000

Assuming df is your data frame, this should do it:

df[df['256'] != df['Z']].index

which gives:

Int64Index([1, 3], dtype='int64')
