Compare values in two columns of a data frame

I have the following two columns in a pandas data frame:

     256   Z
0     2    2
1     2    3
2     4    4
3     4    9

There are about 1594 rows. '256' and 'Z' are the column headers, while 0, 1, 2, 3 are the row numbers (the first column above). I want to print the row numbers where the value in column "256" is not equal to the value in column "Z". So the output in the above case would be 1, 3. How can this comparison be done in pandas? I would be very grateful for any help. Thanks.

4 answers

Create a data frame:

import pandas as pd
df = pd.DataFrame({"256":[2,2,4,4], "Z": [2,3,4,9]})

Output:

    256 Z
0   2   2
1   2   3
2   4   4
3   4   9

After subsetting your data frame, use the index of the subset to get the row identifiers:

row_ids = df[df["256"] != df.Z].index

gives

Int64Index([1, 3], dtype='int64')
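Note that `.index` returns index labels, which coincide with 0-based row positions only for the default `RangeIndex`. If your frame has some other index and you want positions, a minimal sketch is to apply `numpy.flatnonzero` to the boolean mask. The non-default index below is an assumption made purely for illustration:

```python
import numpy as np
import pandas as pd

# Same data as above, but with a made-up non-default index
df = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]}, index=[10, 11, 12, 13])

mask = df["256"] != df["Z"]                           # True where the values differ
labels = df[mask].index.tolist()                      # index labels of differing rows
positions = np.flatnonzero(mask.to_numpy()).tolist()  # 0-based row positions

print(labels)     # [11, 13]
print(positions)  # [1, 3]
```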

You can use .loc on the pandas.DataFrame to select the rows where the two columns differ, then take their index:

df.loc[(df['256'] != df['Z'])].index

Output:

Int64Index([1, 3], dtype='int64')
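The `!=` operator has a method form, `Series.ne`, so the same `.loc` selection can also be written with method calls. This sketch additionally converts the result to a plain Python list with `Index.tolist()`:

```python
import pandas as pd

df = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

# Series.ne is equivalent to !=; .loc with a boolean mask keeps the True rows
rows = df.loc[df["256"].ne(df["Z"])].index.tolist()
print(rows)  # [1, 3]
```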

Timing the different approaches in an IPython notebook:

import pandas as pd
import numpy as np

df = pd.DataFrame({"256":np.random.randint(0,10,1594), "Z": np.random.randint(0,10,1594)})

%timeit df.loc[(df['256'] != df['Z'])].index
%timeit row_ids = df[df["256"] != df.Z].index
%timeit rows = list(df[df['256'] != df.Z].index)
%timeit df[df['256'] != df['Z']].index

Results:

1000 loops, best of 3: 352 µs per loop
1000 loops, best of 3: 358 µs per loop
1000 loops, best of 3: 611 µs per loop
1000 loops, best of 3: 355 µs per loop

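The first, second, and fourth variants are within roughly 5-10 µs of each other, while wrapping the result in list() is noticeably slower. With only 1594 rows, any of them is fast enough.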
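`%timeit` is an IPython magic, so it does not work in a plain Python script. A rough equivalent, assuming you only want a best-of-3 comparison, uses the standard-library `timeit.repeat`. The candidate names and the smaller loop count (100 instead of 1000) are arbitrary choices here, and no expected timings are shown because they depend on the machine:

```python
import timeit

import numpy as np
import pandas as pd

# Random test data of the size mentioned in the question (1594 rows)
df = pd.DataFrame({
    "256": np.random.randint(0, 10, 1594),
    "Z": np.random.randint(0, 10, 1594),
})

candidates = {
    "loc": lambda: df.loc[df["256"] != df["Z"]].index,
    "boolean indexing": lambda: df[df["256"] != df["Z"]].index,
    "list(...)": lambda: list(df[df["256"] != df["Z"]].index),
}

number = 100  # calls per repeat
results = {}
for name, fn in candidates.items():
    # best of 3 repeats, converted to microseconds per call
    best = min(timeit.repeat(fn, repeat=3, number=number)) / number
    results[name] = best * 1e6
    print(f"{name}: {results[name]:.1f} µs per loop")
```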


You can do it like this:

# Assuming your DataFrame is named "frame"
rows = list(frame[frame['256'] != frame.Z].index)

rows will then be a list of the row numbers where the two columns differ. For example:

>>> frame
   256  Z
0    2  2
1    2  3
2    4  4
3    4  9

[4 rows x 2 columns]
>>> rows = list(frame[frame['256'] != frame.Z].index)
>>> print(rows)
[1, 3]
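Since only the row numbers are needed, another option is to index `frame.index` directly with the boolean mask, which avoids building the filtered DataFrame first. A sketch, keeping the frame name `frame` from the answer above; whether this is measurably faster on your data is not something shown here:

```python
import pandas as pd

frame = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

mask = (frame["256"] != frame["Z"]).to_numpy()  # NumPy boolean array
rows = frame.index[mask].tolist()               # index labels where the columns differ
print(rows)  # [1, 3]
```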

Assuming your data frame is called df, you can use:

df[df['256'] != df['Z']].index

Output:

Int64Index([1, 3], dtype='int64')
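One caveat for all of the answers: if either column can contain missing values, `!=` treats NaN as unequal to everything, including another NaN, so rows where both cells are missing are reported as different. A minimal sketch, assuming you want "both missing" to count as equal (the NaN row below is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"256": [2, np.nan, 4, 4], "Z": [2, np.nan, 4, 9]})

# Plain != flags row 1, because NaN != NaN evaluates to True
naive = df[df["256"] != df["Z"]].index.tolist()
print(naive)  # [1, 3]

# Exclude rows where both values are missing
both_nan = df["256"].isna() & df["Z"].isna()
differ = (df["256"] != df["Z"]) & ~both_nan
fixed = df[differ].index.tolist()
print(fixed)  # [3]
```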