Pandas search by value

I have the following DataFrame:

    Date  best   a  b   c   d
    1990     a   5  4   7   2
    1991     c  10  1   2   0
    1992     d   2  1   4  12
    1993     a   5  8  11   6

I would like to make a dataframe as follows:

    Date  best  value
    1990     a      5
    1991     c      2
    1992     d     12
    1993     a      5

So, for each row I want to look up a value in the column whose name is stored in that row's best column. For example, the value for 1990 in the second df should come from column "a" of the first df (= 5), and the second row should come from column "c" (= 2).

Any ideas?

python numpy pandas dataframe
3 answers

There is a built-in lookup function that handles this kind of situation (lookup by row/column labels). I do not know how well optimized it is, but it may be faster than the apply-based solution.

    In [9]: df['value'] = df.lookup(df.index, df['best'])

    In [10]: df
    Out[10]:
       Date best   a  b   c   d  value
    0  1990    a   5  4   7   2      5
    1  1991    c  10  1   2   0      2
    2  1992    d   2  1   4  12     12
    3  1993    a   5  8  11   6      5
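Note that `DataFrame.lookup` was deprecated in pandas 1.2 and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of the same row/column lookup using NumPy integer indexing follows; the names `value_cols`, `col_idx` and `result` are illustrative only, and it assumes every entry in `best` is one of the listed column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1990, "a", 5, 4, 7, 2],
     [1991, "c", 10, 1, 2, 0],
     [1992, "d", 2, 1, 4, 12],
     [1993, "a", 5, 8, 11, 6]],
    columns=["Date", "best", "a", "b", "c", "d"],
)

# Columns that can be named in "best" (assumption: no other labels occur).
value_cols = ["a", "b", "c", "d"]

# Position of each row's "best" label within value_cols.
# get_indexer returns -1 for unknown labels, which would silently select
# the last column below, so validate the input first if that can happen.
col_idx = pd.Index(value_cols).get_indexer(df["best"])

# Pair each row position with its column position and pick one element per row.
values = df[value_cols].to_numpy()[np.arange(len(df)), col_idx]

result = df[["Date", "best"]].assign(value=values)
print(result)
```

This keeps the lookup vectorized, like `lookup` did, and builds the reduced Date/best/value frame the question asks for directly.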

You can write a lookup function and call apply on the DataFrame row by row, although this is not very efficient for large DataFrames:

    In [245]:
    def lookup(x):
        # x is one row (a Series); x.best holds the name of the column to read
        return x[x.best]

    df['value'] = df.apply(lambda row: lookup(row), axis=1)
    df

    Out[245]:
       Date best   a  b   c   d  value
    0  1990    a   5  4   7   2      5
    1  1991    c  10  1   2   0      2
    2  1992    d   2  1   4  12     12
    3  1993    a   5  8  11   6      5

You can do this using np.where, as shown below. I think it will be more efficient.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([['1990', 'a', 5, 4, 7, 2],
                       ['1991', 'c', 10, 1, 2, 0],
                       ['1992', 'd', 2, 1, 4, 12],
                       ['1993', 'a', 5, 8, 11, 6]],
                      columns=('Date', 'best', 'a', 'b', 'c', 'd'))

    arr = df.best.values       # starts as the column labels: 'a', 'c', 'd', 'a'
    cols = df.columns[2:]      # the value columns: a, b, c, d

    # Wherever arr still holds the label col, replace it with that column's value
    for col in cols:
        arr2 = df[col].values
        arr = np.where(arr == col, arr2, arr)

    df.drop(columns=cols, inplace=True)
    df["values"] = arr
    df

Result

       Date best values
    0  1990    a      5
    1  1991    c      2
    2  1992    d     12
    3  1993    a      5
