Selecting individual values from a pandas DataFrame using lists

    import numpy as np
    import pandas as pd

    ind = [0, 1, 2]
    cols = ['A', 'B', 'C']
    df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

Let's say you have a pandas DataFrame df like this:

       A  B  C
    0  0  1  2
    1  3  4  5
    2  6  7  8

I want to pick one element from each column in cols, at the corresponding row index in ind, so that the output is a Series:

    A    0
    B    4
    C    8

So far I have tried:

    df.loc[ind, cols]

which gives an undesired result (the full block instead of one value per column):

       A  B  C
    0  0  1  2
    1  3  4  5
    2  6  7  8

Any suggestions?

Context: the next step is to take the output of df.idxmax() on one DataFrame and use it to pick values from another DataFrame with the same column names and index, but I can probably figure that out once I know how to perform the selection above.
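A minimal sketch of that next step, assuming two frames df_a and df_b (hypothetical names and data) that share the same index and columns. idxmax() returns, for each column of df_a, the index label of that column's maximum; the labels are turned into positions with Index.get_indexer and used to pick one value per column from df_b. The sketch assumes every label exists, because get_indexer reports a missing label as -1, which NumPy would silently read as the last position.

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C']
# Hypothetical frames sharing the same index and column labels
df_a = pd.DataFrame([[5, 1, 0],
                     [2, 9, 3],
                     [1, 4, 7]], columns=cols)
df_b = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

# idxmax() gives, per column, the index label of that column's maximum
best_rows = df_a.idxmax()          # Series: A -> 0, B -> 1, C -> 2

# Translate labels to positions (assumes all labels exist, -1 otherwise)
row_pos = df_b.index.get_indexer(best_rows)
col_pos = df_b.columns.get_indexer(best_rows.index)

# Pair row_pos[k] with col_pos[k] to pick one value per column of df_b
picked = pd.Series(df_b.to_numpy()[row_pos, col_pos], index=best_rows.index)
print(picked)
```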

python pandas indexing dataframe
5 answers

You can use DataFrame.lookup() (deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions):

    In [6]: pd.Series(df.lookup(df.index, df.columns), index=df.columns)
    Out[6]:
    A    0
    B    4
    C    8
    dtype: int32

or

    In [14]: pd.Series(df.lookup(ind, cols), index=df.columns)
    Out[14]:
    A    0
    B    4
    C    8
    dtype: int32

Explanation:

    In [12]: df.lookup(df.index, df.columns)
    Out[12]: array([0, 4, 8])
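On pandas 2.x, where lookup no longer exists, a label-based stand-in can be sketched with Index.get_indexer and DataFrame.to_numpy. The helper name lookup_values is hypothetical, and it assumes unique index and column labels (get_indexer raises on a non-unique index).

```python
import numpy as np
import pandas as pd

ind = [0, 1, 2]
cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

def lookup_values(frame, row_labels, col_labels):
    """Hypothetical label-based replacement for the removed DataFrame.lookup."""
    # Convert labels to integer positions; missing labels come back as -1
    rows = frame.index.get_indexer(row_labels)
    cols_pos = frame.columns.get_indexer(col_labels)
    if (rows < 0).any() or (cols_pos < 0).any():
        raise KeyError("some row or column labels were not found")
    # Pair rows[k] with cols_pos[k] via NumPy advanced indexing
    return frame.to_numpy()[rows, cols_pos]

result = pd.Series(lookup_values(df, ind, cols), index=cols)
print(result)
```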

Here's a vectorized approach using NumPy advanced indexing to select one item per column, given the row indices in ind (one per column):

    pd.Series(df.values[ind, np.arange(len(ind))], df.columns)

Sample run:

    In [107]: ind = [0, 2, 1] # a different ind than in the question, for variety
         ...: cols = ['A','B','C']
         ...: df = pd.DataFrame(np.arange(9).reshape((3,3)),columns=cols)
         ...:

    In [109]: df
    Out[109]:
       A  B  C
    0  0  1  2
    1  3  4  5
    2  6  7  8

    In [110]: pd.Series(df.values[ind, np.arange(len(ind))], df.columns)
    Out[110]:
    A    0
    B    7
    C    5
    dtype: int64
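How the pairing works, as a small NumPy-only sketch: when both slots of an index expression hold integer arrays, NumPy pairs them element by element, so a[rows, cols][k] is a[rows[k], cols[k]]. np.ix_, by contrast, builds the full cross product, which is what df.loc[ind, cols] does in the question. On current pandas, df.to_numpy() is the recommended spelling of df.values.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
ind = [0, 2, 1]                  # one row position per column
col_pos = np.arange(len(ind))    # column positions 0, 1, 2

# Integer arrays in both slots are paired element-wise:
# picked[k] == a[ind[k], col_pos[k]]
picked = a[ind, col_pos]

# The same selection written as an explicit loop
looped = np.array([a[r, c] for r, c in zip(ind, col_pos)])

# np.ix_ gives every (row, column) combination instead: a 3x3 block
block = a[np.ix_(ind, col_pos)]
print(picked, looped, block.shape)
```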

Runtime test

Let's compare this approach with the built-in vectorized lookup method proposed in @MaxU's solution. Since vectorized methods show their strength on larger inputs, let's use a large number of columns:

    In [111]: ncols = 10000
         ...: df = pd.DataFrame(np.random.randint(0,9,(100,ncols)))
         ...: ind = np.random.randint(0,100,(ncols)).tolist()
         ...:

    # @MaxU solution
    In [112]: %timeit pd.Series(df.lookup(ind, df.columns), index=df.columns)
    1000 loops, best of 3: 718 µs per loop

    # Proposed in this post
    In [113]: %timeit pd.Series(df.values[ind, np.arange(len(ind))], df.columns)
    1000 loops, best of 3: 410 µs per loop

    In [114]: ncols = 100000
         ...: df = pd.DataFrame(np.random.randint(0,9,(100,ncols)))
         ...: ind = np.random.randint(0,100,(ncols)).tolist()
         ...:

    # @MaxU solution
    In [115]: %timeit pd.Series(df.lookup(ind, df.columns), index=df.columns)
    100 loops, best of 3: 8.83 ms per loop

    # Proposed in this post
    In [116]: %timeit pd.Series(df.values[ind, np.arange(len(ind))], df.columns)
    100 loops, best of 3: 5.76 ms per loop
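The session above needs IPython and the removed lookup method. A plain-script sketch of a similar comparison pits the advanced-indexing version against a per-element .iat loop (the loop is a stand-in chosen for illustration, not the lookup call timed above). It checks that both give the same Series before timing; actual numbers depend on hardware and library versions, so none are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

ncols = 10_000
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 9, size=(100, ncols)))
ind = rng.integers(0, 100, size=ncols).tolist()

def vectorized():
    # One advanced-indexing call picks all values at once
    return pd.Series(df.to_numpy()[ind, np.arange(len(ind))], df.columns)

def per_element():
    # One scalar .iat access per column, for comparison
    return pd.Series([df.iat[r, c] for c, r in enumerate(ind)], df.columns)

# Both approaches must agree before their timings mean anything
assert vectorized().equals(per_element())

for fn in (vectorized, per_element):
    t = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {t * 1e3:.2f} ms per call")
```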

Another way, if you'd like to use .loc, is to go through a MultiIndex:

    df1 = df.reset_index().melt('index').set_index(['index', 'variable'])
    df1.loc[list(zip(df.index, df.columns))]
    Out[118]:
                    value
    index variable
    0     A             0
    1     B             4
    2     C             8
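A sketch extending this answer to an arbitrary ind and returning a plain Series indexed by column, as the question asks. melt turns df into one row per (index, column) pair; selecting the 'value' column and dropping the 'index' level leaves the column names as the index.

```python
import numpy as np
import pandas as pd

ind = [0, 2, 1]
cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

# Long format: one row per (index, column) pair, labelled by a MultiIndex
df1 = df.reset_index().melt('index').set_index(['index', 'variable'])

# Select the wanted (row, column) pairs, keep only the 'value' column,
# and drop the row level so the result is indexed by column name
result = df1.loc[list(zip(ind, cols)), 'value'].droplevel('index')
print(result)
```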

You can zip together the index and column labels of the values you want, and then build a Series from the result:

    pd.Series([df.loc[id_, col] for id_, col in zip(ind, cols)], df.columns)

    A    0
    B    4
    C    8

Or, if you always want the diagonal values:

    pd.Series(np.diag(df), df.columns)

This will be much faster.
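A quick check of when the diagonal shortcut applies: np.diag returns the (k, k) elements, so it only matches the question when ind is [0, 1, 2]. For any other ind, taking the diagonal of df.loc[ind, cols] gives the wanted pairs, because that block holds df.loc[ind[i], cols[j]] at position (i, j). It builds a len(ind) by len(cols) intermediate, so it is wasteful for large frames.

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

# np.diag on a 2-D input returns element (k, k) for each k,
# which matches the question only when ind == [0, 1, 2]
diag = pd.Series(np.diag(df), df.columns)

# For an arbitrary ind, df.loc[ind, cols] is the full block with
# block[i, j] == df.loc[ind[i], cols[j]]; its diagonal holds
# the wanted pairs (ind[k], cols[k])
ind = [0, 2, 1]
from_block = pd.Series(np.diag(df.loc[ind, cols]), df.columns)
print(diag.tolist(), from_block.tolist())
```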


There should be a more direct way, but this is what I could come up with:

    val = [df.iloc[i, i] for i in df.index]
    pd.Series(val, index=df.columns)

    A    0
    B    4
    C    8
    dtype: int64
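This loop uses each index value i as both a row and a column position, which works only because df has the default 0..n-1 index and the wanted values lie on the diagonal. A sketch generalizing it to arbitrary row positions, assuming ind holds positions rather than labels, with .iat as the positional scalar accessor:

```python
import numpy as np
import pandas as pd

ind = [0, 2, 1]   # one row position per column (positions, not labels)
cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

# Pair each column position j with its row position ind[j]
val = [df.iat[r, j] for j, r in enumerate(ind)]
result = pd.Series(val, index=df.columns)
print(result)
```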