Get the largest values from each pandas.DataFrame column

Question

Here is my pandas.DataFrame:

    import pandas as pd

    data = pd.DataFrame({
        'first': [40, 32, 56, 12, 89],
        'second': [13, 45, 76, 19, 45],
        'third': [98, 56, 87, 12, 67]
    }, index=['first', 'second', 'third', 'fourth', 'fifth'])

I want to create a new DataFrame that contains the top 3 values from each column of my data DataFrame.

Here is the expected result:

       first  second  third
    0     89      76     98
    1     56      45     87
    2     40      45     67

How can I do this?

+8

python pandas dataframe

Michael vayvala Dec 9 '13 at 17:48


5 answers

With numpy, you can get an array of the top 3 values in each column, for example:

    >>> import numpy as np
    >>> col_ind = np.argsort(data.values, axis=0)[::-1, :]
    >>> ind_to_take = col_ind[:3, :] + np.arange(data.shape[1]) * data.shape[0]
    >>> np.take(data.values.T, ind_to_take)
    array([[89, 76, 98],
           [56, 45, 87],
           [40, 45, 67]], dtype=int64)

You can convert back to a DataFrame:

    >>> pd.DataFrame(_, columns=data.columns, index=data.index[:3])
            first  second  third
    first      89      76     98
    second     56      45     87
    third      40      45     67
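The index arithmetic above can be avoided if only the values (not their positions) are needed. The following is a minimal sketch of that idea, not the answer author's method: it sorts each column with np.sort, reverses the row order, and keeps the first three rows. The names top3 and result are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67],
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

# Sort every column in ascending order (axis=0 sorts down the rows),
# reverse the row order to get descending values, then keep 3 rows.
top3 = np.sort(data.to_numpy(), axis=0)[::-1][:3]

# Wrap the array back into a DataFrame with the original column names.
result = pd.DataFrame(top3, columns=data.columns)
print(result)
```

Like the argsort version, this still sorts every column in full, so it is simpler to read rather than faster.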

+3

alko Dec 9 '13 at 18:14


The other solutions (at the time of writing) sort the DataFrame, which is super-linear in the length of each column, but this can be done in linear time per column.

First, numpy.partition moves the k smallest elements into the first k positions (in no particular order among themselves). To get the k largest elements, we can negate the values before and after partitioning:

    import numpy as np

    # v is a 1-D array, k is the number of largest values wanted
    -np.partition(-v, k)[:k]

Combining this with a dictionary comprehension, we can use:

    >>> pd.DataFrame({c: -np.partition(-data[c], 3)[:3] for c in data.columns})
       first  second  third
    0     89      76     98
    1     56      45     87
    2     40      45     67
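Because np.partition does not guarantee any order among the k selected elements, the rows of such a result may not always come back in descending order. A minimal sketch of one way to handle that, assuming k is no larger than the number of rows: partition the whole array along axis 0, then sort only the k selected rows. The names values, part, top_k and result are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67],
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

k = 3  # number of largest values per column; assumed 1 <= k <= len(data)
values = data.to_numpy()
n_rows = values.shape[0]

# Partition each column so that the element at row n_rows - k is in its
# sorted position; the k largest values then occupy the last k rows,
# in no particular order.
part = np.partition(values, n_rows - k, axis=0)[-k:]

# Sort just those k rows and reverse them to get descending order.
top_k = np.sort(part, axis=0)[::-1]

result = pd.DataFrame(top_k, columns=data.columns)
print(result)
```

The extra sort touches only k rows per column, so the selection step stays linear in the column length.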

+1

Ami tavory May 27 '15 at 12:39


Alternative pandas solution (here df is the DataFrame called data in the question):

    In [6]: N = 3

    In [7]: pd.DataFrame([df[c].nlargest(N).values.tolist() for c in df.columns],
       ...:              index=df.columns,
       ...:              columns=['{}_largest'.format(i) for i in range(1, N+1)]).T
       ...:
    Out[7]:
               first  second  third
    1_largest     89      76     98
    2_largest     56      45     87
    3_largest     40      45     67

0

Maxu Oct 16 '16 at 19:21


Use nlargest:

    In [1594]: pd.DataFrame({c: data[c].nlargest(3).values for c in data})
    Out[1594]:
       first  second  third
    0     89      76     98
    1     56      45     87
    2     40      45     67

where

    In [1603]: data
    Out[1603]:
            first  second  third
    first      40      13     98
    second     32      45     56
    third      56      76     87
    fourth     12      19     12
    fifth      89      45     67

0

Zero Oct 05 '17 at 16:31


Zelazny7 · Accepted Answer · 2013-12-09T18:25:43+0000

Create a function to return the top three values of a series:

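    def top_values(s, num):
        # Sort descending and keep the first num values
        # (older pandas versions used s.order(...) instead of sort_values).
        tmp = s.sort_values(ascending=False)[:num]
        # Replace the original labels with 0..num-1 so the columns line up.
        tmp.index = range(num)
        return tmp

Apply it to your dataset:

    In [1]: data.apply(lambda x: top_values(x, 3))
    Out[1]:
       first  second  third
    0     89      76     98
    1     56      45     87
    2     40      45     67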
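The same apply pattern can also be written without a helper function. This is a hedged sketch rather than part of the accepted answer: it uses Series.nlargest to pick the values and reset_index(drop=True) to discard the original labels so that the per-column results align on 0, 1, 2. The name result is illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67],
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

# For each column: take the 3 largest values (already in descending order),
# then drop the original row labels so every column shares the index 0..2.
result = data.apply(lambda s: s.nlargest(3).reset_index(drop=True))
print(result)
```

Without the reset_index step, each column would keep the labels of its own largest rows, and apply would align them on the union of those labels, leaving NaN gaps.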
