How to get the last N rows of a pandas DataFrame?

I have two pandas DataFrames, df1 and df2 (df1 is a vanilla DataFrame; df2 is indexed by 'STK_ID' and 'RPT_Date'):

 >>> df1
     STK_ID  RPT_Date  TClose   sales  discount
 0   000568  20060331    3.69   5.975       NaN
 1   000568  20060630    9.14  10.143       NaN
 2   000568  20060930    9.49  13.854       NaN
 3   000568  20061231   15.84  19.262       NaN
 4   000568  20070331   17.00   6.803       NaN
 5   000568  20070630   26.31  12.940       NaN
 6   000568  20070930   39.12  19.977       NaN
 7   000568  20071231   45.94  29.269       NaN
 8   000568  20080331   38.75  12.668       NaN
 9   000568  20080630   30.09  21.102       NaN
 10  000568  20080930   26.00  30.769       NaN
 >>> df2
                    TClose   sales  discount  net_sales    cogs
 STK_ID RPT_Date
 000568 20060331    3.69   5.975       NaN      5.975   2.591
        20060630    9.14  10.143       NaN     10.143   4.363
        20060930    9.49  13.854       NaN     13.854   5.901
        20061231   15.84  19.262       NaN     19.262   8.407
        20070331   17.00   6.803       NaN      6.803   2.815
        20070630   26.31  12.940       NaN     12.940   5.418
        20070930   39.12  19.977       NaN     19.977   8.452
        20071231   45.94  29.269       NaN     29.269  12.606
        20080331   38.75  12.668       NaN     12.668   3.958
        20080630   30.09  21.102       NaN     21.102   7.431

I can get the last 3 rows of df2:

 >>> df2.ix[-3:]
                    TClose   sales  discount  net_sales    cogs
 STK_ID RPT_Date
 000568 20071231   45.94  29.269       NaN     29.269  12.606
        20080331   38.75  12.668       NaN     12.668   3.958
        20080630   30.09  21.102       NaN     21.102   7.431

but df1.ix[-3:] returns every row:

 >>> df1.ix[-3:]
     STK_ID  RPT_Date  TClose   sales  discount
 0   000568  20060331    3.69   5.975       NaN
 1   000568  20060630    9.14  10.143       NaN
 2   000568  20060930    9.49  13.854       NaN
 3   000568  20061231   15.84  19.262       NaN
 4   000568  20070331   17.00   6.803       NaN
 5   000568  20070630   26.31  12.940       NaN
 6   000568  20070930   39.12  19.977       NaN
 7   000568  20071231   45.94  29.269       NaN
 8   000568  20080331   38.75  12.668       NaN
 9   000568  20080630   30.09  21.102       NaN
 10  000568  20080930   26.00  30.769       NaN

Why? How do I get the last 3 rows of df1 (the DataFrame without a custom index)? I am using pandas 0.10.1.

+126
python pandas dataframe
Feb 02 '13 at 14:40
4 answers

Don't forget DataFrame.tail! For example, df1.tail(10) returns the last 10 rows.
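
As a minimal sketch of this suggestion: tail(n) selects by position, so it should behave the same on a frame with the default integer index and on one indexed by STK_ID and RPT_Date. The small frames below are hypothetical stand-ins for the question's df1 and df2, not the original data in full.

```python
import pandas as pd

# Hypothetical sample data, loosely modeled on df1 from the question.
df1 = pd.DataFrame({
    "STK_ID": ["000568"] * 5,
    "RPT_Date": ["20070930", "20071231", "20080331", "20080630", "20080930"],
    "TClose": [39.12, 45.94, 38.75, 30.09, 26.00],
})

# tail(n) is positional: it returns the last n rows whatever the index holds.
last3 = df1.tail(3)

# The same call on a frame indexed by STK_ID and RPT_Date (like df2).
df2 = df1.set_index(["STK_ID", "RPT_Date"])
last3_indexed = df2.tail(3)

print(last3)
print(last3_indexed)
```

Because tail never interprets its argument as a label, it avoids the label-versus-position question entirely.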

+295
Feb 07 '13 at 21:03

This happens because df1 has an integer index: ix treats -3 as a label rather than a position, so the slice -3: selects every label from -3 upward, which is all of the rows. This is intended; see the integer indexing section of the pandas "gotchas" documentation.*

* Newer versions of pandas prefer loc (label-based) or iloc (position-based), which remove the ambiguity of whether ix means a position or a label (ix was deprecated in pandas 0.20 and removed in 1.0):

 df.iloc[-3:] 

See the docs.

As Wes points out, in this particular case you should just use tail!
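
To illustrate the label-versus-position distinction, here is a small sketch using loc and iloc on a frame with the default integer index. The frame is a hypothetical stand-in, and loc is used here in place of the removed ix to show the label-based reading of the slice.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..4.
df = pd.DataFrame({"sales": [5.975, 10.143, 13.854, 19.262, 6.803]})

# Label-based slice: selects every index label >= -3.
# All labels (0..4) qualify, so the whole frame comes back.
by_label = df.loc[-3:]

# Position-based slice: selects the last three rows.
by_position = df.iloc[-3:]

print(len(by_label))
print(by_position)
```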

+53
Feb 03 '13 at 5:02

How to get the last N rows of a pandas DataFrame?

If you are slicing by position, __getitem__ (i.e., slicing with []) works well and is the most concise solution I have found for this problem.

 import numpy as np
 import pandas as pd

 pd.__version__
 # '0.24.2'

 df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
 df
    A  B
 0  a  1
 1  a  2
 2  a  3
 3  b  4
 4  b  5
 5  b  6
 6  b  7
 7  c  8

 df[-3:]
    A  B
 5  b  6
 6  b  7
 7  c  8

This is similar to calling df.iloc[-3:], for example (iloc internally delegates to __getitem__).
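
A quick way to check this claim on the example frame is to compare the three spellings directly. This is only a sketch for the default integer index shown above; it does not cover other index types.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})

via_getitem = df[-3:]      # plain [] slicing, by position
via_iloc = df.iloc[-3:]    # explicit positional indexer
via_tail = df.tail(3)      # dedicated method

# DataFrame.equals compares values, index, and columns.
same_as_iloc = via_getitem.equals(via_iloc)
same_as_tail = via_getitem.equals(via_tail)
print(same_as_iloc, same_as_tail)
```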




In addition, if you want the last N rows of each group, use groupby and GroupBy.tail:

 df.groupby('A').tail(2)
    A  B
 1  a  2
 2  a  3
 5  b  6
 6  b  7
 7  c  8
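
The per-group version can be run as a self-contained sketch on the same example frame. The row counts per group are computed here only to show that a group with fewer than N rows (group 'c') contributes whatever rows it has.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})

# Last two rows of each group in column A; rows keep their original order.
last2_per_group = df.groupby('A').tail(2)

# How many rows each group contributed to the result.
rows_per_group = last2_per_group['A'].value_counts().to_dict()

print(last2_per_group)
print(rows_per_group)
```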
+3
Jan 22 '19 at 7:40

You can also take the last three rows of the DataFrame as follows:

 df1 = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
 df1[-3:]
0
Jun 07 '19 at 15:59


