I have a Python pandas DataFrame in which each element is a float or NaN. For each row, I need to find the column that holds the nth non-NaN value of that row. I know that the nth such column always exists.
So, if n is 4, and the pandas DataFrame called myDF was as follows:
          10   20   30   40   50   60   70   80   90  100
    'A'  4.5  5.5  2.5  NaN  NaN  2.9  NaN  NaN  1.1  1.8
    'B'  4.7  4.1  NaN  NaN  NaN  2.0  1.2  NaN  NaN  NaN
    'C'  NaN  NaN  NaN  NaN  NaN  1.9  9.2  NaN  4.4  2.1
    'D'  1.1  2.2  3.5  3.4  4.5  NaN  NaN  NaN  1.9  5.5
I would like to get:
    'A'   60
    'B'   70
    'C'  100
    'D'   40
I could do:
    import math
    import pandas as pd

    n = 4  # some arbitrary int
    for row in myDF.index:
        num_not_NaN = 0
        for c in myDF.columns:
            # count the non-NaN values seen so far in this row
            if not math.isnan(myDF.loc[row, c]):
                num_not_NaN += 1
                if num_not_NaN == n:
                    print(row, c)
                    break
I am sure this is very slow and not very Pythonic. Is there a faster approach for a very large DataFrame and large values of n?
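For reference, one vectorized direction might look like the sketch below. It is only a sketch: the helper name nth_valid_column is hypothetical, the frame is rebuilt from the example above with integer column labels and string row labels (both assumptions), and it relies on the stated guarantee that every row has at least n non-NaN values. The idea is to take a running count of non-NaN values along each row and pick the first column where that count reaches n.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Example frame from the question (column labels assumed to be ints).
myDF = pd.DataFrame(
    [
        [4.5, 5.5, 2.5, nan, nan, 2.9, nan, nan, 1.1, 1.8],
        [4.7, 4.1, nan, nan, nan, 2.0, 1.2, nan, nan, nan],
        [nan, nan, nan, nan, nan, 1.9, 9.2, nan, 4.4, 2.1],
        [1.1, 2.2, 3.5, 3.4, 4.5, nan, nan, nan, 1.9, 5.5],
    ],
    index=["A", "B", "C", "D"],
    columns=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
)


def nth_valid_column(df, n):
    """Hypothetical helper: label of the column holding each row's nth non-NaN value."""
    # Running count of non-NaN values, left to right, within each row.
    counts = df.notna().cumsum(axis=1)
    # idxmax returns the first column where the comparison is True,
    # i.e. the first column at which the count reaches n.
    return counts.eq(n).idxmax(axis=1)


result = nth_valid_column(myDF, 4)
print(result)
```

On the example frame this gives 60, 70, 100 and 40 for rows A to D. Note that if a row had fewer than n non-NaN values, the comparison would be False everywhere and idxmax would silently return the first column, so that case would need a separate check.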