Dtype: integer but loc returns float

I have a strange data set:

       year   firms  age  survival
    0  1977  564918    0       NaN
    2  1978  503991    0       NaN
    3  1978  413130    1  0.731310
    5  1979  497805    0       NaN
    6  1979  390352    1  0.774522

where I set the dtype of the first three columns to integer:

    >>> df.dtypes
    year          int64
    firms         int64
    age           int64
    survival    float64

But now I want to look up a row in another table, using a value from this table as the index label:

    idx = 331
    otherDf.loc[df.loc[idx, 'age']]
    Traceback (most recent call last):
    (...)
    KeyError: 8.0

The float key comes from

    df.loc[idx, 'age']
    8.0

Why does this return a float? And how can I do the lookup in otherDf? I am on pandas version 0.15.

4 answers

You get a float back because each row contains a mix of float and int values. When you select a row by its label with loc, the row comes back as a single Series with one dtype, so the integers are converted to floats:

    >>> df.loc[4]
    year          1979.000000
    firms       390352.000000
    age              1.000000
    survival         0.774522
    Name: 4, dtype: float64

Therefore, selecting the age entry here with df.loc[4, 'age'] gives 1.0.

To get around this and get an integer back, select the age column first and use loc on that Series rather than on the entire DataFrame:

    >>> df['age'].loc[4]
    1
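
As a minimal sketch of the difference, the following rebuilds a frame like the one in the question and compares a row selection with a column selection. The values are copied from the question, but the default 0..4 index is an assumption made so that label 4 exists.

```python
import numpy as np
import pandas as pd

# Values taken from the question; the default 0..4 index is an assumption.
df = pd.DataFrame({
    "year": [1977, 1978, 1978, 1979, 1979],
    "firms": [564918, 503991, 413130, 497805, 390352],
    "age": [0, 0, 1, 0, 1],
    "survival": [np.nan, np.nan, 0.731310, np.nan, 0.774522],
})

# A row comes back as one Series, so the int and float columns
# have to share a single dtype.
row = df.loc[4]
print(row.dtype)   # float64

# A column keeps its own dtype, so the scalar stays an integer.
age = df["age"].loc[4]
print(type(age))
```

The row Series is float64 because that is the only common dtype for int64 and float64 columns, while the age column on its own never has to be upcast.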

This was a bug in pandas versions up to 0.19. It seems to be fixed in version 0.20. See https://github.com/pandas-dev/pandas/issues/11617
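
To see which behavior a given installation shows, one could run a small check like the sketch below. The two-column frame is a made-up example, not the asker's data, and the printed type of the first lookup is expected to differ between affected and fixed versions.

```python
import pandas as pd

print(pd.__version__)

# Hypothetical frame mixing an int column with a float column.
df = pd.DataFrame({"age": [1, 8], "survival": [0.5, 0.75]})

via_loc = df.loc[1, "age"]       # result type depends on the pandas version
via_column = df["age"].loc[1]    # read directly from the int64 column

print(type(via_loc), type(via_column))
```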


Do you need to use loc on df? What about this:

    otherDf.loc[df['age'][idx]]

Fetching the value through the "age" Series returns it with that column's dtype (int64).
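
A sketch of the whole lookup, assuming a second frame indexed by age. Both frames, the name other_df, and the label column are hypothetical stand-ins for the asker's data. Note that loc is indexed with square brackets, not called with parentheses.

```python
import pandas as pd

# Hypothetical stand-ins for the two tables in the question.
df = pd.DataFrame({"age": [1, 8], "survival": [0.5, 0.75]})
other_df = pd.DataFrame({"label": ["young", "old"]}, index=[1, 8])

idx = 1
key = df["age"][idx]         # scalar read from the int64 column
match = other_df.loc[key]    # row of other_df whose index label is 8
print(match["label"])        # old
```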


I cannot reproduce this behavior with Pandas 0.15.1.

    >>> import pandas as pd
    >>> pd.__version__
    '0.15.1'
    >>> df = pd.DataFrame({"age": [1, 8]})
    >>> df
       age
    0    1
    1    8
    >>> df.dtypes
    age    int64
    dtype: object
    >>> df.loc[1, "age"]
    8
    >>> type(df.loc[1, "age"])
    <type 'numpy.int64'>

Offhand, I could not find a corresponding entry in the changelogs, but it may be worth checking whether you are using 0.15.0 or something newer.

Edit:

Adding another column with a float dtype does indeed cause the row's dtype to be normalized to float (as ajcr points out in his answer):

    >>> df = pd.DataFrame({"age": [1, 8], "greatness": [0.2, 1.7]})
    >>> type(df.loc[1, "age"])
    <type 'numpy.float64'>

Source: https://habr.com/ru/post/1213156/

