Dtype: integer but loc returns float

Question

I have a strange data set:

       year   firms  age  survival
    0  1977  564918    0       NaN
    2  1978  503991    0       NaN
    3  1978  413130    1  0.731310
    5  1979  497805    0       NaN
    6  1979  390352    1  0.774522

where I set the dtype of the first three columns to integer:

    >>> df.dtypes
    year          int64
    firms         int64
    age           int64
    survival    float64

But now I want to look up a row in another table, using a value from this one as the key:

    >>> idx = 331
    >>> otherDf.loc[df.loc[idx, 'age']]
    Traceback (most recent call last):
    (...)
    KeyError: 8.0

It comes from

    >>> df.loc[idx, 'age']
    8.0

Why does this return a float? And how can I do the lookup in otherDf? I am on pandas version 0.15.

+5

python types pandas dataframe

Foobar Feb 11 '15 at 17:15

4 answers

This was a bug in pandas through version 0.19. It seems to be fixed in version 0.20. See https://github.com/pandas-dev/pandas/issues/11617

+1

Mike jarvis May 25, '17 at 17:05

Do you need to use loc on the whole DataFrame? What about this:

    otherDf.loc[df['age'][idx]]

Fetching the value through the "age" Series returns it with that column's dtype (int64).

0

sharshofski Feb 11 '15 at 18:43

I cannot reproduce this behavior with Pandas 0.15.1.

    >>> pd.__version__
    '0.15.1'
    >>> df = pd.DataFrame({"age": [1, 8]})
    >>> df
       age
    0    1
    1    8
    >>> df.dtypes
    age    int64
    dtype: object
    >>> df.loc[1, "age"]
    8
    >>> type(df.loc[1, "age"])
    <type 'numpy.int64'>

Offhand, I could not find a corresponding entry in the changelogs, but it would be worth checking whether you are using 0.15.0 or something newer.

Edit:

Adding another column of float type does indeed cause the row's dtype to be normalized to float (as ajcr points out in his answer):

    >>> df = pd.DataFrame({"age": [1, 8], "greatness": [0.2, 1.7]})
    >>> type(df.loc[1, "age"])
    <type 'numpy.float64'>
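
For anyone who wants to see the normalization directly, here is a minimal sketch, assuming a reasonably recent pandas and NumPy. The frame names ints_only and mixed and the extra "year" column are made up for illustration. It only inspects the dtype of a whole row, since that is where the single-dtype constraint applies.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: one with integer columns only, one mixing int and float.
ints_only = pd.DataFrame({"age": [1, 8], "year": [1977, 1978]})
mixed = pd.DataFrame({"age": [1, 8], "greatness": [0.2, 1.7]})

# A row comes back as one Series, and a Series has a single dtype.
print(ints_only.loc[1].dtype)  # every column is int64, so the row stays int64
print(mixed.loc[1].dtype)      # int64 and float64 have to share a dtype: float64

# NumPy's promotion rule for the same pair of dtypes.
print(np.result_type(np.int64, np.float64))
```

So the float does not come from the "age" column itself, only from squeezing a mixed row into one Series.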

0

Jan-Philip Gehrcke Feb 11 '15 at 18:58

Alex Riley · Accepted Answer · 2015-02-11T18:56:47+0000

You get a float back because each row contains a mix of float and int types. When you select a row by its index label with loc, the integers are upcast to floats:

    >>> df.loc[4]
    year          1979.000000
    firms       390352.000000
    age              1.000000
    survival         0.774522
    Name: 4, dtype: float64

Therefore, selecting the age entry here with df.loc[4, 'age'] will give 1.0.

To get around this and return an integer, you can use loc to select from the age column only, rather than from the entire DataFrame:

    >>> df['age'].loc[4]
    1
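
Putting the workaround together with the lookup from the question, a rough sketch might look like the following. The data is a cut-down stand-in for the asker's table, and other_df with its "label" column is a hypothetical lookup table indexed by integer age, not something from the original post.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's data: integer columns plus one float column.
df = pd.DataFrame(
    {
        "year": [1977, 1978, 1979],
        "age": [0, 1, 8],
        "survival": [np.nan, 0.731310, 0.774522],
    }
)

# Hypothetical lookup table whose index holds integer ages.
other_df = pd.DataFrame({"label": ["new", "young", "old"]}, index=[0, 1, 8])

# Selecting the whole row first packs all values into one float64 Series.
row = df.loc[2]
print(row.dtype)

# Selecting the column first keeps the column's integer dtype.
age = df["age"].loc[2]
print(type(age))

# The integer key can then be used to index the other table.
print(other_df.loc[age, "label"])
```

The key point is the order of selection: column first, then row label, so the value never passes through a mixed-dtype row.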
