Why does pandas convert unsigned ints greater than 2 ** 63 - 1 to object dtype?

When I convert a NumPy array to a pandas DataFrame, a uint64 column is changed to object dtype if the integer is greater than 2 ** 63 - 1.

    import pandas as pd
    import numpy as np

    x = np.array([('foo', 2 ** 63)],
                 dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
    y = np.array([('foo', 2 ** 63 - 1)],
                 dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

    >>> pd.DataFrame(x).dtypes.unsigned
    dtype('O')
    >>> pd.DataFrame(y).dtypes.unsigned
    dtype('uint64')
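The threshold in the question, 2 ** 63 - 1, is exactly the largest value a signed 64-bit integer can hold. That suggests (as an assumption, not something confirmed in this thread) that the values pass through a signed int64 path somewhere during DataFrame construction. A minimal sketch of the two ranges, with hypothetical helper names `fits_int64` and `fits_uint64`:

```python
import numpy as np

# Limits taken from NumPy's own type information
INT64_MIN = int(np.iinfo(np.int64).min)
INT64_MAX = int(np.iinfo(np.int64).max)    # 2 ** 63 - 1
UINT64_MAX = int(np.iinfo(np.uint64).max)  # 2 ** 64 - 1


def fits_int64(value):
    """True if a Python int is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def fits_uint64(value):
    """True if a Python int is representable as an unsigned 64-bit integer."""
    return 0 <= value <= UINT64_MAX


# 2 ** 63 - 1 fits both types; 2 ** 63 only fits uint64
for candidate in (2 ** 63 - 1, 2 ** 63):
    print(candidate, fits_int64(candidate), fits_uint64(candidate))
```

Both arrays in the question are valid uint64 data, so the difference in behaviour comes from pandas, not from NumPy.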

This is annoying because I cannot write the data frame to an HDF file in table format:

 pd.DataFrame(x).to_hdf('x.hdf', 'key', format = 'table') 

Output:

TypeError: cannot serialize column [unsigned] because its data content is [integer] object dtype
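The error message describes a column that has object dtype but holds integers. A rough way to spot such columns before writing is sketched below. The helper name `integer_object_columns` is hypothetical, and the object column is built by hand so the example does not depend on how a particular pandas version converts the structured array.

```python
import pandas as pd


def integer_object_columns(df):
    """Return names of object-dtype columns whose values are all integers."""
    flagged = []
    for name in df.columns:
        col = df[name]
        # infer_dtype inspects the actual values rather than the declared dtype
        if col.dtype == object and pd.api.types.infer_dtype(col, skipna=True) == "integer":
            flagged.append(name)
    return flagged


# Hand-built stand-in for the frame in the question
df = pd.DataFrame({
    "string": ["foo"],
    "unsigned": pd.Series([2 ** 63], dtype=object),
})
print(integer_object_columns(df))
```

Any column reported here is a candidate for an explicit cast before calling to_hdf with format='table'.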

Can someone explain this type conversion?

2 answers

This looks like a bug in pandas, but you can cast the column back to uint64 using astype():

    x = np.array([('foo', 2 ** 63)],
                 dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
    a = pd.DataFrame(x)
    a['unsigned'] = a['unsigned'].astype(np.uint64)

    >>> a.dtypes
    string      object
    unsigned    uint64
    dtype: object
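The cast above can be wrapped in a small guard so that it only runs when every value really fits in uint64. This is a minimal sketch under that assumption, not a pandas API: `restore_uint64` is a hypothetical helper, and the object column is constructed explicitly for the demonstration.

```python
import numpy as np
import pandas as pd

UINT64_MAX = int(np.iinfo(np.uint64).max)


def restore_uint64(df, column):
    """Return a copy of df with `column` cast from object to uint64."""
    values = df[column]
    # Refuse to cast anything that is not a non-negative integer in range
    for v in values:
        if not isinstance(v, (int, np.integer)) or not 0 <= int(v) <= UINT64_MAX:
            raise ValueError(f"{v!r} cannot be stored as uint64")
    out = df.copy()
    out[column] = values.astype(np.uint64)
    return out


# Object column holding values above the int64 maximum
a = pd.DataFrame({"unsigned": pd.Series([2 ** 63, 2 ** 64 - 1], dtype=object)})
fixed = restore_uint64(a, "unsigned")
print(fixed.dtypes)
```

The guard raises instead of silently wrapping negative or oversized values, which a bare astype call may not do.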

Other methods for converting the column to a numeric dtype either raised errors or had no effect:

    >>> pd.to_numeric(a['unsigned'], errors='coerce')
    OverflowError: Python int too large to convert to C long

    >>> a.convert_objects(convert_numeric=True).dtypes
    string      object
    unsigned    object
    dtype: object
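Alternatively, declare the field with a different dtype ('f4' or 'i8' instead of np.uint64) when building the array:

    x = np.array([('foo', 2 ** 63)],
                 dtype=np.dtype([('string', np.str_, 3), ('unsigned', 'f4')]))
    y = np.array([('foo', 2 ** 63 - 1)],
                 dtype=np.dtype([('string', np.str_, 3), ('unsigned', 'i8')]))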