I am reading a CSV file from an external source and trying to save it in HDF format using the pandas to_hdf() method. I load the CSV file into a pandas DataFrame, do some massaging, and then call to_hdf(). When to_hdf() is called, the following exception occurs:
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\core\generic.py", line 902, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\io\pytables.py", line 267, in to_hdf
    f(store)
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\io\pytables.py", line 262, in <lambda>
    f = lambda store: store.put(key, value, **kwargs)
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\io\pytables.py", line 810, in put
    self._write_to_group(key, value, append=append, **kwargs)
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\io\pytables.py", line 1259, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\io\pytables.py", line 3751, in write
    **kwargs)
  File "C:\Python\Python27\Python27\lib\site-packages\pandas\io\pytables.py", line 3433, in create_axes
    raise e
TypeError: Cannot serialize the column [item_id] because
its data contents are [mixed-integer] object dtype
item_id is not really a column; it is part of the MultiIndex that I set after loading the data frame. When I print the index dtype, I see:
>>> df.index.dtype
dtype('O')
I am trying to figure out how to convert this item_id index to a string. I also tried converting the column to a string type before setting the index, using each of the following, but the dtype remains object:
df['item_id'] = df['item_id'].astype('str')
df['item_id'] = df['item_id'].astype(str)
So, how can I do this conversion in pandas so that to_hdf works?
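For what it's worth, dtype('O') on its own does not say whether the values are all strings or a mix of integers and strings, because pandas has traditionally stored Python strings in object columns as well. Below is a minimal sketch of one way to look inside, assuming a reasonably recent pandas where pandas.api.types.infer_dtype is available. The sample frame and its store_id and qty columns are made up for illustration.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample data: item_id holds both integers and strings.
df = pd.DataFrame({
    "store_id": [1, 1, 2],
    "item_id": [101, "A-102", 103],
    "qty": [5, 7, 9],
})

before = infer_dtype(df["item_id"])          # describes the actual values
df["item_id"] = df["item_id"].astype(str)    # every value becomes a str
after = infer_dtype(df["item_id"])

# The same check can be applied to one level of a MultiIndex.
indexed = df.set_index(["store_id", "item_id"])
level_kind = infer_dtype(indexed.index.get_level_values("item_id"))

print(before, after, level_kind)
```

Here infer_dtype should report mixed-integer before the conversion and string afterwards, which is a more useful signal than the column dtype when debugging this error.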
UPDATE:
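As it turns out, I call to_hdf on the result of pandas.concat. For some of the inputs, read_csv parsed item_id as int. Where the values were a mix of int and string, read_csv parsed the column as object. So after pd.concat, the column contained both integers and strings, hence the mixed-integer error. Doing the following before calling to_hdf fixed it: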
df["item_id"] = df["item_id"].map(str).map(str.strip)
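The situation described in the update can be reproduced in a few lines. This is only a sketch under assumptions: the two in-memory CSV snippets, the qty column, and the use of infer_dtype to inspect the values are illustrative and not part of the original data.

```python
import io

import pandas as pd
from pandas.api.types import infer_dtype

# Two made-up CSV "files": the first has purely numeric ids,
# the second contains a non-numeric id with trailing whitespace.
csv_a = io.StringIO("item_id,qty\n101,1\n102,2\n")
csv_b = io.StringIO("item_id,qty\n103,3\nA-104 ,4\n")

part_a = pd.read_csv(csv_a)   # item_id is parsed as an integer column
part_b = pd.read_csv(csv_b)   # item_id is parsed as strings

# After concatenation, integers and strings sit in the same column.
df = pd.concat([part_a, part_b], ignore_index=True)
mixed_kind = infer_dtype(df["item_id"])

# The fix from the post: make every value a str, then trim whitespace.
df["item_id"] = df["item_id"].map(str).map(str.strip)
fixed_kind = infer_dtype(df["item_id"])
item_ids = df["item_id"].tolist()

print(mixed_kind, fixed_kind, item_ids)
```

Applying the conversion after pd.concat, rather than to each piece separately, keeps the fix in one place. Passing a dtype for item_id to read_csv would be another option, though I have not verified it against the original files.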