Here's something weird with pandas and HDF for Halloween:
    import pandas

    df = pandas.DataFrame([['a', 'b'] for i in range(1, 1000)])
    store = pandas.HDFStore('test.h5')
    store['x'] = df
    store.close()
Then:

    $ ls -l test.h5
    -rw-r--r-- 1 arthur arthur 1072080 Oct 26 10:50 test.h5
1.1 MB? A bit large, but why not. Here's where things get really creepy:
    store = pandas.HDFStore('test.h5')  # open it again
    store['x'] = df                     # do the same thing as before!
    store.close()
Then:

    $ ls -l test.h5
    -rw-r--r-- 1 arthur arthur 2122768 Oct 26 10:52 test.h5
You have now entered the Twilight Zone. Needless to say, the stored data is indistinguishable before and after the operation, but each iteration makes the file a bit more bloated (here it grew from 1072080 to 2122768 bytes after a single rewrite).
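The open/assign/close experiment above can be condensed into a loop that records the file size after each rewrite, which makes the growth easy to observe over several iterations. This is a minimal reproduction sketch, assuming pandas with PyTables installed; the helper name `sizes_after_rewrites` and the temporary-directory setup are illustrative, not part of the original post, and the exact byte counts will depend on the pandas, PyTables and HDF5 versions in use.

```python
import os
import tempfile

import pandas as pd


def sizes_after_rewrites(path, df, n_rewrites=3, key="x"):
    """Hypothetical helper: reopen the HDFStore n_rewrites times, overwrite
    `key` with the same DataFrame each time, and record the file size
    (in bytes) after each close."""
    sizes = []
    for _ in range(n_rewrites):
        # Same sequence as in the question: open, assign, close.
        store = pd.HDFStore(path)
        store[key] = df
        store.close()
        sizes.append(os.path.getsize(path))
    return sizes


if __name__ == "__main__":
    df = pd.DataFrame([["a", "b"] for _ in range(1, 1000)])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        print(sizes_after_rewrites(path, df))
        # The stored values still read back unchanged after the rewrites.
        print(pd.read_hdf(path, "x").values.tolist() == df.values.tolist())
```

Printing the list of sizes, rather than running `ls -l` between sessions, shows whether each rewrite adds roughly the same amount of space.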
This seems to happen only when strings are involved. Before I file a bug report, I'd like to know whether I'm missing something here...
Arthur B.