Pandas PyTables warning and slow performance

I have been testing pandas and PyTables on some large financial datasets and ran into a real stumbling block:

When pandas stores a DataFrame in a PyTables HDF5 file, the multidimensional data appears to be stored as a few massively long rows rather than as columns.

Try the following:

from pandas import DataFrame, HDFStore
from numpy.random import randn

df = DataFrame({'col1': randn(100000000), 'col2': randn(100000000)})
store = HDFStore('test.h5')
store['data'] = df  # there should be a warning here about exceeding the maximum recommended row size
store.handle

output:

File(filename=test7.h5, title='', mode='a', rootUEP='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False))
/ (RootGroup) ''
/data (Group) ''
/data/axis0 (Array(2,)) ''
  atom := StringAtom(itemsize=4, shape=(), dflt='')
  maindim := 0
  flavor := 'numpy'
  byteorder := 'irrelevant'
  chunkshape := None
/data/axis1 (Array(100000000,)) ''
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
/data/block0_items (Array(2,)) ''
  atom := StringAtom(itemsize=4, shape=(), dflt='')
  maindim := 0
  flavor := 'numpy'
  byteorder := 'irrelevant'
  chunkshape := None
/data/block0_values (Array(2, 100000000)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

I'm not certain, but I believe that, combined with the warning message, Array(2, 100000000) means a 2D array with 2 rows and 100,000,000 columns. This is also how it appears in HDFView.

I am also seeing extremely poor performance (10 seconds for store['ticks'].head() in some cases). Is this storage layout to blame?

1 answer

I have filed an issue for this on GitHub:

http://github.com/pydata/pandas/issues/1824

I personally wasn't aware of this problem, and frankly it's a little disappointing, whether PyTables or HDF5 turns out to be the culprit.
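In the meantime, a possible workaround is to store the frame in PyTables' row-oriented Table format rather than the default fixed Array layout, which makes partial reads cheap. This is a minimal sketch using the current pandas spelling (format='table' and select(); older pandas versions used table=True instead), scaled down to a small frame:

```python
import numpy as np
import pandas as pd

# Small stand-in for the huge frame in the question
df = pd.DataFrame({'col1': np.random.randn(1000),
                   'col2': np.random.randn(1000)})

with pd.HDFStore('test_table.h5', mode='w') as store:
    # format='table' writes a chunked, row-oriented Table node
    # instead of one fixed 2 x N Array, so reading a slice of rows
    # does not touch the whole dataset
    store.put('data', df, format='table')
    head = store.select('data', start=0, stop=5)  # reads only the first 5 rows

print(head.shape)  # (5, 2)
```

The trade-off is that table-format writes are slower and the file is somewhat larger, but queries and head()-style partial reads no longer scan the entire 2 x N block.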

