Join two large pandas.HDFStore HDF5 files

This question is related to "Concatenating a large number of HDF5 files".

I have some huge HDF5 files (~20 GB compressed) that cannot fit in RAM. Each of them stores several pandas.DataFrame objects with the same format, and their indexes do not overlap.

I would like to merge them into a single HDF5 file in which all the DataFrame data is combined correctly. One way to do this is to read each file chunk-by-chunk and append the chunks to a single output file, but that would take quite a lot of time.
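For reference, here is a minimal sketch of the chunk-by-chunk approach I would like to avoid. The file names and chunk size are placeholders, and it assumes the frames were written in table format so they can be selected in chunks:

 import pandas as pd

 # Hypothetical file names, for illustration only.
 sources = ['store_1.h5', 'store_2.h5']

 with pd.HDFStore('merged.h5', mode='w', complevel=9, complib='blosc') as out:
     for path in sources:
         with pd.HDFStore(path, mode='r') as src:
             for key in src.keys():
                 # Requires table-format storage; reads each table in
                 # chunks so the full data never has to fit in RAM.
                 for chunk in src.select(key, chunksize=500000):
                     out.append(key, chunk)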

Are there any special tools or methods for doing this without manually iterating through the files?

Tags: python, pandas, hdf5, pytables
1 answer

See the docs for the odo project (formerly into). Note that if you use the into library, the order of the arguments is switched (this was the motivation for the name change, to avoid confusion!).

Basically you can:

 from odo import odo

 odo('hdfstore://path_store_1::table_name',
     'hdfstore://path_store_new_name::table_name')

Performing several operations like this will append to the right-hand-side (destination) store.
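For example, a minimal sketch that merges several stores into one destination (the file names and the table_name key are placeholders for illustration):

 from odo import odo

 # Hypothetical source files; each stores a table under the same key.
 for path in ['store_1.h5', 'store_2.h5', 'store_3.h5']:
     # Each call appends the source table to the destination store.
     odo('hdfstore://%s::table_name' % path,
         'hdfstore://merged.h5::table_name')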

This will automatically perform the copy in chunks, so the data never has to fit entirely in RAM.
