Saving pandas data to a file using bcolz

Question

I want to use bcolz to save a pandas data frame to a file.

I tried:

    import bcolz
    import pandas as pd

    df = pd.read_csv(open("mydata.csv", 'rb'), delimiter='\t')
    ct = bcolz.ctable.fromdataframe(df)

After that, ct contains a compressed data frame, but I cannot find a way to save it to a file.

+5

python pandas

M. Page Jul 26 '15 at 9:07

3 answers

You just need to indicate where to create the table when you read in the dataframe, for example:

    import bcolz
    import pandas as pd

    df = pd.read_csv(open("mydata.csv", 'rb'), delimiter='\t')
    ct = bcolz.ctable.fromdataframe(df, rootdir='dataframe.bcolz')
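For completeness, here is a minimal sketch of the full round trip: write the table to disk, then reopen it later and convert it back to a DataFrame. The helper name `roundtrip` is made up for illustration, and the sketch assumes that `rootdir` does not already exist and that bcolz and pandas are installed.

```python
def roundtrip(df, rootdir):
    """Write df to an on-disk ctable under rootdir, reopen it, and return a DataFrame.

    `roundtrip` is a hypothetical helper name; it assumes rootdir does not exist yet.
    """
    # Imported inside the function so this sketch can be loaded even where
    # bcolz is not installed.
    import bcolz

    # Passing rootdir makes the ctable persistent: the columns are written
    # to a directory on disk instead of being held only in memory.
    ct = bcolz.ctable.fromdataframe(df, rootdir=rootdir)
    ct.flush()  # make sure any buffered data reaches the disk

    # In a later session you would start from here: open the directory
    # read-only and convert it back to a pandas DataFrame.
    reopened = bcolz.open(rootdir, mode="r")
    return reopened.todataframe()
```

Calling `roundtrip(df, 'dataframe.bcolz')` should return a DataFrame with the same columns as `df`. Note that the on-disk result is a directory, not a single file.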

+7

Jeff Aug 18 '15 at 21:01

It looks like bcolz.ctable has a tohdf5 method that you could use; however, you will need to install hdf5, pytables, etc. Otherwise, you can use pickle, which is the usual way to save a general Python object to disk.

By the way, if you are only interested in compressing your data, you might want to look at a lower-tech option such as gzip; the compression will be just as good as, if not better than, that of a columnar data format, which is geared more toward fast querying of your data.
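As a rough illustration of the pickle and gzip route, the sketch below uses only the standard library. The function names `save_compressed` and `load_compressed` are made up for this example. It should work for any picklable object; whether a particular ctable pickles cleanly has not been checked here.

```python
import gzip
import pickle


def save_compressed(obj, path):
    """Pickle obj and write it to path through a gzip stream."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Read a gzip-compressed pickle from path and return the object."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

With these helpers, `save_compressed(df, "mydata.pkl.gz")` followed by `load_compressed("mydata.pkl.gz")` would store and restore a data frame as a single compressed file. Only unpickle files from sources you trust.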

+1

maxymoo Jul 27 '15 at 0:26

Francesc · Accepted Answer · 2015-08-05T08:30:03+0000

You can use bcolz with persistent (on-disk) data containers just as you would with in-memory ones. Take a look at this tutorial, which works with on-disk datasets using pandas/HDF5, pure PyTables, SQLite, and bcolz:

https://github.com/FrancescAlted/EuroPython2015/blob/master/4-On-Disk-Tables.ipynb
