Saving pandas data to a file using bcolz

I want to use bcolz to save a pandas data frame to a file.

I tried:

    import bcolz
    import pandas as pd

    # Read the tab-separated file into a DataFrame
    df = pd.read_csv("mydata.csv", delimiter='\t')

    # Convert it to a compressed bcolz ctable (held in memory)
    ct = bcolz.ctable.fromdataframe(df)

After that, ct holds a compressed table in memory, but I cannot find a way to save it to a file.

3 answers

You can use bcolz with persistent (on-disk) containers just as you would with in-memory ones. Take a look at this tutorial, which works with on-disk datasets using pandas/HDF5, pure PyTables, SQLite, and bcolz:

https://github.com/FrancescAlted/EuroPython2015/blob/master/4-On-Disk-Tables.ipynb


You just need to tell bcolz where to store the table on disk, using the rootdir argument when you convert the DataFrame, for example:

    import bcolz
    import pandas as pd

    df = pd.read_csv("mydata.csv", delimiter='\t')

    # rootdir makes bcolz persist the table in this directory
    ct = bcolz.ctable.fromdataframe(df, rootdir='dataframe.bcolz')

It looks like bcolz.ctable has a tohdf5 method that you could use; however, you will need to install HDF5, PyTables, etc. Otherwise, you can use pickle, which is the usual way to save an arbitrary Python object to disk.
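A minimal pickle sketch follows. I have not confirmed how well a ctable itself pickles, so this version pickles the pandas DataFrame instead, which is a swap from what the answer describes and gives no compression. The sample data and the file name mydata.pkl are hypothetical.

```python
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})  # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "mydata.pkl")      # hypothetical file name

# Serialize the DataFrame to disk with the newest pickle protocol available.
with open(path, "wb") as f:
    pickle.dump(df, f, protocol=pickle.HIGHEST_PROTOCOL)

# Load it back.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.equals(df))
```

pandas also wraps this as df.to_pickle(path) and pd.read_pickle(path). Keep in mind that pickle files should only be loaded from sources you trust.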

By the way, if you are simply interested in compressing your data, you might want to look at a lower-tech option such as gzip; the compression will be just as good as, if not better than, that of a columnar data format, which is geared more toward fast querying of your data.
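For the gzip route, pandas can write and read gzip-compressed CSV directly, so no extra packages are needed. This is a sketch with hypothetical sample data and a hypothetical file name; it keeps the tab separator from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"x": range(1000), "y": [i * 0.5 for i in range(1000)]})
path = os.path.join(tempfile.mkdtemp(), "mydata.csv.gz")  # hypothetical file name

# compression="gzip" writes a gzip-compressed file; index=False omits the row index.
df.to_csv(path, sep="\t", index=False, compression="gzip")

# read_csv infers gzip from the ".gz" suffix (compression="infer" is the default).
restored = pd.read_csv(path, sep="\t")
print(restored.equals(df))
```

The trade-off is that the whole file has to be decompressed and parsed to read it back, whereas bcolz can read individual columns or chunks.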

