I found that CSV and tab-delimited text (.txt) are equivalent in read and write speed, and both are much faster to read and write than MS Excel files. However, the Excel format yields a much smaller file, because .xlsx is a compressed format.
Timings for the same data, a 320 MB CSV file (16 MB as .xlsx), on an i7-7700k with an SSD, Anaconda Python 3.5.3, and Pandas 0.19.2:
Using the standard convention, import pandas as pd:
2 seconds to read .csv: df = pd.read_csv('foo.csv') (same for pd.read_table)
15.3 seconds to read .xlsx: df = pd.read_excel('foo.xlsx')
10.5 seconds to write .csv: df.to_csv('bar.csv', index=False) (same for .txt)
34.5 seconds to write .xlsx: df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)
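If you want to repeat this kind of comparison on your own machine, a minimal timing sketch might look like the following. The timed helper, the random sample data, and the temporary file paths are my own assumptions for illustration, not the setup used for the numbers above, and the data is far smaller than 320 MB. The Excel part is skipped if no Excel engine (such as openpyxl) is installed.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def timed(label, func):
    """Hypothetical helper: run func once, print and return the elapsed time."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print("{}: {:.3f} s".format(label, elapsed))
    return result, elapsed


# Assumed sample data; much smaller than the file measured above.
df = pd.DataFrame(np.random.rand(10000, 5), columns=list("abcde"))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "bar.csv")
    _, write_csv_s = timed("write .csv", lambda: df.to_csv(csv_path, index=False))
    df_csv, read_csv_s = timed("read .csv", lambda: pd.read_csv(csv_path))

    try:
        xlsx_path = os.path.join(tmp, "bar.xlsx")
        timed("write .xlsx",
              lambda: df.to_excel(xlsx_path, sheet_name="Sheet1", index=False))
        timed("read .xlsx", lambda: pd.read_excel(xlsx_path))
    except ImportError:
        # pandas needs an optional engine package to handle .xlsx files.
        print("No Excel engine installed; skipping .xlsx timings")
```

A single run per operation is noisy, so for a fair comparison you would probably want to repeat each step a few times and use a larger DataFrame.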
To write your data to tab delimited text files, you can use:
df.to_csv('bar.txt', sep='\t', index=False)
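To read such a file back, you pass the same separator to pd.read_csv. Here is a small round-trip sketch; the two-column sample DataFrame is made up for illustration, and it writes to an in-memory buffer instead of bar.txt so it leaves no files behind.

```python
import io

import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write tab-delimited text to an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
text = buffer.getvalue()

# Read it back using the same separator.
df_back = pd.read_csv(io.StringIO(text), sep="\t")
print(df_back)
```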
griffinc