Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?

I usually import CSV files into pandas, but sometimes I get data in other formats that I need to turn into DataFrame objects.

Today I learned about read_table as a "generic" importer for other formats, and wondered whether there are significant performance differences between the various pandas methods for reading .csv files, e.g. read_table, from_csv, read_excel.

  • Do these other methods have better performance than read_csv?
  • Is read_csv much different from from_csv for creating a DataFrame?
+8
performance python pandas csv dataframe
2 answers
  • No. read_table is read_csv with sep=',' replaced by sep='\t'; they are two thin wrappers around the same function, so the performance will be identical. read_excel uses the xlrd package to read xls and xlsx files into a DataFrame; it does not handle csv files.
  • from_csv calls read_table, so no.
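
To illustrate the first point, here is a minimal sketch showing that read_table and read_csv with sep='\t' parse the same input to the same DataFrame. The sample data and variable names are made up for the example, and it assumes a pandas version that still ships read_table.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample data, made up for this illustration.
tsv_text = "name\tvalue\nalpha\t1\nbeta\t2\ngamma\t3\n"

# read_table defaults to sep='\t' ...
df_table = pd.read_table(io.StringIO(tsv_text))

# ... while read_csv defaults to sep=',' and needs the separator spelled out.
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Both calls go through the same parser, so the results match.
print(df_table.equals(df_csv))
```

from_csv is left out of the sketch on purpose, since it was deprecated and is not available in current pandas releases.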
+19

I found that CSV and tab-delimited text (.txt) are equivalent in read and write speed, and both are much faster than reading and writing MS Excel files. However, the Excel format does compress the file size considerably.


For the same 320 MB CSV file (16 MB as .xlsx), on an i7-7700k with an SSD, running Anaconda Python 3.5.3 and pandas 0.19.2:

Using the standard import pandas as pd convention

2 seconds to read .csv: df = pd.read_csv('foo.csv') (same for pd.read_table)

15.3 seconds to read .xlsx: df = pd.read_excel('foo.xlsx')

10.5 seconds to write .csv: df.to_csv('bar.csv', index=False) (same for .txt)

34.5 seconds to write .xlsx: df.to_excel('bar.xlsx', sheet_name='Sheet1', index=False)


To write your data to tab delimited text files, you can use:

df.to_csv('bar.txt', sep='\t', index=False)
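
If you want to repeat this kind of comparison on your own machine, a rough timing sketch could look like the one below. The timed helper, the synthetic data, and the file names are assumptions made for the example, not the setup used for the numbers above. Excel is left out so that the sketch runs without an extra Excel engine installed.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def timed(label, func):
    """Run func once, print the elapsed wall-clock time, return its result."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result


# Synthetic data (hypothetical; far smaller than the 320 MB file above).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 5)), columns=list("abcde"))

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "bar.csv")
    txt_path = os.path.join(tmp, "bar.txt")

    # Write the same frame as comma-delimited and as tab-delimited text.
    timed("write .csv", lambda: df.to_csv(csv_path, index=False))
    timed("write .txt", lambda: df.to_csv(txt_path, sep="\t", index=False))

    # Read both files back with the matching separator.
    df_csv = timed("read .csv", lambda: pd.read_csv(csv_path))
    df_txt = timed("read .txt", lambda: pd.read_csv(txt_path, sep="\t"))
```

A single run like this is noisy; repeating each step a few times and taking the best time gives a fairer picture.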

+4
