Reading an Excel file in Python using pandas

I am trying to read an excel file as follows:

    newFile = pd.ExcelFile(PATH\FileName.xlsx)
    ParsedData = pd.io.parsers.ExcelFile.parse(newFile)

This produces an error saying that two arguments are expected. I don't know what the second argument is. What I am trying to achieve is to convert the Excel file to a DataFrame. Am I doing it right, or is there another way to do this using pandas?

+132
python pandas
Jun 12 '13 at
7 answers

Close: first you call ExcelFile, but then you call the .parse method on the resulting object and pass it the sheet name.

    >>> xl = pd.ExcelFile("dummydata.xlsx")
    >>> xl.sheet_names
    [u'Sheet1', u'Sheet2', u'Sheet3']
    >>> df = xl.parse("Sheet1")
    >>> df.head()
                      Tid  dummy1    dummy2    dummy3    dummy4    dummy5  \
    0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307
    1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360
    2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395
    3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187
    4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058

         dummy6  dummy7    dummy8    dummy9
    0  0.623354       0  2.579108  2.681728
    1  0.554211       0  7.210000  3.028614
    2  0.567841       0  6.940000  3.644147
    3  0.581470       0  6.630000  4.016155
    4  0.595100       0  6.350000  3.974442

What you are doing is calling the method that lives on the class itself rather than on an instance. That is okay (although not very idiomatic), but if you do it that way you also need to pass the sheet name:

    >>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
    >>> parsed.columns
    Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
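For anyone who wants to try the two call styles side by side, here is a minimal self-contained sketch. It is not taken from the answer above: the workbook contents and the temporary file are made up for illustration, and it assumes an Excel engine such as openpyxl is installed so that pandas can write and read .xlsx files.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook in a temporary directory (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "dummydata.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Tid": [1, 2], "dummy1": [0.5, 1.5]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
    pd.DataFrame({"other": ["a", "b"]}).to_excel(
        writer, sheet_name="Sheet2", index=False
    )

with pd.ExcelFile(path) as xl:
    sheet_names = xl.sheet_names
    # Idiomatic: call parse on the instance.
    df = xl.parse("Sheet1")
    # Class-level call: the instance has to be passed explicitly as the first argument.
    parsed = pd.ExcelFile.parse(xl, "Sheet1")

print(sheet_names)
print(df.equals(parsed))
```

Both calls should give the same DataFrame; the class-level form just makes the otherwise implicit first argument visible.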
+206
Jun 12 '13 at 10:52

This is a very simple and easy way.

    import pandas

    df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
    # or using sheet index starting 0
    df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)

Read the full documentation: http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.read_excel.html

Note (FutureWarning): the sheetname keyword is deprecated in newer versions of pandas; use sheet_name instead.
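As a rough illustration of the newer spelling, the sketch below reads a sheet by name, by zero-based index, and all sheets at once with sheet_name=None. The file and its contents are invented for the example, and it assumes a recent pandas with an engine such as openpyxl installed.

```python
import os
import tempfile

import pandas as pd

# Create a throwaway workbook with two sheets (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet 1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet 2", index=False)

# Select a sheet by its name.
by_name = pd.read_excel(path, sheet_name="Sheet 1")

# Select a sheet by zero-based position (1 is the second sheet).
by_index = pd.read_excel(path, sheet_name=1)

# sheet_name=None loads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(path, sheet_name=None)

print(sorted(all_sheets))
```

Passing the path directly, rather than an open file object, also leaves closing the file to pandas.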

+87
Apr 13 '16 at 6:51

I thought I should add that if you want to access rows or columns in order to loop through them, you can do this:

    import pandas as pd

    # open the file
    xlsx = pd.ExcelFile(r"PATH\FileName.xlsx")

    # get the first sheet as an object
    sheet1 = xlsx.parse(0)

    # get the first column as a list you can loop through
    # change the 0 in the code below to the row or column number you want
    column = sheet1.icol(0).real

    # get the first row as a list you can loop through
    row = sheet1.irow(0).real



Edit:

The icol(i) and irow(i) methods are deprecated now. You can use sheet1.iloc[:,i] to get the i-th column and sheet1.iloc[i,:] to get the i-th row.
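A small sketch of the iloc form, using an in-memory DataFrame as a stand-in for a parsed sheet (the column names and values here are made up):

```python
import pandas as pd

# Stand-in for a sheet that was parsed from an Excel file (hypothetical data).
sheet1 = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# All rows of the first column, returned as a Series.
first_column = sheet1.iloc[:, 0]

# All columns of the first row, returned as a Series.
first_row = sheet1.iloc[0, :]

# Convert to plain lists if you want to loop over ordinary Python values.
column_values = first_column.tolist()
row_values = first_row.tolist()

for value in column_values:
    print(value)
```

Both selections are positional, so they work regardless of what the column labels or index values are.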

+19
Oct. 27 '15 at 17:35

I think this should satisfy your needs:

    import pandas as pd

    # Read the first Excel sheet into a pandas DataFrame
    # (older pandas releases spelled this keyword sheetname)
    df = pd.read_excel(r"PATH\FileName.xlsx", sheet_name=0)
+12
Nov 30 '16 at 14:10

You just need to pass the path to your file to pd.read_excel

    import pandas as pd

    file_path = "./my_excel.xlsx"
    data_frame = pd.read_excel(file_path)

Check the documentation for options such as skiprows, which skips rows when loading the Excel file.
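To make the skiprows option concrete, here is a hedged sketch with an invented workbook whose first two rows are notes rather than data. It assumes an engine such as openpyxl is available; the file name and cell contents are purely illustrative.

```python
import os
import tempfile

import pandas as pd

# Write a sheet whose first two rows are free-text notes, followed by the
# real header row and the data (hypothetical contents).
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "my_excel.xlsx")
raw = pd.DataFrame(
    [
        ["Exported by a hypothetical tool", ""],
        ["Do not edit", ""],
        ["name", "value"],
        ["a", 1],
        ["b", 2],
    ]
)
raw.to_excel(file_path, header=False, index=False)

# Skip the two note rows so the third row is used as the header.
data_frame = pd.read_excel(file_path, skiprows=2)

print(data_frame)
```

skiprows also accepts a list of row indices if the rows to ignore are not all at the top.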

0
Mar 30 '19 at 22:48
    import pandas as pd

    data = pd.read_excel(r'**YourPath**.xlsx')
    print(data)
0
Jul 04 '19 at 6:44

Here is an updated method with syntax that is more common in Python code. It also avoids opening the same file several times.

    import pandas as pd

    sheet1, sheet2 = None, None
    # open the workbook once and read both sheets from the same handle
    with pd.ExcelFile(r"PATH\FileName.xlsx") as reader:
        sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
        sheet2 = pd.read_excel(reader, sheet_name='Sheet2')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

0
Aug 22 '19 at 17:43


