Import multiple Excel files into Python pandas and merge them into one DataFrame

I would like to read several Excel files from a directory into pandas and concatenate them into one big DataFrame, but I have not been able to figure it out. I need help with the for loop and with building the concatenated DataFrame. Here is what I have so far:

    import sys
    import csv
    import glob
    import pandas as pd

    # get data file names
    path = r'C:\DRO\DCL_rawdata_files\excelfiles'
    filenames = glob.glob(path + "/*.xlsx")

    dfs = []

    for df in dfs:
        xl_file = pd.ExcelFile(filenames)
        df = xl_file.parse('Sheet1')
        dfs.concat(df, ignore_index=True)
python pandas excel concatenation
3 answers

As mentioned in the comments, one mistake you make is that you iterate over an empty list.
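To make the mistake concrete: in the question, `dfs` is created as an empty list and then used as the loop sequence, so the loop body never runs. The short sketch below uses made-up file names (no Excel files are read) and only counts iterations, to show the difference between looping over the empty result list and looping over the file names. Two further problems in the question's loop body are that `pd.ExcelFile` is given the whole list of names rather than one path, and that Python lists have no `concat` method.

```python
# Hypothetical file names, for illustration only; nothing is read from disk.
filenames = ["a.xlsx", "b.xlsx"]

dfs = []

iterations_over_dfs = 0
for df in dfs:  # dfs is still empty here, so the body is skipped entirely
    iterations_over_dfs += 1

iterations_over_filenames = 0
for name in filenames:  # looping over the file names visits each file once
    iterations_over_filenames += 1

print(iterations_over_dfs, iterations_over_filenames)
```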

Here's how I would do it, using the example of five identical Excel files that are appended one after another.

(1) Import:

    import os
    import pandas as pd

(2) File List:

    path = os.getcwd()
    files = os.listdir(path)
    files

Output:

 ['.DS_Store', '.ipynb_checkpoints', '.localized', 'Screen Shot 2013-12-28 at 7.15.45 PM.png', 'test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls', 'Untitled0.ipynb', 'Werewolf Modelling', '~$Random Numbers.xlsx'] 

(3) Select the "xls" files:

    files_xls = [f for f in files if f[-3:] == 'xls']
    files_xls

Output:

 ['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls'] 
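The slice test in step (3) keeps only names whose last three characters are `xls`, so `.xlsx` workbooks are skipped. Below is a minimal sketch of a slightly more general filter, assuming you also want `.xlsx` files and want to ignore the temporary lock files that Office creates (names starting with `~$`). `select_excel_files` is a hypothetical helper name, not part of pandas or the original answer.

```python
import os


def select_excel_files(names, extensions=(".xls", ".xlsx")):
    """Return the names that look like Excel workbooks.

    Assumption: names starting with "~$" are Office lock files and are skipped.
    """
    selected = []
    for name in names:
        if name.startswith("~$"):
            continue
        # splitext returns (root, extension); compare case-insensitively
        ext = os.path.splitext(name)[1].lower()
        if ext in extensions:
            selected.append(name)
    return selected


# The directory listing from step (2)
files = ['.DS_Store', '.ipynb_checkpoints', '.localized',
         'Screen Shot 2013-12-28 at 7.15.45 PM.png', 'test1 2.xls',
         'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls',
         'Untitled0.ipynb', 'Werewolf Modelling', '~$Random Numbers.xlsx']

files_xls = select_excel_files(files)
```

For this listing the result is the same five `.xls` files as in the original step.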

(4) Initialize an empty DataFrame:

    df = pd.DataFrame()

(5) Loop over the list of files and add each one to the DataFrame:

    for f in files_xls:
        data = pd.read_excel(f, sheet_name='Sheet1')
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, data])

(6) Enjoy the new DataFrame. :-)

    df

Output:

      Result  Sample
    0      a       1
    1      b       2
    2      c       3
    3      d       4
    4      e       5
    5      f       6
    6      g       7
    7      h       8
    8      i       9
    9      j      10
    0      a       1
    1      b       2
    2      c       3
    3      d       4
    4      e       5
    5      f       6
    6      g       7
    7      h       8
    8      i       9
    9      j      10
    0      a       1
    1      b       2
    2      c       3
    3      d       4
    4      e       5
    5      f       6
    6      g       7
    7      h       8
    8      i       9
    9      j      10
    0      a       1
    1      b       2
    2      c       3
    3      d       4
    4      e       5
    5      f       6
    6      g       7
    7      h       8
    8      i       9
    9      j      10
    0      a       1
    1      b       2
    2      c       3
    3      d       4
    4      e       5
    5      f       6
    6      g       7
    7      h       8
    8      i       9
    9      j      10
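Note that the index in this output restarts at 0 for every file, because each file's rows keep their original index when stacked. The sketch below uses small in-memory frames with made-up data as stand-ins for the Excel files, and shows `pd.concat` with and without `ignore_index=True`. Collecting the frames in a list and concatenating once is also generally cheaper than growing a DataFrame inside the loop.

```python
import pandas as pd

# Two identical in-memory frames standing in for two Excel files (made-up data)
frames = [
    pd.DataFrame({"Result": ["a", "b", "c"], "Sample": [1, 2, 3]})
    for _ in range(2)
]

# Default: every frame keeps its own index, so the labels repeat
stacked = pd.concat(frames)

# ignore_index=True renumbers the rows 0..n-1 in the combined frame
renumbered = pd.concat(frames, ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```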

This works with Python 2.x.

Run it from the directory where the Excel files are located.

See http://pbpython.com/excel-file-combine.html

    import glob
    import pandas as pd

    # read every workbook in the current directory
    frames = []
    for f in glob.glob("*.xlsx"):
        df = pd.read_excel(f)
        frames.append(df)

    # DataFrame.append was removed in pandas 2.0, so combine with pd.concat
    all_data = pd.concat(frames, ignore_index=True)

    # now save the data frame (ExcelWriter.save was also removed in pandas 2.0)
    all_data.to_excel('output.xlsx', sheet_name='sheet1')
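One thing to watch with this approach: `output.xlsx` is written into the same directory that `glob` scans, so a second run would pick up the previous output as an input. Below is a minimal sketch of excluding it from the file list. `input_workbooks` is a hypothetical helper name, and the default output name is taken from the snippet above.

```python
import glob
import os


def input_workbooks(folder, output_name="output.xlsx"):
    """List the .xlsx files in folder, leaving out the combined output file."""
    paths = sorted(glob.glob(os.path.join(folder, "*.xlsx")))
    return [p for p in paths if os.path.basename(p) != output_name]
```

The returned paths can then be passed to `pd.read_excel` one by one, as in the loop above.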
+5
source share
    import pandas as pd
    import os

    os.chdir('...')

    # read first file for column names
    fdf = pd.read_excel("first_file.xlsx", sheet_name="sheet_name")

    # create counter to segregate the different file data
    fdf["counter"] = 1
    nm = list(fdf)
    c = 2

    # read first 1000 files
    for i in os.listdir():
        print(c)
        if c < 1001:
            # skip the first file, it is already loaded above
            if i.endswith(".xlsx") and i != "first_file.xlsx":
                df = pd.read_excel(i, sheet_name="sheet_name")
                df["counter"] = c
                if list(df) == nm:
                    # DataFrame.append was removed in pandas 2.0; use pd.concat
                    fdf = pd.concat([fdf, df])
                    c += 1
                else:
                    print("headers name not match")
            else:
                print("not xlsx")

    fdf = fdf.reset_index(drop=True)
    # relax
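The two ideas in this answer, checking that each file's headers match the first file and tagging rows with where they came from, can be separated from the file reading. The sketch below works on frames that are already loaded, keyed by file name. `combine_matching` and the `source` column are hypothetical names chosen for illustration, and the file name is used as the tag instead of a numeric counter.

```python
import pandas as pd


def combine_matching(frames_by_name):
    """Stack frames whose columns match the first frame's columns.

    Returns the combined frame (with a "source" column holding the file
    name) and the list of names that were skipped for mismatched headers.
    """
    expected = None
    kept, skipped = [], []
    for name, frame in frames_by_name.items():
        if expected is None:
            # The first frame defines the expected column names
            expected = list(frame.columns)
        if list(frame.columns) != expected:
            skipped.append(name)
            continue
        kept.append(frame.assign(source=name))
    combined = pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
    return combined, skipped


# Made-up frames standing in for three workbooks
frames = {
    "a.xlsx": pd.DataFrame({"x": [1, 2]}),
    "b.xlsx": pd.DataFrame({"y": [3]}),
    "c.xlsx": pd.DataFrame({"x": [4]}),
}
combined, skipped = combine_matching(frames)
```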