Using pandas: Combining Two Different Excel Files / Sheets

I am trying to merge two different Excel files (thanks to the post "Import multiple excel files into python pandas and merge them into a single data frame").

Here is what I have worked out so far:

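    import pandas as pd

    df = pd.DataFrame()
    for f in ['c:\\file1.xls', 'c:\\file2.xls']:
        data = pd.read_excel(f, sheet_name='Sheet1')
        # DataFrame.append was removed in pandas 2.0; there, use pd.concat instead
        df = df.append(data)
    # Writing .xls needs the xlwt engine, which pandas 2.0 also dropped; use .xlsx there
    df.to_excel("c:\\all.xls")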

Here is how they look.

[Image: contents of File1.xls and File2.xls]

However, I want to:

  • Exclude the last two rows of each file (i.e. rows 4 and 5 in File1.xls; rows 7 and 8 in File2.xls).
  • Add a column (or overwrite column A) that indicates which file each row came from.

For example:

[Image: the desired combined output]

Is this possible? Thanks.

2 answers

For point 1, you can pass skipfooter=2 to read_excel (the parameter was spelled skip_footer in older pandas versions), or alternatively do

 data = data.iloc[:-2] 

after reading the data.
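One pitfall with the iloc approach, if the footer length ever becomes a variable: data.iloc[:-0] is the same as data.iloc[:0], which returns an empty frame rather than the whole sheet. A minimal sketch of a guard, using a hypothetical helper drop_footer and made-up data in place of a real sheet:

```python
import pandas as pd


def drop_footer(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return df without its last n rows (hypothetical helper)."""
    if n <= 0:
        # df.iloc[:-0] equals df.iloc[:0], i.e. an EMPTY frame, so guard n == 0.
        return df
    return df.iloc[:-n]


# Made-up stand-in for a sheet whose last two rows are a footer.
sheet = pd.DataFrame({"item": ["a", "b", "c", "total", "note"],
                      "qty": [1, 2, 3, 6, None]})
body = drop_footer(sheet, 2)
print(body)
```

If the footer length is known when reading, passing skipfooter to read_excel (spelled skip_footer in older pandas) skips those rows during the read instead.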

For point 2, you can use the file name as the index:

    from os.path import basename
    data.index = [basename(f)] * len(data)
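The question asks for a column (ideally column A) rather than an index label. Assuming the file name is enough to identify the source, DataFrame.insert(0, ...) places a new column first. A sketch with made-up in-memory frames standing in for pd.read_excel(f, sheet_name='Sheet1'); the example paths use forward slashes so that os.path.basename strips the folder on any OS:

```python
from os.path import basename

import pandas as pd

# Made-up frames standing in for the sheets read from each file.
frames = {
    "c:/file1.xls": pd.DataFrame({"value": [10, 20, 30]}),
    "c:/file2.xls": pd.DataFrame({"value": [40, 50]}),
}

labelled = []
for path, data in frames.items():
    data = data.copy()  # avoid mutating the original frame
    # insert(0, ...) puts the new column first, i.e. Excel column A.
    data.insert(0, "source", basename(path))
    labelled.append(data)

combined = pd.concat(labelled, ignore_index=True)
print(combined)
```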

Also, it is better to collect all the data frames in a list and concat them once at the end, rather than appending to a growing DataFrame; something like:

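    import os
    import pandas as pd

    frames = []
    for f in ['c:\\file1.xls', 'c:\\file2.xls']:
        # Read the sheet and drop its last two (footer) rows.
        data = pd.read_excel(f, sheet_name='Sheet1').iloc[:-2]
        # Label every row with the name of the file it came from.
        data.index = [os.path.basename(f)] * len(data)
        frames.append(data)
    df = pd.concat(frames)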
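Another variation (not from the original answer) is to pass pd.concat a dict keyed by file name: the keys become an outer index level, which reset_index can turn into an ordinary source column. A sketch with made-up frames in place of the real sheets; in practice each value would presumably come from pd.read_excel(path, sheet_name='Sheet1'):

```python
import pandas as pd

# Made-up frames standing in for the two sheets (last two rows are footer).
sheets = {
    "file1.xls": pd.DataFrame({"name": ["a", "b", "c", "x", "y"],
                               "qty": [1, 2, 3, 0, 0]}),
    "file2.xls": pd.DataFrame({"name": list("defghij"),
                               "qty": [4, 5, 6, 7, 8, 0, 0]}),
}

# Drop the two footer rows from each sheet before combining.
trimmed = {name: df.iloc[:-2] for name, df in sheets.items()}

# The dict keys become the "source" index level; reset_index moves it
# into a regular first column, then a fresh 0..n-1 index is created.
combined = (
    pd.concat(trimmed, names=["source", "row"])
    .reset_index(level="source")
    .reset_index(drop=True)
)
print(combined)
```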
    # Python 3 version. Note: xlrd 2.0+ reads only legacy .xls files, so
    # opening .xlsx files as below requires xlrd < 2.0.
    import os
    import xlrd
    import xlsxwriter

    file_name = input("Enter the destination file name (without extension): ")
    merged_file_name = file_name + ".xlsx"
    dest_book = xlsxwriter.Workbook(merged_file_name)
    dest_sheet_1 = dest_book.add_worksheet()
    dest_row = 1  # row 0 is reserved for the header
    header_written = False

    path = input("Enter the folder to search: ")
    for root, dirs, files in os.walk(path):
        files = [f for f in files if f.endswith('.xlsx')]
        for xlsfile in files:
            print("File in mentioned folder is: " + xlsfile)
            temp_book = xlrd.open_workbook(os.path.join(root, xlsfile))
            temp_sheet = temp_book.sheet_by_index(0)
            # Copy the header row from the first file only.
            if not header_written:
                for col_index in range(temp_sheet.ncols):
                    value = temp_sheet.cell_value(0, col_index)
                    dest_sheet_1.write(0, col_index, value)
                header_written = True
            # Copy every data row, skipping each file's own header row.
            for row_index in range(1, temp_sheet.nrows):
                for col_index in range(temp_sheet.ncols):
                    value = temp_sheet.cell_value(row_index, col_index)
                    dest_sheet_1.write(dest_row, col_index, value)
                dest_row += 1
    dest_book.close()

    # Re-open the merged file to report its size.
    book = xlrd.open_workbook(merged_file_name)
    sheet = book.sheet_by_index(0)
    print("number of rows in destination file are:", sheet.nrows)
    print("number of columns in destination file are:", sheet.ncols)
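For comparison, the same folder merge can be sketched with pandas alone. merge_folder is a hypothetical helper, and it assumes every file has the same header row; the reader parameter exists only so the function can be exercised with CSV files instead of real workbooks:

```python
from pathlib import Path

import pandas as pd


def merge_folder(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Stack the first sheet of every matching file under folder (sketch)."""
    frames = []
    # rglob recurses into subfolders, like os.walk; sorting keeps order stable.
    for path in sorted(Path(folder).rglob(pattern)):
        if path.name.startswith("~$"):
            continue  # skip the lock files Excel creates for open workbooks
        print("File in mentioned folder is:", path.name)
        data = reader(path)  # read_excel reads the first sheet by default
        data.insert(0, "source", path.name)
        frames.append(data)
    # pandas aligns columns by header name, so the header appears only once.
    # Note: pd.concat raises ValueError if no files matched.
    return pd.concat(frames, ignore_index=True)
```

For example, merge_folder(r"c:\data").to_excel("merged.xlsx", index=False) should write the result, provided an Excel engine such as openpyxl is installed.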