Adding an existing excel sheet with a new data framework using python pandas

I currently have this code. Works great.

It looks at the Excel files in a folder, deletes the first 2 lines, then saves them as separate Excel files, and also saves the files in a loop as an added file.

Currently, the added file overwrites the existing file every time I run the code.

I need to add new data to the end of an existing Excel worksheet ('master_data.xlsx)

dfList = [] path = 'C:\\Test\\TestRawFile' newpath = 'C:\\Path\\To\\New\\Folder' for fn in os.listdir(path): # Absolute file path file = os.path.join(path, fn) if os.path.isfile(file): # Import the excel file and call it xlsx_file xlsx_file = pd.ExcelFile(file) # View the excel files sheet names xlsx_file.sheet_names # Load the xlsx files Data sheet as a dataframe df = xlsx_file.parse('Sheet1',header= None) df_NoHeader = df[2:] data = df_NoHeader # Save individual dataframe data.to_excel(os.path.join(newpath, fn)) dfList.append(data) appended_data = pd.concat(dfList) appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx')) 

I thought it would be a simple task, but I think not. I think I need to add the master_data.xlsx file as a data frame, then map the index to the newly added data and save it back. Or maybe there is an easier way. Any help is appreciated.

+17
source share
2 answers

Helper function to add a DataFrame to an existing Excel file:

 def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, truncate_sheet=False, **to_excel_kwargs): """ Append a DataFrame [df] to existing Excel file [filename] into [sheet_name] Sheet. If [filename] does not exist, then this function will create it. Parameters: filename : File path or existing ExcelWriter (Example: '/path/to/file.xlsx') df : dataframe to save to workbook sheet_name : Name of sheet which will contain DataFrame. (default: 'Sheet1') startrow : upper left cell row to dump data frame. Per default (startrow=None) calculate the last row in the existing DF and write to the next row... truncate_sheet : truncate (remove and recreate) [sheet_name] before writing DataFrame to Excel file to_excel_kwargs : arguments which will be passed to 'DataFrame.to_excel()' [can be dictionary] Returns: None """ from openpyxl import load_workbook import pandas as pd # ignore [engine] parameter if it was passed if 'engine' in to_excel_kwargs: to_excel_kwargs.pop('engine') writer = pd.ExcelWriter(filename, engine='openpyxl') # Python 2.x: define [FileNotFoundError] exception if it does not exist try: FileNotFoundError except NameError: FileNotFoundError = IOError try: # try to open an existing workbook writer.book = load_workbook(filename) # get the last row in the existing Excel sheet # if it was not specified explicitly if startrow is None and sheet_name in writer.book.sheetnames: startrow = writer.book[sheet_name].max_row # truncate sheet if truncate_sheet and sheet_name in writer.book.sheetnames: # index of [sheet_name] sheet idx = writer.book.sheetnames.index(sheet_name) # remove [sheet_name] writer.book.remove(writer.book.worksheets[idx]) # create an empty sheet [sheet_name] using old index writer.book.create_sheet(sheet_name, idx) # copy existing sheets writer.sheets = {ws.title:ws for ws in writer.book.worksheets} except FileNotFoundError: # file does not exist yet, we will create it pass if startrow is None: startrow = 0 # write out the new sheet df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs) # save the workbook writer.save() 

Examples of using...


Old answer: it allows you to write multiple data frames to a new Excel file.

You can use the openpyxl engine in conjunction with the startrow parameter:

 In [48]: writer = pd.ExcelWriter('c:/temp/test.xlsx', engine='openpyxl') In [49]: df.to_excel(writer, index=False) In [50]: df.to_excel(writer, startrow=len(df)+2, index=False) In [51]: writer.save() 

with: /temp/test.xlsx:

enter image description here

PS you can also specify header=None if you do not want to duplicate column names ...

UPDATE: you can also check this solution

+28
source

If you are not strictly looking for an Excel file, get the output as a CSV file and just copy the CSV file to a new Excel file

df.to_csv('filepath', mode='a', index = False, header=None)

mode = 'a'

means to add

This is a roundabout way, but it works neatly!

0
source

All Articles