Writing a pandas DataFrame to Excel with different formats for different columns

I am trying to write a pandas DataFrame to a .xlsx file, where different numeric columns will have different formats. For example, some of them will show only two decimal places, some of them will not be displayed, some will be formatted as percentages with the symbol "%", etc.

I noticed that DataFrame.to_html() has a formatters parameter that allows you to do just that by mapping different formats to different columns. However, there is no similar parameter in the DataFrame.to_excel() method. The closest it offers is float_format, which applies globally to all floating-point numbers.
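For reference, here is a minimal sketch of the per-column mapping that to_html() supports. The column names and format strings are made up for illustration; they are not from my real data.

```python
import pandas as pd

# Hypothetical sample data: one amount column, one fraction column.
df = pd.DataFrame({"amount": [1010.5, 2020.25], "share": [0.1, 0.25]})

# formatters maps a column name to a one-argument function returning a string.
html = df.to_html(
    formatters={
        "amount": "{:,.2f}".format,  # thousands separator, two decimals
        "share": "{:.0%}".format,    # percentage with the "%" symbol
    }
)
print(html)
```

This is the kind of per-column control I would like to have when writing to .xlsx.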

I read a lot of SO posts that are at least partially related to my question, for example:

Are there more convenient Excel-related functions or properties in the pandas API that can help here, or something similar in openpyxl? Or is there perhaps a way to attach output-format metadata directly to each column of the DataFrame, so that it is interpreted downstream by the different output writers?

2 answers

You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:

    import pandas as pd

    # Create a Pandas dataframe from some data.
    df = pd.DataFrame(list(zip(
        [1010, 2020, 3030, 2020, 1515, 3030, 4545],
        [.1, .2, .33, .25, .5, .75, .45],
        [.1, .2, .33, .25, .5, .75, .45],
    )))

    # Create a Pandas Excel writer using XlsxWriter as the engine.
    writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')

    # Get the xlsxwriter objects from the dataframe writer object.
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Add some cell formats.
    format1 = workbook.add_format({'num_format': '#,##0.00'})
    format2 = workbook.add_format({'num_format': '0%'})
    format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})

    # Set the column width and format.
    worksheet.set_column('B:B', 18, format1)

    # Set the format but not the column width.
    worksheet.set_column('C:C', None, format2)

    worksheet.set_column('D:D', 16, format3)

    # Close the Pandas Excel writer and output the Excel file.
    # (writer.save() was removed in pandas 2.0; close() also writes the file.)
    writer.close()

Output:

[Image: screenshot of the resulting worksheet]

See also Working with Python Pandas and XlsxWriter.
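If you want something closer to the formatters mapping from the question, the same calls can be wrapped in a small helper. This is only a sketch: write_formatted, the column names and the temporary file path are hypothetical, and it assumes the xlsxwriter package is installed. Note that a column format set this way only applies to cells that have no format of their own, so it may not override cells that pandas itself formats (for example the header row).

```python
import os
import tempfile

import pandas as pd


def write_formatted(df, path, column_formats, sheet_name="Sheet1"):
    """Write df to path, applying an Excel number format per column name.

    Hypothetical helper: column_formats maps a column name to an Excel
    number-format string such as '0%' or '#,##0.00'.
    """
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # index=False so DataFrame column i lands in worksheet column i.
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        for name, num_format in column_formats.items():
            col = df.columns.get_loc(name)  # zero-based column position
            cell_format = workbook.add_format({"num_format": num_format})
            # width=None keeps the default width and only sets the format.
            worksheet.set_column(col, col, None, cell_format)


# Made-up sample data for illustration.
df = pd.DataFrame({"amount": [1010, 2020, 3030], "share": [0.1, 0.2, 0.33]})
path = os.path.join(tempfile.mkdtemp(), "formatted.xlsx")
write_formatted(df, path, {"amount": "#,##0.00", "share": "0%"})
```

To hide a column rather than format it, set_column also accepts an options dict as its last argument, for example {'hidden': True}.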


As you rightly point out, applying formats to individual cells is extremely inefficient.

openpyxl 2.4 includes native support for pandas DataFrames and named styles.

https://openpyxl.readthedocs.io/en/latest/changes.html#id7
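As a rough sketch of what that could look like (the style names, column letters and sample data below are assumptions for illustration, not something taken from the openpyxl changelog), the rows of a DataFrame can be appended to a worksheet and a named style then assigned to each data column:

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows

# Made-up sample data for illustration.
df = pd.DataFrame({"amount": [1010.5, 2020.25], "share": [0.1, 0.25]})

wb = Workbook()
ws = wb.active

# Append the header row followed by the data rows.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Register one named style per desired number format (names are hypothetical).
wb.add_named_style(NamedStyle(name="two_decimals", number_format="0.00"))
wb.add_named_style(NamedStyle(name="percent", number_format="0%"))

# Map worksheet column letters to style names; [1:] skips the header cell.
for letter, style_name in {"A": "two_decimals", "B": "percent"}.items():
    for cell in ws[letter][1:]:
        cell.style = style_name

path = os.path.join(tempfile.mkdtemp(), "styled.xlsx")
wb.save(path)
```

The style is still assigned cell by cell here, but each named style is defined once in the workbook and referenced by name.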
