Writing pandas DataFrame in Excel with different formats for different columns

Question

I am trying to write a pandas DataFrame to an .xlsx file in which different numeric columns have different formats. For example, some should show only two decimal places, some should show none, some should be formatted as percentages with a "%" symbol, and so on.

I noticed that DataFrame.to_html() has a formatters parameter that allows you to do just that by mapping different formats to different columns. However, there is no similar parameter on the DataFrame.to_excel() method. The most we have is float_format, which is global to all numbers.
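For reference, the per-column mapping that to_html() supports can be sketched as follows. The column names and format strings are made up for illustration; only the formatters parameter itself comes from the question.

```python
import pandas as pd

# Illustrative data; the column names are assumptions, not from the question.
df = pd.DataFrame({
    "amount": [1010.0, 2020.5],
    "share": [0.1, 0.25],
})

# formatters maps a column name to a one-argument function returning a string.
html = df.to_html(
    formatters={
        "amount": "{:,.2f}".format,  # thousands separator, two decimals
        "share": "{:.0%}".format,    # percentage, no decimals
    }
)
```

The underlying values in df stay untouched; only the rendered HTML changes, which is the behaviour the question wants for Excel output.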

I read a lot of SO posts that are at least partially related to my question, for example:

Use the older openpyxl engine to apply formats one cell at a time. This is the approach with which I have had the most success. But it means writing loops to apply formats cell by cell, remembering offsets, etc.
Render percentages by changing the table data itself into strings. Going the route of altering the actual data inspired me to try handling decimal places by calling round() on each column before writing to Excel. This works too, but I would like to avoid altering the data.
Assorted others, mostly about date formats.
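The cell-by-cell approach from the first item might look roughly like the sketch below. It assumes the openpyxl engine, an in-memory buffer instead of a real file, and a hypothetical number_formats mapping; the row and column offsets are the bookkeeping the question complains about.

```python
import io

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({"amount": [1010.0, 2020.5], "share": [0.1, 0.25]})

# Hypothetical mapping from column name to Excel number format.
number_formats = {"amount": "#,##0.00", "share": "0%"}

buffer = io.BytesIO()  # stand-in for a path such as "test.xlsx"
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    worksheet = writer.sheets["Sheet1"]
    # openpyxl is 1-based; row 1 holds the header, so data starts at row 2.
    for col_idx, name in enumerate(df.columns, start=1):
        fmt = number_formats.get(name)
        if fmt is None:
            continue
        for row_idx in range(2, len(df) + 2):
            worksheet.cell(row=row_idx, column=col_idx).number_format = fmt

# Read the file back to check that the formats were stored.
buffer.seek(0)
reloaded = load_workbook(buffer)["Sheet1"]
stored_formats = [reloaded.cell(row=2, column=c).number_format for c in (1, 2)]
```

The stored values remain plain numbers; only the display format of each cell is set.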

Are there more convenient Excel-related functions or properties in the pandas API that can help here, or something similar in openpyxl? Or is there perhaps some way to attach output format metadata directly to each column in the DataFrame, which would then be interpreted downstream by the different output writers?

python pandas excel openpyxl

sparc_spread Apr 30 '15 at 18:00

2 answers

As you rightly point out, applying formats to individual cells is extremely inefficient.

openpyxl 2.4 includes native support for pandas DataFrames and named styles.

https://openpyxl.readthedocs.io/en/latest/changes.html#id7
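A minimal sketch of what that could look like, assuming openpyxl's dataframe_to_rows helper and NamedStyle class; the style names, column names and the styles mapping are invented for illustration and are not part of the answer.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"amount": [1010.0, 2020.5], "share": [0.1, 0.25]})

wb = Workbook()
ws = wb.active

# Copy the DataFrame into the worksheet, header row first.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Register one named style per desired number format (names are made up).
wb.add_named_style(NamedStyle(name="money_style", number_format="#,##0.00"))
wb.add_named_style(NamedStyle(name="percent_style", number_format="0%"))

# Hypothetical mapping from column header to named style.
styles = {"amount": "money_style", "share": "percent_style"}

for col_cells in ws.iter_cols(min_row=1):
    style_name = styles.get(col_cells[0].value)
    if style_name is None:
        continue
    for cell in col_cells[1:]:  # skip the header cell
        cell.style = style_name

# wb.save("test.xlsx") would write the workbook to disk.
```

Each cell still gets a style assigned in a loop, but the format is defined once per named style rather than once per cell.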

Charlie Clark May 1 '15 at 17:01

jmcnamara · Accepted Answer · 2015-05-01T09:42:01+0000

You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:

    import pandas as pd

    # Create a Pandas dataframe from some data.
    # list() keeps this working on Python 3, where zip() returns an iterator.
    df = pd.DataFrame(list(zip(
        [1010, 2020, 3030, 2020, 1515, 3030, 4545],
        [.1, .2, .33, .25, .5, .75, .45],
        [.1, .2, .33, .25, .5, .75, .45],
    )))

    # Create a Pandas Excel writer using XlsxWriter as the engine.
    writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')

    # Get the xlsxwriter objects from the dataframe writer object.
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Add some cell formats.
    format1 = workbook.add_format({'num_format': '#,##0.00'})
    format2 = workbook.add_format({'num_format': '0%'})
    format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})

    # Set the column width and format.
    worksheet.set_column('B:B', 18, format1)

    # Set the format but not the column width.
    worksheet.set_column('C:C', None, format2)

    worksheet.set_column('D:D', 16, format3)

    # Close the Pandas Excel writer and output the Excel file.
    # writer.save() did the same on older pandas but was removed in pandas 2.0.
    writer.close()

Output: [screenshot of the resulting worksheet]

See also Working with Python Pandas and XlsxWriter.
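The same idea can be wrapped in a small helper that takes a column-to-format mapping, similar in spirit to the formatters parameter of to_html(). This is only a sketch: write_with_column_formats and its parameters are hypothetical names, it writes without the index so that DataFrame column positions match worksheet columns, and it uses the numeric form of set_column().

```python
import io

import pandas as pd


def write_with_column_formats(df, target, column_formats, sheet_name="Sheet1"):
    """Write df to target, applying one Excel number format per named column.

    Hypothetical helper; column_formats maps column name -> num_format string.
    """
    with pd.ExcelWriter(target, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        for name, num_format in column_formats.items():
            col_idx = df.columns.get_loc(name)  # 0-based, matches the sheet
            cell_format = workbook.add_format({"num_format": num_format})
            # Width None leaves the column width unchanged.
            worksheet.set_column(col_idx, col_idx, None, cell_format)


df = pd.DataFrame({"amount": [1010.0, 2020.5], "share": [0.1, 0.25]})
buffer = io.BytesIO()  # a file path such as "test.xlsx" works as well
write_with_column_formats(df, buffer, {"amount": "#,##0.00", "share": "0%"})
```

The context manager closes the writer on exit, so no explicit save or close call is needed.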
