How do I post-format HTML converted from DOC / DOCX?

I am currently using OpenOffice (run from the command line) and JODConverter to convert Word documents (both .doc and .docx) to HTML for a web application I host. It works fine except for one problem: the margins in the HTML files are not formatted properly. Worse, the margins are inconsistent across operating systems (MacOS and Windows) and browsers.

Is there another tool that does post-formatting the way Google Docs does? (I assume this involves rewriting the CSS of the converted HTML document.)

I'm not trying to build another Google Docs. I just want to imitate their post-formatting step (more precisely, the formatting of the margin width) so that I can upload and store HTML documents on my own service. I need this to be an automatic process that does not depend on third-party sites. I know that Google has a tool called googlecl, but it requires authentication, it makes you dependent on their servers and services, and it is subject to a quota.

If anyone knows of a method other than the OpenOffice route, I am open to suggestions.

1 answer

The best option seems to be adding a feature to JODConverter that lets you inject your own CSS during export. Something like the following, applied to every page:

body { margin: 50px !important; } 

Either convince the JODConverter maintainer to add it, or grab the source code and hack it in yourself. Good luck.
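
If changing JODConverter is not practical, the same CSS rule could instead be applied as a separate post-processing step on the HTML it produces. The following is only a minimal sketch of that idea, not part of JODConverter: the names `inject_css` and `MARGIN_CSS` are made up for illustration, and it assumes the converted file has been read into a string. It matches the closing head tag case-insensitively because OpenOffice tends to emit upper-case tags.

```python
import re

# The rule from the answer above; adjust the value to taste.
MARGIN_CSS = "body { margin: 50px !important; }"


def inject_css(html: str, css: str = MARGIN_CSS) -> str:
    """Return html with a <style> block containing css added to it."""
    style = f'<style type="text/css">{css}</style>'
    # Insert just before the closing head tag so the rule comes after
    # any styles the converter already wrote into the head.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match:
        return html[:match.start()] + style + html[match.start():]
    # No head element found: fall back to prepending the style block.
    return style + html
```

Run over each converted file after the export finishes, this would give every document the same margins without any third-party service.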
