The people who wrote Winword and its HTML export are smart. If it were easy to express Word's features in purist HTML, they would have done it.
Word is about creating paper-optimized layouts. It supports concepts such as tabs and multi-level numbering, which HTML either does not support or is only starting to support. As a result, the HTML version of a Word document is not "good" HTML, but an attempt to preserve the features of the Word document accurately.
When Word reopens an HTML file that it saved, it does some tricky reverse engineering so that the document looks very close to how it started. Similarly, if you embed the HTML as a fragment in a web page and keep Word's CSS, the results are reasonably faithful. In that case there is a culture clash between the web page's own base CSS and Word's CSS, and some effort is needed to get the best results. Word's HTML is also not saved as UTF-8, so it needs some conversion.
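The encoding conversion can be sketched in Python. This is only a minimal sketch: it assumes the file was saved as windows-1252 (check the charset declared in the file's own `<meta>` tag, since the real value depends on the Word settings and locale), and the function name `word_html_to_utf8` is made up for illustration.

```python
import re


def word_html_to_utf8(raw: bytes, source_encoding: str = "windows-1252") -> bytes:
    """Decode Word-exported HTML and re-encode it as UTF-8.

    source_encoding is an assumption; use the value declared in the
    file's <meta ... charset=...> tag if it differs.
    """
    text = raw.decode(source_encoding)
    # Rewrite the first charset declaration so it matches the new encoding.
    text = re.sub(
        r"(charset\s*=\s*[\"']?)[\w-]+",
        r"\g<1>utf-8",
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    return text.encode("utf-8")
```

Typographic quotes and accented letters are the characters most likely to break if this step is skipped, because their single-byte windows-1252 codes are not valid UTF-8 on their own.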
HTMLTidy can be used to clean up Word's markup, but after that more massaging is required for it to render well in a web page. I have been working for 15 years on a product that mixes Word content and web pages in this way, and the results can be quite good if you fine-tune the CSS.
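As an illustration of the kind of extra massaging meant here, the following Python sketch strips a few Word-only constructs. It is not HTMLTidy and not the method used in the product mentioned above; the function name `strip_word_residue` and the choice of patterns are assumptions, and regular expressions are only a rough stand-in for a real HTML parser.

```python
import re


def strip_word_residue(html: str) -> str:
    """Remove a few common Word-only constructs from exported HTML."""
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->.
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # Drop Office namespace tags like <o:p> and </o:p>, keeping any content.
    html = re.sub(r"</?o:[^>]*>", "", html)
    # Drop mso-* declarations (Word-specific CSS properties).
    html = re.sub(r"\s*mso-[\w-]+\s*:[^;\"']*;?", "", html)
    # Remove style attributes left empty by the previous step.
    html = re.sub(r"\s+style=(\"\s*\"|'\s*')", "", html)
    return html
```

The standard CSS declarations that Word emits next to its `mso-*` ones are left alone, so the page's own stylesheet still has to be reconciled with them by hand.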
We use Word because we produce paper versions and import text from reports written in Word, not because we could not find a proper HTML editor.
I would not recommend using Word to create clean, purist HTML. You wouldn't use a can opener to open a bottle of wine, would you?
Life would be much simpler if: a) Microsoft redesigned the many options of its very confusing "bullets and numbering" feature, and b) HTML provided built-in, properly recognized multi-level numbering support instead of the current workarounds. HTML's weakness in this area can be seen in the fake numbering options available in Google Docs.
So much has improved with HTML5 that maybe we can hope HTML 6 will help close the gap between word processors and HTML editors.
Herc