Styling HTML generated by Microsoft Word

Ugh. Word is notorious for its bloated, confusing, non-standard, non-semantic HTML. Unfortunately, I have a professor who requires us to write outlines to very demanding standards. I would prefer not to format them by hand, so I decided to make something that would also be useful for my classmates. I created an outline using a simple numbered list in NeoOffice on my Mac, exported it as HTML, and wrote quite a bit of CSS to style it. Then I had someone create an ordered list in Word for Windows, export it as HTML and send it to me so I could check compatibility. After scrolling miles down the page and trying to suppress a shudder, I saw the problem: Word did not use <ol> and <li>. It used mountains of nested <span>s with classes out the wazoo. I hate to see all my work go to waste, but this content is impossible to work with: I would have to style each document individually instead of using one universal style sheet.

Ideally, Word would generate HTML with standard tags so that I could style it like any other list, but that doesn't seem to be the case. How can I get it to generate lists that actually use <ol>/<ul> and <li> rather than <span>, or at least change something in my code so it works with this strange way of creating lists?

9 answers

After some research, it turns out that converting Word documents to HTML is not a practical approach. Word is simply too inconsistent in how it saves files and generates HTML, even for a single document, not to mention the differences between versions of Word. As Wyatt suggested, there may be ways to clean up the code, but none of them are perfect. Digging into the API might make the output easier to parse, but it could turn out to be just as messy in practice. Using Word as a tool for creating these lists seems simply unrealistic.

The people who wrote Word (Winword) and its HTML export are smart. If it were easy to express Word's features in HTML in a purist way, they would have done it.

Word is about creating paper-optimized layouts. It supports concepts such as tabs and multi-level numbering, which HTML either does not support or is only starting to. As a result, the HTML version of a Word document is not "good" HTML but an attempt to preserve the features of the Word document accurately.

When Word reopens an HTML file that it saved, it does some tricky reverse engineering so that the document looks in Word very much as it did originally. Similarly, if you embed the HTML as a fragment in a web page while keeping Word's CSS, the results are pretty accurate. There is, however, a culture clash between the web page's base CSS and Word's CSS, and some effort is needed to get the best results. Word's HTML also does not use UTF-8, which requires some processing.
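
For the encoding step, a minimal Python sketch might look like this. It assumes the page was saved with Word's Western default, windows-1252, and that the encoding is declared in a <meta> tag as charset=windows-1252; check your own file's declaration first. The function name word_html_to_utf8 is just an illustration.

```python
import re

# Assumption: Word saved the page as windows-1252 and declared it in a
# <meta> tag as "charset=windows-1252". Adjust if your file says otherwise.
_CHARSET_DECL = re.compile(r"charset=windows-1252", re.IGNORECASE)


def word_html_to_utf8(data: bytes, source_encoding: str = "windows-1252") -> bytes:
    """Decode Word-saved HTML and re-encode it as UTF-8."""
    text = data.decode(source_encoding)
    # Update the declared charset so browsers decode the new bytes correctly.
    text = _CHARSET_DECL.sub("charset=utf-8", text)
    return text.encode("utf-8")


# Hypothetical usage, rewriting a file in place:
# from pathlib import Path
# page = Path("outline.htm")
# page.write_bytes(word_html_to_utf8(page.read_bytes()))
```

Updating the <meta> declaration matters as much as re-encoding: otherwise the browser would still decode the UTF-8 bytes as windows-1252 and curly quotes would turn into mojibake.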

HTML Tidy can be used to strip down Word's markup, but afterwards more massaging is needed for it to render well in a web page. I have been working for 15 years on a product that mixes Word and web pages in this way, and the results can be quite good if you fine-tune the CSS.

We used Word because we produce paper versions and import text from reports written in Word, not because we could not find a dedicated HTML editor.

I would not recommend using Word to create neat, purist HTML. You wouldn't use a can opener to open a bottle of wine, would you?

Life would be much simpler if (a) Microsoft redesigned the many options of its very confusing "Bullets and Numbering" feature, and (b) HTML provided built-in, properly recognized support for multi-level numbering instead of the current workarounds. HTML's weakness in this area can be seen in the fake numbering options available in Google Docs.

So much has improved with HTML5; maybe we can hope that HTML6 will help close the gap between text editors and HTML editors.

Use http://word2cleanhtml.com/ to convert Word documents to clean HTML. Very useful, in my opinion.

If you have access to a Windows PC, use Notepad++ (http://notepad-plus-plus.org/): paste the code in, then use a formatting plugin to tidy it up.

Use a WYSIWYG editor as the list generator. This spares users from dealing with raw CSS, although it does take them out of the comfort zone of Microsoft Word.

Creative use of Word's Find and Replace may also work. For example, open the HTML file in Notepad, then copy and paste the text back into a Word document and open Find and Replace. Suppose the HTML looks like this, with "This is the first line of text" as the first item:

 <p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span...(cut for brevity)... -height:115%'>This is the first line of text<o:p></o:p></span></p>

Then, with wildcards enabled, find \<p*line-height:115%'\> and replace it with nothing. This may take a series of Find/Replace passes. The markup is extensive, but all else being equal, it is at least consistent.
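
The same cleanup can be scripted instead of run through Word's dialog. The sketch below is an illustration rather than part of the answer above: it uses Python regular expressions to drop the <![if !supportLists]>...<![endif]> block, where Word puts the hard-coded list number, and the empty <o:p> tags. It assumes markup shaped like the snippet above; the function name strip_word_list_markers is hypothetical.

```python
import re

# Word puts the hard-coded list marker ("1.", a bullet glyph, padding spaces)
# between these conditional tags. Browsers that don't understand Word's
# conditional syntax ignore the tags and render whatever sits between them.
_FAKE_MARKER = re.compile(r"<!\[if !supportLists\]>.*?<!\[endif\]>", re.DOTALL)
# Word's empty Office-namespace placeholders: <o:p></o:p>.
_OFFICE_P = re.compile(r"</?o:p>")


def strip_word_list_markers(html: str) -> str:
    """Remove Word's fake list numbers and <o:p> placeholders from HTML."""
    html = _FAKE_MARKER.sub("", html)
    return _OFFICE_P.sub("", html)
```

Removing the fake numbers is what lets a stylesheet supply its own numbering; the paragraph and span wrappers are left alone here.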

If you have Dreamweaver, there is a magic "Clean Up Word HTML" command that works wonders in this scenario.

MS Word is only as smart as its author: a list is saved as an ordered list in the HTML only if it was created as one in Word. That means the list has to be built with Word's own list features, not merely made to look like a list on the page. Many people create lists that "appear" ordered or unordered by using tabs and other formatting instead of Word's list functions. Saving to HTML tries to preserve how the document was built, not how it was displayed.
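
If the list was built with Word's list features, its paragraphs can be turned back into a real <ol>. Below is a rough Python sketch, assuming single-level lists whose paragraphs carry Word's MsoListParagraph, MsoListParagraphCxSpFirst, MsoListParagraphCxSpMiddle and MsoListParagraphCxSpLast classes, as in the snippet earlier in the thread. It ignores nesting (mso-list levels) and lists numbered on other paragraph styles; word_lists_to_ol is a hypothetical name.

```python
import re

# One Word list paragraph. Group 1 is the position suffix (None for a
# one-item list); group 2 is the paragraph body.
_LIST_P = re.compile(
    r'<p\s+class="?MsoListParagraph(CxSpFirst|CxSpMiddle|CxSpLast)?"?[^>]*>(.*?)</p>',
    re.DOTALL,
)
# Conditional block holding Word's hard-coded marker ("1.", bullet glyph).
_FAKE_MARKER = re.compile(r"<!\[if !supportLists\]>.*?<!\[endif\]>", re.DOTALL)
_ANY_TAG = re.compile(r"<[^>]+>")


def _to_li(match: re.Match) -> str:
    position, body = match.group(1), match.group(2)
    body = _FAKE_MARKER.sub("", body)        # drop the fake number
    text = _ANY_TAG.sub("", body).strip()    # keep only the item text
    item = f"<li>{text}</li>"
    if position in (None, "CxSpFirst"):      # first (or only) item opens the list
        item = "<ol>" + item
    if position in (None, "CxSpLast"):       # last (or only) item closes it
        item += "</ol>"
    return item


def word_lists_to_ol(html: str) -> str:
    """Rewrite Word's MsoListParagraph* paragraphs as a single-level <ol>."""
    return _LIST_P.sub(_to_li, html)
```

Items come out as plain text, so bold or italic runs inside a list item are lost; keeping them would need a real HTML parser instead of regular expressions.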

You can link an external stylesheet to an HTML document from Word's Developer tab → Document Template → Linked CSS. You can then use it to override almost any style that Word generates.

Credit: https://superuser.com/questions/65107/how-to-apply-external-css-stylesheet-to-document-in-microsoft-word/65144#65144

Note: I used this with Word 2013, but it is not a new feature.
