Automatic HTML Simplification Tool?

Whenever I see a problem that others must share, with a solution that would be interesting to implement, it usually turns out to be solved already. So I think it's best to stop and search before I dive into coding.

Here's the situation: you can copy and paste sections of an Office document into the HTML editor in Visual Studio. The problem is that it produces HTML that looks like this:

<tr style="mso-yfti-irow:0;mso-yfti-firstrow:yes"> <td style="border:solid windowtext 1.0pt;mso-border-alt:solid windowtext .5pt; padding:0cm 5.4pt 0cm 5.4pt" valign="top"> <p align="left" class="MsoNormal" style="text-align:left;tab-stops:center 216.0pt right 432.0pt"> <b style="mso-bidi-font-weight:normal"><span lang="EN-US">ID<o:p></o:p></span></b></p> </td> <td style="border:solid windowtext 1.0pt;border-left:none; mso-border-left-alt:solid windowtext .5pt;mso-border-alt:solid windowtext .5pt; padding:0cm 5.4pt 0cm 5.4pt" valign="top"> 

Great for a machine, but not very readable for a human. I'm sure this could be cleaned up by finding duplicate styles and creating CSS classes from them. A computer program could do this quite simply.

I could run this program, and then I would have beautiful, easy-to-maintain HTML that looks just like my Word document.
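To make the idea concrete, here is a minimal sketch of the "find duplicate styles and turn them into CSS classes" step, using only the Python standard library. The function names (`normalize`, `extract_classes`) and the generated class names (`s1`, `s2`, ...) are made up for illustration. It relies on regular expressions rather than a real HTML parser, so it assumes double-quoted attributes and no `<` or `>` inside attribute values.

```python
import re
from collections import Counter

# Assumption: attributes are double-quoted, as in the Word output above.
TAG = re.compile(r'<[a-zA-Z][^<>]*>')
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')
CLASS_ATTR = re.compile(r'\sclass="([^"]*)"')


def normalize(style):
    """Collapse whitespace and drop empty declarations so equal styles compare equal."""
    parts = [" ".join(part.split()) for part in style.split(";")]
    return ";".join(part for part in parts if part)


def extract_classes(html, prefix="s"):
    """Replace inline styles that occur more than once with generated CSS classes.

    Returns (css, html). Styles that occur only once are left inline.
    """
    counts = Counter(normalize(m.group(1)) for m in STYLE_ATTR.finditer(html))
    names = {}
    for style, count in counts.items():
        if count > 1:
            names[style] = f"{prefix}{len(names) + 1}"

    def rewrite_tag(match):
        tag = match.group(0)
        found = STYLE_ATTR.search(tag)
        if not found:
            return tag
        name = names.get(normalize(found.group(1)))
        if name is None:
            return tag
        tag = STYLE_ATTR.sub("", tag, count=1)
        if CLASS_ATTR.search(tag):
            # Keep an existing class such as MsoNormal and append the new one.
            return CLASS_ATTR.sub(
                lambda m: f' class="{m.group(1)} {name}"', tag, count=1
            )
        # No class attribute yet: add one just before the closing bracket.
        return re.sub(
            r'\s*(/?>)$', lambda m: f' class="{name}"{m.group(1)}', tag, count=1
        )

    body = TAG.sub(rewrite_tag, html)
    css = "\n".join(f".{name} {{ {style} }}" for style, name in names.items())
    return css, body
```

Calling `extract_classes(html)` on a pasted fragment returns the stylesheet text and the rewritten markup separately, so the rules can go into a `<style>` block or an external file. This only removes the repetition; it does not strip the `mso-` properties themselves.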

(Yes, I know that I can simply edit my Word document and then copy and paste it into HTML again, or just save it as an HTML file. But that is not the same as editing the HTML by hand after the fact.)

Anyway, does anyone know of a program that does this?


(Later edit) I found that the question I asked is a duplicate of this one.
+4
3 answers

HTML Tidy does it! It also integrates with common text editors (such as Notepad++ or UltraEdit) and can clean up Office markup. You will need to set the word-2000 boolean option to true.
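For anyone who wants to script it, here is a rough sketch of calling the tidy command-line tool from Python with that option turned on. It assumes a `tidy` executable is on the PATH; the helper names (`tidy_command`, `clean_office_html`) and the file names in the usage note are hypothetical. The extra `clean` and `bare` options are my own suggestion for Word output, not something required by the advice above.

```python
import subprocess


def tidy_command(src, dest):
    """Build the argument list for an HTML Tidy run that cleans Word markup."""
    return [
        "tidy",
        "-quiet",              # suppress non-essential output
        "--word-2000", "yes",  # the option mentioned above
        "--clean", "yes",      # assumption: also replace presentational markup with style rules
        "--bare", "yes",       # assumption: also strip Microsoft-specific HTML
        "-output", dest,       # write the result to dest instead of stdout
        src,
    ]


def clean_office_html(src, dest):
    """Run tidy; return True unless it reported errors.

    Tidy exits with 0 on success, 1 when there were warnings, 2 on errors.
    """
    result = subprocess.run(tidy_command(src, dest))
    return result.returncode < 2
```

Something like `clean_office_html("pasted.html", "cleaned.html")` would then write the cleaned file. Word output nearly always triggers warnings, which is why an exit status of 1 is still treated as success here.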

In addition, Jeff Atwood wrote about this issue and presented his own C# 2.0 solution in this article.

+6

I would try HTML Tidy: http://tidy.sourceforge.net/. Another option is to paste your Word document into TinyMCE and then save the resulting HTML.

+3

You might want to seriously consider Paste as Plain Text as your simplification tool. Weigh how long it takes to reapply the markup ... you might find this less painful than you think.

+2