Should I write Polyglot HTML5 documents?

I am considering converting my current HTML5 documents to polyglot HTML5. Even if they are only ever served as text/html, I believe the additional XML well-formedness checks will help keep my coding habits tidy and give me more confidence in the markup.

Is there anything in the HTML5 space that would make this an unwise choice?

Secondly, the specifications are a bit vague about how to validate a polyglot document. I assume the basics are:

  • The W3C Validator reports no errors when checking the document as HTML5.
  • An XML parser reports no errors when parsing the document.

But are there any other rules that I am missing?

Thirdly, since it is a polyglot, does anyone know of any caveats to serving it as application/xhtml+xml to browsers that support it and as text/html to those that do not?

Edit: After a little experimenting, I found that entities such as &nbsp; break in XHTML5 (which has no DTD). That XML parser is a bit of a double-edged sword; I think I have just answered my third question.

+7
html html5 xhtml polyglot-markup
6 answers

Work on defining how to create HTML5 polyglot documents is ongoing, but see http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html for an early draft. It is certainly possible, but it requires a good deal of coding discipline, and you will need to decide whether it is worth the effort. Although I create polyglot HTML4.01 / XHTML1.0 documents, I create them with an XML toolchain that guarantees XML well-formedness, plus specialized code to ensure compatibility with HTML non-void elements and valid XML characters. Hand coding would be very difficult.

One known current issue in HTML5 is the srcdoc attribute of the iframe element. Since the attribute value contains markup, some characters must be escaped. The HTML5 draft specification describes how to do this for the HTML serialization, but not (the last time I looked) for the XHTML serialization.
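
As a rough illustration of the srcdoc escaping problem, here is a minimal Python sketch. It assumes (this is not taken from the draft specification) that writing &, <, >, and both quote characters as character references is acceptable in a double-quoted attribute in either serialization. The helper names are hypothetical, and Python's html.parser is a lenient parser, not a full HTML5 one, so it only stands in for a browser.

```python
import html
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def iframe_with_srcdoc(inner_markup: str) -> str:
    """Hypothetical helper: build an iframe whose srcdoc is escaped for both parsers."""
    # html.escape rewrites & < > " ' as character references, which are
    # legal inside a double-quoted attribute value in HTML and in XML.
    value = html.escape(inner_markup, quote=True)
    return f'<iframe xmlns="http://www.w3.org/1999/xhtml" srcdoc="{value}"></iframe>'


class SrcdocReader(HTMLParser):
    """Records the srcdoc value as seen by Python's lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.srcdoc = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.srcdoc = dict(attrs).get("srcdoc")


def srcdoc_via_html(markup: str) -> str:
    reader = SrcdocReader()
    reader.feed(markup)
    reader.close()
    return reader.srcdoc


def srcdoc_via_xml(markup: str) -> str:
    # Raises ET.ParseError if the markup is not well-formed XML.
    return ET.fromstring(markup).get("srcdoc")


inner = '<p title="a & b">It\'s 1 < 2</p>'
polyglot = iframe_with_srcdoc(inner)
print(polyglot)
print(srcdoc_via_html(polyglot) == inner)
print(srcdoc_via_xml(polyglot) == inner)
```

If both comparisons hold, the two parsers recovered the same inner markup from the one attribute value, which is the property a polyglot srcdoc needs.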

+5

I'm late to the party, but after 5 years the question is still relevant. On the one hand, closing all my tags appeals to me very much. For people reading the markup, for ease of editing, for Great Justice. OTOH, looking at the gory details of the polyglot specification (http://www.sitepoint.com/have-you-considered-polyglot-markup/ has a convenient summary at the end), it is clear to me that I can't get everything right by hand.

https://developer.mozilla.org/en/docs/Writing_JavaScript_for_XHTML also sheds interesting light on why XHTML failed: choosing to serve with the XML MIME type has various side effects at run time. By now it should be routine for good JS code to handle these (for example, always lowercasing tag names before comparing), but I don't want all that. There are enough cross-browser problems to test for as it is, thanks.

So, I think there is a useful middle way:

  • For now, serve only as text/html. No need to worry about whether the document actually parses to exactly the same DOM, with the same run-time behavior, in both HTML and XML modes.

  • Just aim for it to parse as well-formed XML. That helps readers, helps editors, and lets me use XML tools on my own documents.

    Unfortunately, polyglot tools are rare to non-existent; it's hard even to serialize XML back out so that it is also valid HTML ...

    • No problem: always self-close void tags (<hr/>) and always close non-void tags with a separate end tag (<script ...></script>).

    • No problem: use lowercase tag and attribute names (except for some SVG, but foreign content uses XML rules anyway), always quote attribute values, always provide attribute values (selected="selected" is more verbose than standalone selected, but I can live with it).

    • Inline <script> and <style> are the most annoying. I cannot use & or < inside them without breaking XML parsing. I need:

       <script>/*<![CDATA[*/ foo < bar && bar < baz; /*]]>*/</script> 

    ... and that's about it! Not worrying about XML namespaces or about matching DOMs for tables drops about half of the rules :-)

  • Wait for some future when I can go straight to authoring XHTML, skipping polyglot-ness. The advantages would be that I could forget the restrictions on closing tags and could consume and produce documents directly with XML tools. Admittedly, neglecting XML namespaces and the other rules now will make the switch harder, but I expect that in that future I will mostly be creating new documents rather than converting existing ones.

    In fact, I'm not quite sure what is stopping me from living in that future right now. Is it just IE 8? I'm also a little concerned about the all-or-nothing error handling. I really hope a future HTML specification finds a way to bridge the gap between HTML and XML, e.g. so that browsers accept <hr></hr> and <script .../> in HTML while keeping HTML error handling.

    Also, tools. Having libraries in many languages that can serialize to polyglot markup would make it feasible to generate it programmatically. Tools to validate and convert HTML5 ↔ polyglot ↔ XHTML5 would help. Otherwise, it is pretty much doomed.
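
The "just parse as well-formed XML" check from the middle way above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the sample document and helper names are made up, and html.parser is a lenient stand-in, not a real HTML5 parser or validator.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative document: void tags self-closed, script body wrapped
# in a CDATA section hidden inside JS comments.
DOC = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head><meta charset="utf-8"/><title>Demo</title>
<script>/*<![CDATA[*/ var ok = 1 < 2 && 2 < 3; /*]]>*/</script>
</head>
<body><p>Above<br/>below</p><hr/></body>
</html>"""

# Same script without the CDATA wrapper, for comparison.
UNWRAPPED = "<script>var ok = 1 < 2 && 2 < 3;</script>"


def is_well_formed(doc: str) -> bool:
    """True if the strict XML parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def script_via_xml(doc: str) -> str:
    """Script text as the XML parser sees it (CDATA markers consumed)."""
    root = ET.fromstring(doc)
    return root.find(f".//{{{XHTML_NS}}}script").text


class ScriptCollector(HTMLParser):
    """Collects the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_via_html(doc: str) -> str:
    """Script text as the HTML parser sees it (CDATA markers kept as JS comments)."""
    collector = ScriptCollector()
    collector.feed(doc)
    collector.close()
    return "".join(collector.chunks)


print(is_well_formed(DOC), is_well_formed(UNWRAPPED))
print(script_via_xml(DOC))
print(script_via_html(DOC))
```

In both parses the JavaScript statement survives intact; only the commented-out CDATA markers differ, which is why the wrapper is harmless when the page is served as text/html.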

+4

Should you? Yes. But first, a clarification on a couple of points.

Sending the Content-Type: application/xhtml+xml header means the document goes through the XML parser; as far as I can tell, it still has all the advantages of HTML5.
As for &nbsp;, it is not defined in XML: the only named entities that XML predefines are lt, gt, apos, quot and amp, so you need numeric character references for anything else. For nbsp that is &#xa0; or &#160;. I personally prefer hex because Unicode code points are written that way (U+00A0).
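
The entity point is easy to reproduce outside a browser. The following is a minimal Python sketch using the standard library's XML parser as a stand-in for the browser's; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """True if the markup parses as XML with no DTD to define extra entities."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


named = '<p xmlns="http://www.w3.org/1999/xhtml">a&nbsp;b</p>'
hexadecimal = '<p xmlns="http://www.w3.org/1999/xhtml">a&#xa0;b</p>'
decimal = '<p xmlns="http://www.w3.org/1999/xhtml">a&#160;b</p>'
predefined = '<p xmlns="http://www.w3.org/1999/xhtml">&lt;&gt;&amp;&apos;&quot;</p>'

for sample in (named, hexadecimal, decimal, predefined):
    print(is_well_formed(sample), sample)

# Both numeric forms decode to the same character, U+00A0.
print(ET.fromstring(hexadecimal).text == ET.fromstring(decimal).text == "a\u00a0b")
```

Only the &nbsp; sample should be rejected; the numeric references and the five predefined entities parse without a DTD.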

Sending the header is useful for testing because you quickly find problems with your markup, such as unclosed tags, stray closing tags, text that could be interpreted as a tag, etc.: basically things that can break the appearance or even the functionality of your site.
Most important, in my opinion, is the case where you accept user-entered data and it fails to parse, which usually means you are not escaping that data and are leaving yourself vulnerable. Parsed as HTML, you might never notice the problem until someone starts injecting scripts to annoy your users or steal data.
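
Here is a small Python sketch of that tripwire effect. The render_comment function is a hypothetical stand-in for whatever template inserts user text into a page; the point is only that unescaped input tends to break a strict XML parse, while escaped input round-trips.

```python
import html
import xml.etree.ElementTree as ET


def render_comment(user_text: str, escape: bool = True) -> str:
    """Hypothetical template: wrap user text in a paragraph."""
    body = html.escape(user_text) if escape else user_text
    return f'<p xmlns="http://www.w3.org/1999/xhtml">{body}</p>'


def is_well_formed(markup: str) -> bool:
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


hostile = "I <3 this & <script>alert(1)"

# Forgetting to escape: the strict parser refuses the page outright.
print(is_well_formed(render_comment(hostile, escape=False)))

# Escaping: the page parses and the user's text comes back verbatim.
safe = render_comment(hostile)
print(is_well_formed(safe))
print(ET.fromstring(safe).text == hostile)
```

Note the limit of this check: a well-balanced injection such as a complete <script>...</script> element would still parse as XML, so a strict parser is an early warning for missing escaping, not a defense in itself.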

This page is pretty good at explaining what polyglot markup is: https://blog.whatwg.org/xhtml5-in-a-nutshell

+1

That sounds like a very difficult thing to do. One of the downfalls of XHTML was that it could not successfully reconcile the competing requirements of XML and legacy HTML.

I think that if you write HTML5 and it validates successfully, you will have as tidy and valid a document as you would get any other way.

0

Given that the W3C documentation on the differences between HTML and XHTML is not yet complete, you probably should not spend time attempting polyglot markup. Not yet ... give it a couple more years.

In any case, you should invest the extra time in XML conformance only in the extremely narrow circumstance that you actively plan to parse your HTML as XML for some specific purpose. There are no advantages to doing it purely for consumption by web browsers, only disadvantages.

0

This wiki has some information not contained in the W3C document: http://wiki.whatwg.org/wiki/HTML_vs._XHTML

0
