Why does a stray </p> end tag generate an empty paragraph?
Apparently, if you have a </p> end tag without a corresponding start tag inside the body element, most, if not all browsers will generate an empty paragraph in its place:
    <!DOCTYPE html>
    <title></title>
    <body>
    </p>
    </body>

Even if there is text around the end tag, none of it becomes part of this p element. The element is always empty, and the text nodes always stand on their own:
    <!DOCTYPE html>
    <title></title>
    <body>
    some text</p>more text
    </body>

If the body content above is then wrapped in <p> and </p> tags... I'll leave you to guess what happens:
    <!DOCTYPE html>
    <title></title>
    <body>
    <p>some text</p>more text</p>
    </body>

Interestingly, if the </p> tag is not preceded by a <body> or </body> tag, all browsers except IE9 and older will not generate an empty paragraph (IE ≤ 9, on the other hand, always creates one, while IE10 and later behave the same as all other browsers):
    <!DOCTYPE html>
    <title></title>
    </p>

    <!DOCTYPE html>
    <title></title>
    </p><body>

    <!DOCTYPE html>
    <title></title>
    </p></body>

I can't find any references stating that an end tag without a corresponding start tag should generate an empty element, but that is hardly surprising given that this is not even valid HTML in the first place. In fact, I have only found browsers doing this with the p element (and, to some extent, the br element as well!), but no explanation of why.
The behavior is fairly consistent across browsers using both traditional HTML parsers and HTML5 parsers, and it applies in both quirks mode and standards mode. So it is probably fair to infer that it exists for backward compatibility with early specifications or legacy behavior.
In fact, I found this comment on an answer to a somewhat related question, which basically confirms this:
The reason unclosed <p> tags are valid is that <p> was originally defined as a "new paragraph" marker, rather than p being a container element. Equivalent to <br> being a "new line" marker. You can see it defined that way in this document from 1992: http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html , and this one from 1993: http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt Since there were web pages pre-dating the change, and browser parsers have always been as backward compatible as possible with existing web content, it has always remained possible to use <p> that way.
But that doesn't quite explain why parsers treat an explicit </p> end tag (with the slash) as simply... a tag, and generate an empty element in the DOM. Is this part of some parser error-handling convention from back when the syntax wasn't as strictly defined as it has been more recently, or something else? If so, is it documented anywhere?
That this is required is documented in HTML5. See http://dev.w3.org/html5/spec/tree-construction.html#parsing-main-inbody and search down for An end tag whose tag name is "p", where it says:
If the stack of open elements does not have an element in button scope with the same tag name as that of the token, then this is a parse error; act as if a start tag token with the tag name "p" had been seen, then reprocess the current token.
Translated into plain English, that means: create a p element if the </p> cannot be matched with an existing <p>.
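The rule quoted above can be sketched as a toy tree builder. This is only an illustration, not how any browser implements it: the token format, the `Node` class, the function names, and the shortened list of button-scope boundaries are all assumptions made for this example, and tokenization is skipped entirely.

```python
from dataclasses import dataclass, field

# Elements that end the search for a p "in button scope". The list in the
# HTML specification is longer; this subset is an assumption made for brevity.
BUTTON_SCOPE_BOUNDARIES = {"html", "table", "td", "th", "caption", "button"}


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def has_p_in_button_scope(open_elements):
    """Search the stack from the innermost element outwards for a p."""
    for node in reversed(open_elements):
        if node.name == "p":
            return True
        if node.name in BUTTON_SCOPE_BOUNDARIES:
            return False
    return False


def close_p(open_elements):
    """Pop elements off the stack until a p has been popped."""
    while open_elements.pop().name != "p":
        pass


def build_body(tokens):
    """Build a toy DOM from hand-written ("start" | "end" | "text", value) tokens."""
    body = Node("body")
    open_elements = [body]
    for kind, value in tokens:
        current = open_elements[-1]
        if kind == "text":
            current.children.append(value)
        elif kind == "start":
            # A new <p> implicitly closes a p that is still open.
            if value == "p" and has_p_in_button_scope(open_elements):
                close_p(open_elements)
                current = open_elements[-1]
            node = Node(value)
            current.children.append(node)
            open_elements.append(node)
        elif kind == "end" and value == "p":
            if not has_p_in_button_scope(open_elements):
                # Parse error: act as if a <p> start tag had been seen...
                node = Node("p")
                current.children.append(node)
                open_elements.append(node)
            # ...then reprocess the end tag, which closes the p element.
            close_p(open_elements)
        elif kind == "end":
            # Simplification: other end tags close back to their match and
            # are ignored when nothing matching is open.
            if any(node.name == value for node in open_elements[1:]):
                while open_elements.pop().name != value:
                    pass
    return body


def serialize(node):
    """Render the toy DOM back to markup so the result is easy to read."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


if __name__ == "__main__":
    stray = [("text", "some text"), ("end", "p"), ("text", "more text")]
    print(serialize(build_body(stray)))
    # <body>some text<p></p>more text</body>

    wrapped = [("start", "p")] + stray + [("end", "p")]
    print(serialize(build_body(wrapped)))
    # <body><p>some text</p>more text<p></p></body>
```

In the second case the first </p> closes the real paragraph, so the trailing </p> finds no p in scope and produces the empty element, which matches what the question observes.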
Why this is so is harder to ascertain. Usually it is because some browser in the past did this as a bug, web pages came to rely on the behavior, and other browsers then had to implement it too.
The HTML4 DTD states that the end tag is optional for the paragraph element, but that the start tag is required.
The SGML declaration for HTML4 sets OMITTAG to YES, which enables tag omission in general, so a start tag can be implied.
The end tag follows the SGML rules:
an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags
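As a rough illustration of the SGML rule quoted above, the sketch below applies an end tag to a stack of open element names. The function name, the stack representation, and the error handling are assumptions made for this example, and the set of elements with optional end tags is a subset of those in the HTML4 DTD.

```python
# Subset of the HTML4 elements whose end tag may be omitted.
OPTIONAL_END_TAG = {"p", "li", "dt", "dd", "td", "th", "tr", "option"}


def apply_end_tag(open_elements, name):
    """Close elements back to the innermost open `name`.

    `open_elements` is a list of element names, outermost first, and is
    modified in place. Returns the closed elements, innermost first.
    """
    if name not in open_elements:
        # The quoted rule presupposes a matching start tag; what happens
        # without one is outside this sketch.
        raise LookupError(f"no open <{name}> element")

    # Index of the innermost open element with the requested name.
    index = len(open_elements) - 1 - open_elements[::-1].index(name)

    # Only elements whose end tag may be omitted can be closed implicitly.
    for element in open_elements[index + 1:]:
        if element not in OPTIONAL_END_TAG:
            raise ValueError(f"<{element}> requires an explicit end tag")

    closed = open_elements[index:][::-1]
    del open_elements[index:]
    return closed


if __name__ == "__main__":
    # State after reading: <body><ul><li><p>text
    stack = ["body", "ul", "li", "p"]
    print(apply_end_tag(stack, "ul"))  # ['p', 'li', 'ul']
    print(stack)                       # ['body']
```

Note that this rule only describes an end tag that has a matching start tag, so on its own it does not cover the stray </p> case.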
Anonymous block boxes are generated for inline content, such as text nodes, so they do not need to be wrapped in a paragraph element.
There is a thread in Mozilla's bug database that explains this behavior:
Here is the relevant comment by Boris Zbarsky:
Actually, the way I understand it, proper SGML/HTML parsing requires that we do this. That is, the '<' of the next tag is a valid way to close the markup of the previous tag ...
And Ian Hickson sums up:
The basic principle at work here seems to be that the markup is fixed up by delaying any closing tags until all other open elements are closed, and that an attempt is made to make the DOM follow the HTML DTD.
References