I think you should check out the article How Browsers Work: Behind the Scenes of Modern Web Browsers. It is a long read, but well worth your time. Pay particular attention to the HTML Parser section.
While I can't do the article justice, perhaps a brief summary will tide you over until you have time to read and digest that masterpiece. I must admit that I am a novice in this area, with very little experience. Although I have developed for the web professionally for about 10 years, the way the browser parses and interprets my code has long been a black box.
HTML, XHTML, CSS or JavaScript: take your pick. Each has a grammar as well as a vocabulary. English is another great example. We have grammatical rules that we expect people, books, etc. to follow. We also have a vocabulary made up of nouns, verbs, adjectives, etc.
Browsers interpret a document by examining its grammar as well as its vocabulary. When a browser encounters items it ultimately does not understand, it lets you know (by raising exceptions, etc.). We do the same thing when we read or listen to someone speak.
I like StackOverflow, but if I could changed one thing, it would to be absolutamente ...
Notice in the example above how you immediately begin parsing the words and the relationships between them. The beginning makes sense: "I like StackOverflow." Then we come to "... if I could changed," and we stop immediately. "Changed" does not belong here. The author probably meant "change." The vocabulary is valid, but the grammar is wrong. A little later we hit "to be," which also violates a grammar rule, and a little further on we meet the word "absolutamente", which is not part of the English vocabulary: another error.
Think of all this in terms of a DOCTYPE. Right now I have the source of the XHTML 1.0 Strict DTD open on my second monitor. Among its contents are lines like the following:
<!ENTITY % heading "h1|h2|h3|h4|h5|h6">
This defines the heading elements. As long as I stick to XHTML, I can use any of them in my document ( <h1>Hello World</h1> ). But if I try to use, say, <h7> , the parser will stumble over it as "foreign" vocabulary and tell me:
Line 7, Column 8: element "h7" undefined
Suppose that while parsing a document we come across <table . We now know that we are dealing with a table element, which has its own vocabulary, such as tbody , tr , etc. As long as we know the language, its grammar rules, etc., we know when something is wrong. Returning to the XHTML 1.0 Strict DTD, we find the following:
<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>
<!ELEMENT caption %Inline;>
<!ELEMENT thead (tr)+>
<!ELEMENT tfoot (tr)+>
<!ELEMENT tbody (tr)+>
<!ELEMENT colgroup (col)*>
<!ELEMENT col EMPTY>
<!ELEMENT tr (th|td)+>
<!ELEMENT th %Flow;>
<!ELEMENT td %Flow;>
Given this reference, we can keep a running check against whatever source we are parsing. If the author writes tread instead of thead , we have a standard by which we can determine that it is an error. When problems cannot be resolved, and we cannot find rules that match certain uses of grammar and vocabulary, we inform the author that their document is invalid.
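The grammar side can be sketched the same way. In the rough Python example below, I have hand-translated the content models from the DTD excerpt into regular expressions over a sequence of child element names. The names `CONTENT_MODELS` and `children_match_model` are my own, the %Inline; and %Flow; models are left out, and a real parser does far more than this, so treat it as an illustration of the idea only.

```python
import re

# Content models from the DTD excerpt, hand-translated into regular expressions.
# Each child element name is followed by a single space, so "thead tbody "
# stands for a <thead> followed by a <tbody>.
CONTENT_MODELS = {
    "table": r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)",
    "thead": r"(tr )+",
    "tfoot": r"(tr )+",
    "tbody": r"(tr )+",
    "colgroup": r"(col )*",
    "tr": r"((th|td) )+",
}


def children_match_model(parent, children):
    """Return True if the list of child element names satisfies the parent's model."""
    model = CONTENT_MODELS.get(parent)
    if model is None:
        return False  # the parent itself is outside this small vocabulary
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model, sequence) is not None
```

With this sketch, `children_match_model("table", ["thead", "tbody"])` is accepted, while `["tread", "tbody"]` is rejected because tread is not in the vocabulary, and `["tbody", "thead"]` is rejected because the order breaks the grammar.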
I have by no means done this science justice, but I hope that, if nothing more, this will be enough to motivate you to sit down and read the article referenced at the beginning of this answer, and perhaps to study the various DTDs we encounter every day.