What are PCDATA and CDATA actually?

It seems that the loose definition of PCDATA and CDATA is that

  • PCDATA is character data that is parsed.
  • CDATA is character data that is not parsed.

But then someone told me that CDATA is actually parsed, or that PCDATA is actually not parsed ... so this is a bit confusing. Does anyone know what the real deal is?

Update: I actually added the definition of PCDATA to Wikipedia ... so don't take that answer too seriously, as it is just my rough understanding of it.

+28
html xml xhtml cdata pcdata
May 13 '09 at
6 answers

From Wikipedia:

PCDATA

Simply put, PCDATA stands for Parsed Character Data. This means that the characters will be parsed by an XML, XHTML, or HTML parser (&lt; will be changed to <, <p> will be treated as a paragraph tag, etc.). Compare this to CDATA, where the characters are not parsed by an XML, XHTML, or HTML parser.

CDATA

The term CDATA, i.e. character data, is used for distinct but related purposes in the SGML and XML markup languages. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

+23
May 13, '09 at 13:21

Both PCDATA and CDATA are parsed. They are both character data.

Both must contain only valid characters. For example, if your document encoding is UTF-8, the contents of a CDATA section must still be valid UTF-8. Therefore, arbitrary binary data is likely to make the document not well-formed. Also, CDATA sections are still parsed, if only to find the marker that ends the section. But other markup-like characters, such as <, > and &, are ignored and passed through as-is by the parser.
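To illustrate the point about binary data, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element name blob and the payload bytes are made up for the example, and Base64 is just one common workaround, not something this answer prescribes.

```python
import base64
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


payload = b"\x01\x02\xff"  # hypothetical binary payload

# Pasting the raw bytes (viewed as characters) into a CDATA section puts
# the control character U+0001 in the document, which XML 1.0 forbids.
raw_doc = "<blob><![CDATA[" + payload.decode("latin-1") + "]]></blob>"

# Base64 output uses only ASCII letters, digits, '+', '/' and '='.
encoded_doc = "<blob>" + base64.b64encode(payload).decode("ascii") + "</blob>"

print(is_well_formed(raw_doc))
print(is_well_formed(encoded_doc))
print(base64.b64decode(ET.fromstring(encoded_doc).text) == payload)
```

The CDATA wrapper does not help with the raw payload, because the character-validity rule applies inside CDATA sections too.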

OTOH, in PCDATA, literal < and & (and " or ' in attribute values) must be escaped, or they will be interpreted as markup. Entities will also be expanded.

So yes, CDATA sections are really parsed. I'm not sure why you were told that PCDATA is not parsed.
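The difference described in this answer can be checked with a short sketch in Python's standard xml.etree.ElementTree module. The element name note and the sample text are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# PCDATA: markup characters must be escaped, and entities are expanded.
pcdata_doc = "<note>1 &lt; 2 &amp;&amp; 3 &gt; 2</note>"
# CDATA section: the same characters are written literally.
cdata_doc = "<note><![CDATA[1 < 2 && 3 > 2]]></note>"

pcdata_text = ET.fromstring(pcdata_doc).text
cdata_text = ET.fromstring(cdata_doc).text
print(pcdata_text == cdata_text)

# Inside a CDATA section an entity reference is not expanded.
unexpanded = ET.fromstring("<note><![CDATA[&lt;]]></note>").text
print(unexpanded)

# A literal '<' or '&' in PCDATA is read as the start of markup.
print(is_well_formed("<note>1 < 2</note>"))
print(is_well_formed("<note>a & b</note>"))
```

Both documents hand the application the same character data; only the way it is written in the source differs.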

+9
May 14 '09 at 10:59

PCDATA - Parsed Character Data

CDATA - (Unparsed) Character Data

http://www.w3schools.com/XML/xml_cdata.asp

+6
May 13, '09 at 13:20
  • PCDATA is the text that will be parsed. Tags within the text will be treated as markup, and entities will be expanded.
  • CDATA is text that is not parsed by the parser. Tags within the text will not be treated as markup, and entities will not be expanded.

By default, everything is PCDATA. In the following example, ignoring the root, bar will be parsed, and it will have no text content of its own, only one child.

    <?xml version="1.0"?>
    <foo>
    <bar><test>content!</test></bar>
    </foo>

When we want to specify that an element will contain only text, and no children, we use the keyword PCDATA, because this keyword specifies that the element must contain parsable character data: that is, any text in which less-than (<) and ampersand (&) are escaped; greater-than (>), apostrophe (') and double quotation mark (") may be escaped as well.

In the following example, bar contains a CDATA section. Its content is not parsed, so it is the literal text <test>content!</test>.

    <?xml version="1.0"?>
    <foo>
    <bar><![CDATA[<test>content!</test>]]></bar>
    </foo>

There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part means that markup (including PIs, comments, and SGML directives) in it is parsed instead of being treated as raw text. It also means that entity references are replaced.

Another type of content model that allows plain text content is CDATA. In XML, an element's content model cannot be implicitly set to CDATA, but in SGML it means that markup and entity references are ignored in the element's contents. In attributes of type CDATA, however, entity references are replaced.

In XML, #PCDATA is the only text content model. You use it if you want to allow text content in the element at all. The CDATA content model can be used explicitly by placing a CDATA section inside #PCDATA content, but the contents of an element cannot be declared as CDATA by default.

In a DTD, the type of an attribute that contains text must be CDATA. The CDATA keyword in an attribute declaration has a different meaning from a CDATA section in an XML document. In a CDATA section, all characters are legal (including the <, >, & and " characters), except for the "]]>" end marker.
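As a rough illustration of the two meanings of CDATA, the sketch below declares a #PCDATA element with a CDATA attribute in an internal DTD subset and parses it with Python's standard xml.etree.ElementTree module. The attribute name title and the sample values are invented for the example. Note that ElementTree does not validate against the DTD; the declarations are there only to show the syntax.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE foo [
  <!ELEMENT foo (bar)>
  <!ELEMENT bar (#PCDATA)>
  <!ATTLIST bar title CDATA #IMPLIED>
]>
<foo><bar title="Tom &amp; Jerry"><![CDATA[if (a < b && b > c) {}]]></bar></foo>"""

bar = ET.fromstring(doc).find("bar")

# Attribute of type CDATA: the entity reference IS replaced.
title = bar.get("title")
print(title)

# CDATA section inside #PCDATA content: characters are taken literally.
body = bar.text
print(body)
```

The attribute value still needs &amp; for an ampersand, while the CDATA section in the element content does not.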

#PCDATA is not valid as an attribute type. It is used for the text content of leaf elements.

#PCDATA is prefixed with a hash sign (#) for historical reasons only.

+3
Jul 24 '12 at 5:26

Your first definition is correct.

PCDATA is parsed, which means that entities are expanded and that the text is treated as markup. CDATA is not parsed by the XML parser.

0
May 13 '09 at

If only script elements were declared as CDATA by default in the XHTML DTD, it would save a lot of ugly manual overrides ... Why would script blocks contain other elements? If such elements are present, they are handled by the JS interpreter during DOM manipulation, in which case they should be completely ignored by the XML parser before the document is inserted and rendered. I suppose this could have been designed to force the use of external script resource files, which is ultimately a good thing.

0
Mar 12 '13 at 9:38
