Are CDATA sections really unnecessary?

This question was prompted by developer Michael Rys's rather militant refusal to support CDATA sections in FOR XML PATH, because "There is no semantic difference in the data that you store."

I have stored HTML nuggets in CDATA nodes, along with other content that requires special or awkward characters. However, I don't feel qualified to challenge Rys's controversial statement because, as far as I can tell, he is technically right in the scenarios where I use CDATA purely for convenience.

What really bakes my noodle is that when developers go online asking how to produce CDATA sections using FOR XML PATH, respondents consistently redirect them to FOR XML EXPLICIT instead, the XML rendering method that Rys himself calls "the query from hell."

If we really can do without CDATA in every use case anyone has to offer, I think we should stop moaning and stop using CDATA in the future. But if there are clearly defined cases where CDATA is essential, Rys has already indicated, at the link at the very top of this question, that he will bake it into FOR XML PATH going forward.

So which is it? Are CDATA sections really a relic of the past? Or should Rys pull his finger out and enable CDATA support in FOR XML PATH? And while we are at it, are there any hacks in the meantime for getting FOR XML PATH to return CDATA sections?

+4
4 answers

CDATA sections are useful if you do not need the semantics of the data in them (i.e. you do not need to parse it - it's just a run of characters), and you do not want to escape any of the XML inside them.

Definition according to the W3C:

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.

From Wikipedia:

New authors of XML documents often misunderstand the purpose of a CDATA section, mistakenly believing that its purpose is to "protect" data from being treated as ordinary character data during processing. Some APIs for working with XML documents do offer options for independent access to CDATA sections, but such options exist above and beyond the normal requirements of XML processing systems, and still do not change the implicit meaning of the data. Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup.

CDATA sections are useful for writing XML code as text data within an XML document. For example, if one wishes to typeset a book with XSL explaining the use of an XML application, the XML markup to appear in the book itself will be written in the source file in a CDATA section. However, a CDATA section cannot contain the string "]]>", and therefore a CDATA section cannot contain nested CDATA sections. The preferred approach to using CDATA sections for encoding text that contains the triad "]]>" is to use multiple CDATA sections, splitting each occurrence of the triad just before the ">". For example, to encode "]]>" one would write:

<![CDATA[]]]]><![CDATA[>]]>
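The splitting rule quoted above can be sketched in a few lines of Python. This is only an illustration, not something from the quoted article: the helper name `to_cdata` and the sample payload are made up, and the standard library's `xml.etree.ElementTree` is used just to check that the text survives a round trip.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" just before the ">".

    Hypothetical helper for illustration: each "]]>" in the input ends one
    CDATA section after "]]" and opens a new one that starts with ">".
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Sample payload (made up) containing markup, an ampersand and a "]]>".
payload = "<b>bold</b> & a stray ]]> terminator"
doc = "<note>" + to_cdata(payload) + "</note>"

# Adjacent CDATA sections are reported as one run of character data.
parsed = ET.fromstring(doc).text
print(to_cdata("]]>"))
print(parsed == payload)
```

The parser hands back the original string, which is the point the quote makes: the split is invisible to anything consuming the character data.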

+2

CDATA sections are not needed. They are not a "relic of the past" because they have always been unnecessary.

This does not mean that they are not useful. Look at any programming language or library and you will find a large number of things you could do without, because they are semantically equivalent to something else, but which are useful when there is a human being sitting there who has to write the stuff.

For that matter, even when XML is produced programmatically it could be convenient to take the opposite approach and use CDATA sections for every single piece of character data (bloated, but it could be more efficient elsewhere).

FOR XML PATH does not involve a person sitting there writing the stuff. It is a tool for creating valid XML from SQL query results. (It is also not a matter of parsing CDATA sections but of creating them, which is a different thing.)

And you can't really complain that FOR XML EXPLICIT is the alternative when you want very fine-grained control: the reason FOR XML EXPLICIT is sometimes so unpleasant to use is precisely that it gives you so much control. Indeed, consider what would happen if they first added support for CDATA sections, and then added support for every other tweak and setting that seemed just as vital to someone else. How long would it be before FOR XML EXPLICIT became the automatic choice because it was simpler than FOR XML PATH‽

There are four cases where CDATA is useful:

  • You are sitting at the keyboard, typing the stuff yourself.
  • You are mixing different technologies with different standards, developed at different times, which different parsers will interpret differently (for example, JavaScript embedded in XHTML; CDATA is not 100% necessary there, but doing it any other way is a nightmare).
  • You are trying to parse XML with something that does not understand XML.
  • You are trying to use something built on a parser that exposes the low-level distinction between CDATA sections and other character data, and that misuses this low-level access.

Oddly enough, these are also the four cases where refusing to accept CDATA sections can make sense.

Case 1 does not apply here; this is not human-written XML. Case 2 could apply here if you are doing something really crazy. Honestly, the lack of CDATA sections is the least of your worries in that situation; switch to producing simpler XML in the query and transforming it elsewhere. Case 3 may apply here, but it is unfair to complain to the SQL people when you should be complaining about a broken XML parser that doesn't treat &lt;example&gt; the same as <![CDATA[<example>]]>. Case 4 may apply here, but again, complain to the person who wrote the buggy code, not to the SQL people.
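As a rough illustration of cases 3 and 4, here is a small Python sketch using only the standard library. The element name `item` and the variable names are made up for the example. It shows one parser reporting identical text for the escaped and CDATA spellings, and a DOM-style API that still exposes which spelling was used.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

# The same character data written two ways (element name is illustrative).
escaped = "<item>&lt;example&gt;</item>"
cdata = "<item><![CDATA[<example>]]></item>"

# Case 3: a conforming parser reports the same text for both spellings.
text_escaped = ET.fromstring(escaped).text
text_cdata = ET.fromstring(cdata).text
same_text = text_escaped == text_cdata

# Case 4: a DOM API can still reveal how the text was written. Code that
# branches on this node type is relying on a lexical detail, not on the data.
node_escaped = minidom.parseString(escaped).documentElement.firstChild
node_cdata = minidom.parseString(cdata).documentElement.firstChild
is_plain_text = node_escaped.nodeType == Node.TEXT_NODE
is_cdata_node = node_cdata.nodeType == Node.CDATA_SECTION_NODE
same_data = node_escaped.data == node_cdata.data

print(same_text, is_plain_text, is_cdata_node, same_data)
```

Consumers that only read the text (the `ElementTree` half) cannot tell the two documents apart; only code that inspects node types (the `minidom` half) can, which is the kind of low-level access case 4 warns about.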

+3

You are absolutely right: CDATA is necessary in many scenarios. CDATA sections are part of the XML standard and should be supported by every XML processing tool or method. But the fact is, MS doesn't usually care... you know, "640 KB should be enough for everyone."

Edit: FOR XML EXPLICIT is the best way to create finely formatted XML data. Yes, the syntax is very painful to look at and confusing, but once you have used it a few times you will admire its beauty and power.

0

It is interesting to see how someone can simply dismiss a very valuable part of the standard with such a bizarre attitude. Not everyone uses XML for a few hundred characters of HTML or a list of items for a drop-down.

Some of us actually use XML for data exchange, with very complex data such as CCD, CDA, and CDR. These are all standard document formats in the healthcare arena, and they are becoming more visible with ObamaCare. Part of this document structure contains attachments such as DICOM images, PDFs, and other binary data that should not be read by the parser, thanks to the CDATA designation.

Why should I have to pay the overhead of a parser reading a 3-megabyte DICOM image embedded in a CCD? Why should I be forced to split up a document when the attachment is included in the source data and CDATA is part of the XML standard? I want to be able to find and retrieve a document and its content with XML.

It baffles me that you all support parsing data that should not be parsed by the engine. If the engine sees CDATA, it ignores it; it is that simple. And the ongoing argument that some people do not need this is irrelevant. It is part of the standard, and the standard must be supported. If they want to add this "feature," as it was called, then they should keep the default behavior and offer it as an option.

Please stop parsing CDATA and ignore it.

0
