Preserving entity references when transforming XML with XSLT?

How can I preserve entity references when transforming XML with XSLT (2.0)? With all of the processors I have tried, the entity gets resolved by default. I can use xsl:character-map to handle character entities, but what about text entities?

For example, this XML:

    <!DOCTYPE doc [
    <!ENTITY so "stackoverflow">
    <!ENTITY question "How can I preserve the entity reference when transforming with XSLT??">
    ]>
    <doc>
      <text>Hello &so;!</text>
      <text>&question;</text>
    </doc>

transformed with the following XSLT:

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

produces the following output:

    <doc>
      <text>Hello stackoverflow!</text>
      <text>How can I preserve the entity reference when transforming with XSLT??</text>
    </doc>

The output should look like the input (minus the doctype declaration):

    <doc>
      <text>Hello &so;!</text>
      <text>&question;</text>
    </doc>

I'm hoping that I don't have to pre-process the input by replacing all ampersands with &amp; (e.g. &amp;question;) and then post-process the output by replacing every &amp; with &.

Maybe this is processor specific? I am using Saxon 9.
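For reference, the pre- and post-processing round trip mentioned above could be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names (EntityRefGuard, protect, restore) are made up for illustration, and it assumes the DOCTYPE declarations themselves contain no ampersands, since a blanket replacement would rewrite those too.

```java
// Hypothetical helper, not part of any XSLT processor: it hides entity
// references from the XML parser before the transformation and brings
// them back in the serialized output afterwards.
final class EntityRefGuard {

    // Before the transformation: turn every "&" into "&amp;", so that
    // "&so;" becomes "&amp;so;" and the parser sees plain text.
    static String protect(String xml) {
        return xml.replace("&", "&amp;");
    }

    // After the transformation: turn every "&amp;" back into "&".
    // The scan is left to right and non-overlapping, so an "&amp;" that
    // was already in the input (now "&amp;amp;") comes back as "&amp;".
    static String restore(String xml) {
        return xml.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String input = "<doc><text>Hello &so;!</text><text>&question;</text></doc>";
        String guarded = protect(input);
        System.out.println(guarded);          // what the XSLT processor would read
        System.out.println(restore(guarded)); // what is written out at the end
    }
}
```

The transformation itself would run between the two calls, on the protected string. Any template that inspects text content will then see the literal characters &so; instead of the entity's value.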

Thanks!

+7
5 answers

If you know which entities will be used and how they are defined, you can do the following (quite primitive and error prone, but still better than nothing):

    <xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:my="my:my">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>

      <xsl:character-map name="mapEntities">
        <xsl:output-character character="&amp;" string="&amp;"/>
      </xsl:character-map>

      <xsl:variable name="vEntities" select=
       "'stackoverflow',
        'How can I preserve the entity reference when transforming with XSLT\?\?'"/>

      <xsl:variable name="vReplacements" select=
       "'&amp;so;', '&amp;question;'"/>

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="/">
        <xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE doc [ <!ENTITY so "stackoverflow"> <!ENTITY question "How can I preserve the entity reference when transforming with XSLT??"> ]> ]]> </xsl:text>
        <xsl:apply-templates/>
      </xsl:template>

      <xsl:template match="text()">
        <xsl:value-of select=
         "my:multiReplace(., $vEntities, $vReplacements, count($vEntities))"
         disable-output-escaping="yes"/>
      </xsl:template>

      <xsl:function name="my:multiReplace">
        <xsl:param name="pText" as="xs:string"/>
        <xsl:param name="pEnts" as="xs:string*"/>
        <xsl:param name="pReps" as="xs:string*"/>
        <xsl:param name="pCount" as="xs:integer"/>

        <xsl:sequence select=
         "if($pCount > 0)
            then my:multiReplace(replace($pText, $pEnts[1], $pReps[1]),
                                 subsequence($pEnts,2),
                                 subsequence($pReps,2),
                                 $pCount -1)
            else $pText"/>
      </xsl:function>
    </xsl:stylesheet>

when applied to the provided XML document:

    <!DOCTYPE doc [
    <!ENTITY so "stackoverflow">
    <!ENTITY question "How can I preserve the entity reference when transforming with XSLT??">
    ]>
    <doc>
      <text>Hello &so;!</text>
      <text>&question;</text>
    </doc>

the wanted result is produced:

    <!DOCTYPE doc [
    <!ENTITY so "stackoverflow">
    <!ENTITY question "How can I preserve the entity reference when transforming with XSLT??">
    ]>
    <doc>
      <text>Hello &so;!</text>
      <text>&question;</text>
    </doc>

Do note:

  • Special (regex) characters in the strings being searched for must be escaped, as with \?\? above. In the replacement strings, \ and $ must be escaped as well.

  • We had to resort to DOE (disable-output-escaping), which is not recommended because it violates the principles of the XSLT architecture and processing model. In other words, this solution is a nasty hack.
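To illustrate the escaping point outside XSLT, here is a rough Java counterpart of the my:multiReplace idea. It is a sketch only, with hypothetical names (EntityRestorer, multiReplace), and it is not the stylesheet's implementation. Pattern.quote and Matcher.quoteReplacement are the standard java.util.regex helpers that do the escaping, so the ?? in the entity value needs no manual backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the stylesheet's multiReplace function:
// each known entity value found in the text is replaced by its reference.
final class EntityRestorer {

    static String multiReplace(String text, String[] values, String[] refs) {
        String result = text;
        for (int i = 0; i < values.length; i++) {
            // Pattern.quote makes the search string a literal pattern;
            // Matcher.quoteReplacement protects '\' and '$' in the replacement.
            result = result.replaceAll(Pattern.quote(values[i]),
                                       Matcher.quoteReplacement(refs[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = {
            "stackoverflow",
            "How can I preserve the entity reference when transforming with XSLT??"
        };
        String[] refs = { "&so;", "&question;" };
        System.out.println(multiReplace("Hello stackoverflow!", values, refs));
    }
}
```

Like the XSLT version, this replaces every occurrence of an entity's value, including text that was never written as an entity reference in the source, which is one reason the approach is error prone.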

+4

This can be especially problematic if you are using something like S1000D. It uses entities and the @boardno attribute to link to figures, which is a throwback to its SGML roots.

Because this automatic expansion is correct but not desirable behavior, I often have to fall back on tools like sed, awk, and batch scripts for specific data-analysis tasks when S1000D is the input.

IMHO, it would be a great proposal for one of the upcoming XSLT specifications that a conforming processor accept a run-time parameter that can turn entity expansion on or off.

+3

If you use a Java implementation of an XSLT 2.0 processor (for example, Saxon 9 Java), you may want to check whether http://andrewjwelch.com/lexev/ helps. With it you can pre-process your XML so that entity and character references are marked up as XML elements, which can then be transformed as needed.

+1

I use this solution and it works well:

    <xsl:variable name="prolog" select="substring-before(unparsed-text(document-uri(.)),'&lt;root')"/>

    <xsl:template match="/">
      <xsl:value-of select="$prolog" disable-output-escaping="yes"/>
      <xsl:apply-templates/>
    </xsl:template>

+1

You can keep EntityReference nodes in the document by using a DOM LS parser with the "entities" parameter set to true. http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/DOMConfiguration.html

The specification says the default value is true, but depending on the parser it may be false, so keep that in mind.

To load Xerces:

 DOMImplementationLS domImpl = new org.apache.xerces.dom.CoreDOMImplementationImpl(); 

You can use the registry as shown below, but personally I would rather hard-code the implementation, as shown above:

    DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS domImpl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");

Then, to load the document:

    // XML parser with XSD schema
    LSParser parser = domImpl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");
    DOMConfiguration config = parser.getDomConfig();
    // Keep EntityReference nodes instead of expanding them
    config.setParameter("entities", true);
    LSInput input = domImpl.createLSInput();
    input.setByteStream(xmlStream); // xmlStream: placeholder for your XML InputStream
    Document lDoc = parser.parse(input);

Your XML entities are then not expanded in the DOM.
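A self-contained way to check this could look like the sketch below. It assumes the DOM LS implementation bundled with the JDK (looked up through the registry with the "LS" feature) rather than a separately installed Xerces, and the class and method names are made up for illustration. It parses a shortened version of the question's document and lists the child nodes of the first text element, so you can see whether your parser produced an EntityReference node (node type 5).

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

// Hypothetical demo class; only the org.w3c.dom APIs are real.
final class EntityRefDemo {

    // Shortened version of the document from the question.
    static final String XML =
        "<!DOCTYPE doc [ <!ENTITY so \"stackoverflow\"> ]>"
        + "<doc><text>Hello &so;!</text></doc>";

    // Parses the XML with the "entities" parameter set to true.
    static Document parseKeepingEntityRefs(String xml) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS ls =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            parser.getDomConfig().setParameter("entities", Boolean.TRUE);
            LSInput input = ls.createLSInput();
            input.setStringData(xml);
            return parser.parse(input);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseKeepingEntityRefs(XML);
        Node text = doc.getDocumentElement().getFirstChild();
        // Print node type and name for each child of <text>.
        for (Node n = text.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeType() + " " + n.getNodeName());
        }
    }
}
```

If no child with node type Node.ENTITY_REFERENCE_NODE shows up, the parser in use is expanding entities despite the setting.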

Then, since Saxon does not handle entities that have not been expanded ("Unsupported node type in DOM! 5"), you cannot use net.sf.saxon.xpath.XPathFactoryImpl; you have to use the default XPathFactory, obtained with XPathFactory.newInstance().

0