XPath regex: lower-case a string outside quotes

In a node, a string can contain one or more substrings delimited by single or double quotes. For instance:

<node>Some text "and Some" More</node> 

I need to lower-case the text that is not surrounded by quotation marks, so the result should look like this:

 some text "and Some" more 

I tried two things:

  • with replace: replace('Some text "and Some" More', '"([^"]*)"', '*') replaces the text in double quotes with *. But how can I apply lower-case instead? This does not give the desired result: replace('Some text "and Some" More', '"([^"]*)"', lower-case('$1'))
  • with tokenize: for $t in tokenize('Some text "and Some" More', '"') return $t. Since my node will not start with ", I know that the odd entries will be the substrings not surrounded by quotation marks. But I do not know how to select and lower-case only the odd entries. I tried using position(), but it returns 1 at each iteration.

Thanks for looking at this. Very much appreciated.

3 answers

Here is one XPath 2.0 expression that processes any mix of quoted and unquoted substrings, in any order, in the desired way:

  string-join(
    (for $str in tokenize(
                   replace(., "(.*?)("".*?"")([^""]*)", "|$1|$2|$3|", "x"),
                   "\|")
       return
         if (not(contains($str, """"))) then lower-case($str)
         else $str),
    "")

For a comprehensive test, I evaluate the above expression in the following XML document:

 <node>Some "Text""and Some" More "Text" XXX "Even More"</node> 

The wanted, correct result is produced:

 some "Text""and Some" more "Text" xxx "Even More" 

XSLT 2.0 verification:

 <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output omit-xml-declaration="yes" indent="yes"/>

   <xsl:template match="/">
     <xsl:sequence select=
       'string-join(
          (for $str in tokenize(replace(., "(.*?)("".*?"")([^""]*)", "|$1|$2|$3|", "x"), "\|")
             return if (not(contains($str, """"))) then lower-case($str) else $str),
          "")'/>
   </xsl:template>
 </xsl:stylesheet>

When this transformation is applied to the above XML document, the XPath expression is evaluated and the result of the evaluation is copied to the output:

 some "Text""and Some" more "Text" xxx "Even More" 

Finally, a pure XSLT 2.0 solution is much easier to write and understand:

 <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output omit-xml-declaration="yes" indent="yes"/>

   <xsl:template match="/*">
     <xsl:analyze-string select="." regex='".*?"'>
       <!-- text outside quotes: lower-case it -->
       <xsl:non-matching-substring>
         <xsl:value-of select="lower-case(.)"/>
       </xsl:non-matching-substring>
       <!-- quoted text: copy unchanged; value-of emits text nodes, so no separator spaces are inserted -->
       <xsl:matching-substring><xsl:value-of select="."/></xsl:matching-substring>
     </xsl:analyze-string>
   </xsl:template>
 </xsl:stylesheet>
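For readers who want to sanity-check the expected output without an XSLT 2.0 processor, here is a minimal Python sketch of the same matching/non-matching idea. The function name lower_outside_quotes is hypothetical, and Python's re module is only standing in for the XPath regex engine, so treat it as an illustration rather than an equivalent implementation.

```python
import re


def lower_outside_quotes(text: str) -> str:
    """Lower-case everything except double-quoted substrings."""
    # The capturing group makes re.split keep the quoted pieces:
    # even indexes hold text outside quotes, odd indexes hold quoted text.
    parts = re.split(r'(".*?")', text)
    return "".join(
        part if index % 2 else part.lower()
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(lower_outside_quotes('Some "Text""and Some" More "Text" XXX "Even More"'))
```

As in the stylesheet, a quote character without a partner is not matched by the pattern, so the text around it is simply lower-cased.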

Phew

If you don’t like it,

 concat(
   translate(substring-before(//node/text(), '"'),
             'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
   '"',
   substring(substring-after(//node/text(), '"'),
             1,
             string-length(substring-after(//node/text(), '"'))
               - string-length(substring-after(substring-after(//node/text(), '"'), '"'))
               - 1),
   '"',
   translate(substring-after(substring-after(//node/text(), '"'), '"'),
             'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'))

Just replace //node/text() with an XPath that selects the text you want. I just did it for fun; this is not the “cleanest” solution (HA!).

You can make this faster by ensuring that the node in question is the context node, or by providing a more direct path to it.
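The before/quoted/after split that this expression performs can be sketched in Python as follows. This is only an illustration under assumptions: the name lower_around_first_quoted is hypothetical, the input is assumed to contain at least one pair of double quotes, and the sketch puts the quote characters back into the output.

```python
def lower_around_first_quoted(text: str) -> str:
    """Lower-case the text before and after the first quoted substring."""
    # Counterpart of substring-before(text, '"') and substring-after(text, '"').
    before, _, rest = text.partition('"')
    # Split the remainder at the closing quote.
    quoted, _, after = rest.partition('"')
    # Only the first quoted substring is protected; everything after it is lower-cased.
    return before.lower() + '"' + quoted + '"' + after.lower()


if __name__ == "__main__":
    print(lower_around_first_quoted('Some text "and Some" More'))
```

Because everything after the first closing quote is lower-cased, a second quoted substring would lose its capitals, which is the main limitation of this approach compared with the regex-based answers.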


In XQuery you can use

 string-join(
   for $x at $i in tokenize('Some text "and Some" More', '"')
   return if ($i mod 2 = 1) then lower-case($x) else $x,
   '"')

but XPath has only a crippled for, without at.

In XPath 3 you can use the ! simple map operator (which is kind of like for, except that it sets . and position()):

 string-join(
   tokenize('Some text "and Some" More', '"')
     ! (if (position() mod 2 = 1) then lower-case(.) else .),
   '"')

And finally, in XPath 2, you can iterate over an index and get a substring for each index:

 string-join(
   for $i in 1 to count(tokenize('Some text "and Some" More', '"'))
   return if ($i mod 2 = 1)
          then lower-case(tokenize('Some text "and Some" More', '"')[$i])
          else tokenize('Some text "and Some" More', '"')[$i],
   '"')
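The odd/even token logic shared by the three variants above can be checked quickly outside an XPath processor. The following is a minimal Python sketch; the name lower_odd_tokens is hypothetical, and str.split is assumed to behave like tokenize for a plain '"' separator.

```python
def lower_odd_tokens(text: str) -> str:
    """Split on double quotes and lower-case the 1-based odd tokens."""
    tokens = text.split('"')
    # Counting from 1 as XPath does: odd tokens lie outside quotes,
    # even tokens are the quoted substrings and stay unchanged.
    return '"'.join(
        token.lower() if position % 2 == 1 else token
        for position, token in enumerate(tokens, start=1)
    )


if __name__ == "__main__":
    print(lower_odd_tokens('Some text "and Some" More'))
```

Joining with '"' restores the quote characters that splitting removed, just as the final argument of string-join does in the expressions above.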