Combining multiple sequences of list items using XSL transform (XML to XML)

I have a long XML document (about 3,000 lines) that usually looks like this:

    <chapter someAttributes="someValues">
        <title>someTitle</title>
        <p>multiple paragraphs</p>
        <p>...</p>
        <li>
            <p>- some text</p>
        </li>
        <li>
            <p>- some other text</p>
        </li>
        <!-- another li elements -->
        <p>multiple other paragraphs</p>
        <p>...</p>
        <li>
            <p>1. some text</p>
        </li>
        <li>
            <p>2. some other text</p>
        </li>
        <!-- another li elements -->
        <p>multiple other paragraphs</p>
        <p>...</p>
        <!-- there are other elements such as table, illustration, ul etc. -->
    </chapter>

I want to wrap every run of adjacent li elements (these runs are scattered between paragraphs, tables, illustrations, etc.) in either an ol or a ul, depending on how its items begin, and return the wrapped XML:

  • if the paragraph starts with -, the run should become a ul with the attribute mark="DASH"
  • if the paragraphs start with 1., 2., 3., etc., the run should become an ol with numeration="ARABIC"
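As a minimal sketch of the two rules, assuming the marker is always at the start of the first p inside each li, an XSLT 2.0 function can classify an li. The function name local:list-kind and the urn:local-functions namespace are hypothetical; the demo template only tags each li with a kind attribute so the rule can be checked in isolation before any wrapping is attempted.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:local-functions"
    exclude-result-prefixes="xs local">

  <!-- Hypothetical helper: classifies an li by the leading marker of its first p.
       "- ..." gives 'dash', "1. ...", "12. ..." give 'arabic', anything else 'none'. -->
  <xsl:function name="local:list-kind" as="xs:string">
    <xsl:param name="li" as="element(li)"/>
    <xsl:variable name="text" select="normalize-space(string($li/p[1]))"/>
    <xsl:sequence select="
        if (starts-with($text, '-')) then 'dash'
        else if (matches($text, '^[0-9]+\.')) then 'arabic'
        else 'none'"/>
  </xsl:function>

  <!-- Identity: copy everything else unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Demo only: annotate each li with its computed kind. -->
  <xsl:template match="li">
    <xsl:copy>
      <xsl:attribute name="kind" select="local:list-kind(.)"/>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Once the classification looks right, the same function can serve as a grouping key, so that runs of 'dash' and 'arabic' items are detected without any sibling bookkeeping.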

For example, the first (dash) sequence should become:

    <ul mark="DASH">
        <li>
            <p> some text</p>
        </li>
        <li>
            <p> some other text</p>
        </li>
    </ul>

As you can see, I also need to cut the leading "character(s)" out of every such paragraph, that is, the - or the 1., 2., 3., etc.
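A minimal sketch of the removal step, assuming the marker sits in the first text node of the first p of each li (it would not catch a marker wrapped in an inline element such as `<b>1.</b>`). The regular expression removes a leading - or a number of any length followed by a dot, plus the whitespace after it, so "12. text" becomes "text".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity: copy everything unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Only the first text node of an li's first paragraph carries the marker.
       The pattern cannot match an empty string, so replace() is safe here. -->
  <xsl:template match="li/p[1]/text()[1]">
    <xsl:value-of select="replace(., '^\s*(-|[0-9]+\.)\s*', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Keeping this as a separate template means the wrapping logic never has to touch the text itself.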

The real input is more complex than described here (nested sequences, sequences inside table elements), but I'm mainly looking for an approach, especially for how to detect a particular sequence and process it according to these rules.

The output should keep exactly the same ordering, with only the li elements wrapped. XSLT 2.0 and EXSLT are available if necessary.

2 answers

Here is an XSLT 2.0 stylesheet:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:output indent="yes"/>

      <!-- identity -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@*, node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- group adjacent li children; the context item is the first li of each group -->
      <xsl:template match="chapter">
        <xsl:copy>
          <xsl:for-each-group select="*" group-adjacent="boolean(self::li)">
            <xsl:choose>
              <xsl:when test="current-grouping-key() and p[1][starts-with(., '-')]">
                <ul mark="DASH">
                  <xsl:apply-templates select="current-group()"/>
                </ul>
              </xsl:when>
              <xsl:when test="current-grouping-key() and p[1][matches(., '^[0-9]+\.')]">
                <ol numeration="arabic">
                  <xsl:apply-templates select="current-group()"/>
                </ol>
              </xsl:when>
              <xsl:otherwise>
                <xsl:copy-of select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>

      <!-- strip the leading "-" or "N." marker -->
      <xsl:template match="li/p/text()[1]">
        <xsl:value-of select="replace(., '^(-|[0-9]+\.)', '')"/>
      </xsl:template>
    </xsl:stylesheet>

When I run this stylesheet with Saxon 9.3 on the sample input

    <chapter someAttributes="someValues">
        <title>someTitle</title>
        <p>multiple paragraphs</p>
        <p>...</p>
        <li>
            <p>- some text</p>
        </li>
        <li>
            <p>- some other text</p>
        </li>
        <!-- another li elements -->
        <p>multiple other paragraphs</p>
        <p>...</p>
        <li>
            <p>1. some text</p>
        </li>
        <li>
            <p>2. some other text</p>
        </li>
        <!-- another li elements -->
        <p>multiple other paragraphs</p>
        <p>...</p>
        <!-- there are other elements such as table, illustration, ul etc. -->
    </chapter>

I get the following output:

    <?xml version="1.0" encoding="UTF-8"?>
    <chapter>
        <title>someTitle</title>
        <p>multiple paragraphs</p>
        <p>...</p>
        <ul mark="DASH">
            <li>
                <p> some text</p>
            </li>
            <li>
                <p> some other text</p>
            </li>
        </ul>
        <p>multiple other paragraphs</p>
        <p>...</p>
        <ol numeration="arabic">
            <li>
                <p> some text</p>
            </li>
            <li>
                <p> some other text</p>
            </li>
        </ol>
        <p>multiple other paragraphs</p>
        <p>...</p>
    </chapter>
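Note that this output loses two things the question asks to keep: the chapter's someAttributes (the chapter template's xsl:copy never copies @*) and the comments (select="*" only selects elements). Non-li groups also go through xsl:copy-of, so li runs inside tables are never processed, and because the key is boolean(self::li), a dash run immediately followed by a numbered run would land in a single list. One possible variant, sketched here under stated assumptions, groups by list kind instead. The function local:kind and its urn:local-functions namespace are hypothetical; it assumes the marker is in the first p of each li, leaves existing ul/ol elements alone, and treats a comment between two li elements as breaking the run.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:local-functions"
    exclude-result-prefixes="xs local">
  <xsl:output indent="yes"/>

  <!-- Identity template. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical grouping key: 'dash', 'arabic' or 'other' (non-li nodes are 'other'). -->
  <xsl:function name="local:kind" as="xs:string">
    <xsl:param name="n" as="node()"/>
    <xsl:variable name="t" select="normalize-space(string($n/self::li/p[1]))"/>
    <xsl:sequence select="
        if (not($n/self::li)) then 'other'
        else if (starts-with($t, '-')) then 'dash'
        else if (matches($t, '^[0-9]+\.')) then 'arabic'
        else 'other'"/>
  </xsl:function>

  <!-- Any element with li children (chapter, table cells, ...) except existing lists:
       keep its attributes, then regroup its children. Whitespace-only text nodes are
       skipped here so they do not split a run of li elements. -->
  <xsl:template match="*[li][not(self::ul or self::ol)]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="node() except text()[not(normalize-space())]"
                          group-adjacent="local:kind(.)">
        <xsl:choose>
          <xsl:when test="current-grouping-key() = 'dash'">
            <ul mark="DASH"><xsl:apply-templates select="current-group()"/></ul>
          </xsl:when>
          <xsl:when test="current-grouping-key() = 'arabic'">
            <ol numeration="ARABIC"><xsl:apply-templates select="current-group()"/></ol>
          </xsl:when>
          <xsl:otherwise>
            <!-- apply-templates (not copy-of) so nested tables are processed too -->
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Strip the leading marker and the whitespace after it. -->
  <xsl:template match="li/p[1]/text()[1]">
    <xsl:value-of select="replace(., '^\s*(-|[0-9]+\.)\s*', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

On the sample input this keeps someAttributes on chapter and leaves the comments in their original positions. Because the key distinguishes 'dash' from 'arabic', two differently marked runs that touch each other become two separate lists.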

Here is a fully working solution that avoids procedural constructs such as xsl:for-each-group and xsl:if.

Tested as XSLT 2.0 with Saxon-B 9.0.0.1J:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:output indent="yes" method="xml" omit-xml-declaration="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- identity -->
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>

      <!-- override dash list elements (first li of a run) -->
      <xsl:template match="li[(name(preceding-sibling::*[position()=1]) != name(current())) and matches(.,'^-')]">
        <ul mark="DASH">
          <li><xsl:apply-templates/></li>
          <!-- apply recursive template for adjacent nodes -->
          <xsl:apply-templates select="following-sibling::*[1][name() = name(current())]" mode="next"/>
        </ul>
      </xsl:template>

      <!-- override numeration list elements (first li of a run) -->
      <xsl:template match="li[(name(preceding-sibling::*[position()=1]) != name(current())) and matches(.,'^[0-9]+\.')]">
        <ol numeration="ARABIC">
          <li><xsl:apply-templates/></li>
          <xsl:apply-templates select="following-sibling::*[1][name() = name(current())]" mode="next"/>
        </ol>
      </xsl:template>

      <!-- recursive template for adjacent nodes -->
      <xsl:template match="*" mode="next">
        <li><xsl:apply-templates/></li>
        <xsl:apply-templates select="following-sibling::*[1][name() = name(current())]" mode="next"/>
      </xsl:template>

      <!-- li elements that follow another li were already output in "next" mode;
           without this template the identity template would copy them a second time -->
      <xsl:template match="li[preceding-sibling::*[1][self::li]]"/>

      <!-- remove marks/numeration from first text node -->
      <xsl:template match="li/p/text()[1]">
        <xsl:value-of select="replace(., '^(-|[0-9]+\.)\s+', '')"/>
      </xsl:template>
    </xsl:stylesheet>

Applied to your input, it produces:

    <chapter someAttributes="someValues">
        <title>someTitle</title>
        <p>multiple paragraphs</p>
        <p>...</p>
        <ul mark="DASH">
            <li>
                <p>some text</p>
            </li>
            <li>
                <p>some other text</p>
            </li>
        </ul>
        <!-- another li elements -->
        <p>multiple other paragraphs</p>
        <p>...</p>
        <ol numeration="ARABIC">
            <li>
                <p>some text</p>
            </li>
            <li>
                <p>some other text</p>
            </li>
        </ol>
        <!-- another li elements -->
        <p>multiple other paragraphs</p>
        <p>...</p>
        <!-- there are other elements such as table, illustration, ul etc. -->
    </chapter>