
Filter XML based on child nodes

I have an XML file similar to this (with many nodes and parts removed):

    <?xml version="1.0" encoding="utf-8"?>
    <Message xmlns="http://www.theia.org.uk/ILR/2011-12/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Header>
        <CollectionDetails>
          <Collection>ILR</Collection>
          <Year>1112</Year>
          <FilePreparationDate>2011-10-06</FilePreparationDate>
        </CollectionDetails>
        <Source>
          <ProtectiveMarking>PROTECT-PRIVATE</ProtectiveMarking>
        </Source>
      </Header>
      <SourceFiles>
        <SourceFile>
          <SourceFileName>A10004705001112004401.ER</SourceFileName>
          <FilePreparationDate>2011-10-05</FilePreparationDate>
        </SourceFile>
      </SourceFiles>
      <LearningProvider>
        <UKPRN>10004705</UKPRN>
        <UPIN>107949</UPIN>
      </LearningProvider>
      <Learner>
        <ULN>4682272097</ULN>
        <GivenNames>Peter</GivenNames>
        <LearningDelivery>
          <LearnAimRef>60000776</LearnAimRef>
        </LearningDelivery>
        <LearningDelivery>
          <LearnAimRef>ZPROG001</LearnAimRef>
        </LearningDelivery>
      </Learner>
      <Learner>
        <ULN>3072094321</ULN>
        <GivenNames>Thomas</GivenNames>
        <LearningDelivery>
          <LearnAimRef>10055320</LearnAimRef>
        </LearningDelivery>
        <LearningDelivery>
          <LearnAimRef>10002856</LearnAimRef>
        </LearningDelivery>
        <LearningDelivery>
          <LearnAimRef>1000287X</LearnAimRef>
        </LearningDelivery>
      </Learner>
    </Message>

I need to filter this so that only the Learner entries that have a LearningDelivery child whose LearnAimRef is ZPROG001 are kept. In this case the output should contain the first learner but not the second:

    <?xml version="1.0" encoding="utf-8"?>
    <Message xmlns="http://www.theia.org.uk/ILR/2011-12/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Header>
        <CollectionDetails>
          <Collection>ILR</Collection>
          <Year>1112</Year>
          <FilePreparationDate>2011-10-06</FilePreparationDate>
        </CollectionDetails>
        <Source>
          <ProtectiveMarking>PROTECT-PRIVATE</ProtectiveMarking>
        </Source>
      </Header>
      <SourceFiles>
        <SourceFile>
          <SourceFileName>A10004705001112004401.ER</SourceFileName>
          <FilePreparationDate>2011-10-05</FilePreparationDate>
        </SourceFile>
      </SourceFiles>
      <LearningProvider>
        <UKPRN>10004705</UKPRN>
        <UPIN>107949</UPIN>
      </LearningProvider>
      <Learner>
        <ULN>4682272097</ULN>
        <GivenNames>Peter</GivenNames>
        <LearningDelivery>
          <LearnAimRef>60000776</LearnAimRef>
        </LearningDelivery>
        <LearningDelivery>
          <LearnAimRef>ZPROG001</LearnAimRef>
        </LearningDelivery>
      </Learner>
    </Message>

I looked into how to do this, and I believe the right way is to use an XSL transform to process the XML and write the result to a new file (this is being done in C#). After a couple of hours of trying to wrap my head around XSLT syntax, I am still stuck and cannot get the result I want. Any help is greatly appreciated.

2 answers

To copy most of the original XML document while changing only certain parts, start with the identity transform, which simply copies everything. Then add a template that overrides the identity template for the <Learner> elements you do not want to copy:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">

      <!-- identity template -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- override the above template for certain Learner elements; output nothing. -->
      <xsl:template match="theia:Learner[not(theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001')]">
      </xsl:template>

    </xsl:stylesheet>

(borrowing the namespace prefix from @andyb).


If you just want all the <Learner> elements that have a descendant (in this case LearnAimRef) with a specific value, you can use a predicate expression (the bit between [ and ]) to filter the node set.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">

      <xsl:template match="/theia:Message">
        <xsl:copy-of select="theia:Learner[theia:LearningDelivery/theia:LearnAimRef='ZPROG001']"/>
      </xsl:template>

    </xsl:stylesheet>

So the copy-of reads as: copy all Learner nodes that have a LearningDelivery child that has a LearnAimRef child whose value equals ZPROG001.

Your XML document has a default namespace, http://www.theia.org.uk/ILR/2011-12/1, so for the XPath to select the nodes correctly it must refer to the same namespace. In the XSLT above I bound that namespace to a prefix (theia) and used the prefix in the XPath.
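A small diagnostic stylesheet can make the namespace point concrete. This is only a sketch for checking the behaviour against the sample input from the question; it is not part of the filtering solution. It prints two counts as plain text: one for an unprefixed name test and one for the prefixed one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- In XSLT 1.0 an unprefixed name in XPath means "no namespace",
         so this counts 0 elements in the sample document. -->
    <xsl:value-of select="count(//Learner)"/>
    <xsl:text> </xsl:text>
    <!-- The prefix is bound to the document's default namespace URI,
         so this counts both Learner elements in the sample: 2. -->
    <xsl:value-of select="count(//theia:Learner)"/>
  </xsl:template>

</xsl:stylesheet>
```

The prefix itself is arbitrary; only the namespace URI it is bound to has to match the one declared in the source document.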

If you want other parts of the source XML to be copied to the output tree as well, you could add further instructions to the template, for example <xsl:copy-of select="theia:LearningProvider"/>
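As posted, the stylesheet above writes only the matching Learner elements, without the enclosing Message element. A minimal sketch of how it might be extended to produce the output requested in the question follows. It assumes the top-level children are exactly the ones in the sample (Header, SourceFiles, LearningProvider, Learner) and that they appear in that order; a real ILR file may contain other elements, which this version would drop.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:theia="http://www.theia.org.uk/ILR/2011-12/1">

  <xsl:template match="/theia:Message">
    <!-- xsl:copy recreates the Message element itself,
         including its namespace declarations. -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Copy the non-Learner parts unchanged (assumed element list,
         taken from the sample; they precede Learner in the input). -->
      <xsl:copy-of select="theia:Header | theia:SourceFiles | theia:LearningProvider"/>
      <!-- Copy only the learners that have the wanted aim reference. -->
      <xsl:copy-of select="theia:Learner[theia:LearningDelivery/theia:LearnAimRef = 'ZPROG001']"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If the set of top-level elements is not known in advance, the identity-transform approach from the other answer is probably the safer choice, because it copies everything that is not explicitly suppressed.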

This does not cover applying the transform in C#, but that has already been answered: How to apply an XSLT stylesheet in C#

Hope this helps :)
