How to handle multiple xpaths at once (based on the feed structure) or create your own channels with the same structure

Question


The code below is verified and works: it prints the contents of a feed that has this structure.

    <rss>
      <channel>
        <item>
          <pubDate/>
          <title/>
          <description/>
          <link/>
          <author/>
        </item>
      </channel>
    </rss>

What I could not manage was to print feeds that follow the structure below (the difference is <feed>, <entry> and <published>), even though I changed the XPath to /feed//entry. You can see the structure in the page source.

    <feed>
      <entry>
        <published/>
        <title/>
        <description/>
        <link/>
        <author/>
      </entry>
    </feed>

I should add that the code sorts all item elements by their pubDate. For the second structure, I think it should sort all entry elements by published.

I probably have a mistake in the XPath that I cannot find. If I do manage to print this feed, how can I change the code to process the different structures at the same time?

Is there any service that lets me create and host my own feeds based on these feeds, so that I have the same structure for all of them? I hope I made it clear. Thanks.

    <?php
    $feeds = array();

    // Get all feed entries
    $entries = array();
    foreach ($feeds as $feed) {
        $xml = simplexml_load_file($feed);
        $entries = array_merge($entries, $xml->xpath(''));
    }
    ?>

+4

xml php xpath xslt

Enexoonoma Jun 09 '11 at 18:24


3 answers

This question really consists of two questions: "How to handle several XPaths at once" and "How to create your own channels with the same structure."

Dimitre Novatchev answered the second one brilliantly. If you want to merge or convert one or more XML documents, his approach is definitely what I recommend.

Meanwhile, I will take the simple path and consider the first question: "How to handle multiple XPaths at once." That is easy, because there is an operator for it: | . If you want to select all nodes that match /feed//entry or /rss//item , you can use /feed//entry | /rss//item .
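As a minimal sketch of that union expression with SimpleXML, assuming feeds without a default namespace (like the two structures in the question); the inline documents and the $docs variable are made up for illustration:

```php
<?php
// Two illustrative, namespace-free documents shaped like the question's structures.
$docs = array(
    '<rss><channel><item><pubDate>2011-06-05</pubDate><title>RSS post</title></item></channel></rss>',
    '<feed><entry><published>2011-06-06</published><title>Atom post</title></entry></feed>',
);

$entries = array();
foreach ($docs as $doc) {
    $xml = simplexml_load_string($doc);
    // The union operator returns the nodes matched by either path;
    // in any one document only one of the two paths can match.
    $entries = array_merge($entries, $xml->xpath('/feed//entry | /rss//item'));
}

foreach ($entries as $entry) {
    echo $entry->getName(), ': ', $entry->title, "\n";
}
```

Real Atom feeds usually declare a default namespace, in which case the unprefixed /feed//entry path would not match without extra handling.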

+3

Josh Davis Jun 18 '11 at 21:55


Here are the solutions.

The problem is that many RSS or Atom feeds have namespaces that do not play well with SimpleXML. In the example below, I use str_replace to replace xmlns= with ns= . Then I use the name of the root element to determine the type of feed (be it RSS or Atom).

The array_merge call takes care of adding all the entries to the $entries array, which you can then use later. (A plain array_push would append each result set as a single nested array.)

    $entries = array();

    foreach ($feeds as $feed) {
        // $feed is assumed to be a URL or file path, as in the question.
        // Renaming xmlns= hides the default namespace from SimpleXML,
        // so the unprefixed XPath expressions below can match.
        $xml = simplexml_load_string(str_replace('xmlns=', 'ns=', file_get_contents($feed)));

        switch (strtolower($xml->getName())) {
            // Atom
            case 'feed':
                $entries = array_merge($entries, $xml->xpath('/feed//entry'));
                break;

            // RSS
            case 'rss':
                $entries = array_merge($entries, $xml->xpath('/rss//item'));
                break;
        }
    }

    var_dump($entries);
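Once the entries are collected, they can be sorted newest first, as the question describes. The following is only a sketch: it assumes every entry has either a pubDate or a published child whose value strtotime can parse, and entryTimestamp is a hypothetical helper introduced here.

```php
<?php
// Hypothetical helper: reads the date from whichever child element is present.
function entryTimestamp(SimpleXMLElement $entry)
{
    $date = isset($entry->pubDate) ? (string) $entry->pubDate : (string) $entry->published;
    return (int) strtotime($date);
}

// Illustrative, namespace-free documents.
$rss = simplexml_load_string(
    '<rss><channel>'
    . '<item><pubDate>2011-06-05</pubDate><title>A</title></item>'
    . '<item><pubDate>2011-06-07</pubDate><title>C</title></item>'
    . '</channel></rss>'
);
$atom = simplexml_load_string(
    '<feed><entry><published>2011-06-06</published><title>B</title></entry></feed>'
);

$entries = array_merge($rss->xpath('/rss//item'), $atom->xpath('/feed//entry'));

// Sort by timestamp, newest first.
usort($entries, function ($a, $b) {
    return entryTimestamp($b) - entryTimestamp($a);
});

foreach ($entries as $entry) {
    echo $entry->title, "\n";
}
```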

Another solution would be to use Google Reader to combine all your feeds and use this feed instead of all your individual ones.

+1

Francois Deschenes Jun 18 '11 at 21:27


Dimitre Novatchev · Accepted Answer · 2011-06-18T21:26:47+0000

The main contribution of this answer is the solution at the end, which can be used with any number of formats simply by listing all the alternative "entry" names in the external (global) parameter $postElements and all the alternative "published" names in the external (global) parameter $pub-dateElements .

Apart from that, here is how to specify an XPath expression that selects all /rss//item elements and all /feed//entry elements.

In the simple case of only two possible document formats, this XPath expression (as @Josh Davis suggested) works correctly:

 /rss//item | /feed//entry

A more general XPath expression selects the desired elements from an unlimited number of document formats:

 /*[contains($topElements, concat('|',name(),'|'))] //*[contains($postElements, concat('|',name(),'|'))]

where the variable $topElements should be replaced by a pipe-delimited string of all possible names for the top element, and $postElements should be replaced by a pipe-delimited string of all possible names for the "entry" element. The descendant axis ( // ) also allows the "entry" elements to be at different depths in different document formats.

For this particular case, the XPath expression is:

 /*[contains('|feed|rss|', concat('|',name(),'|'))] //*[contains('|item|entry|', concat('|',name(),'|'))]
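To illustrate, here is a small sketch that builds this expression from lists of names and evaluates it with SimpleXML. The helper buildEntryXPath is hypothetical and introduced only for this example; it assumes the names contain no quote characters, since they are placed inside a quoted XPath string literal.

```php
<?php
// Hypothetical helper: builds the generic expression from lists of element names.
function buildEntryXPath(array $topElements, array $postElements)
{
    $top  = '|' . implode('|', $topElements) . '|';
    $post = '|' . implode('|', $postElements) . '|';

    return "/*[contains('$top', concat('|',name(),'|'))]"
         . "//*[contains('$post', concat('|',name(),'|'))]";
}

$xpath = buildEntryXPath(array('feed', 'rss'), array('item', 'entry'));

// Illustrative, namespace-free document.
$xml = simplexml_load_string(
    '<feed><entry><published>2011-06-06</published><title>Atom post</title></entry></feed>'
);

$matches = $xml->xpath($xpath);
echo count($matches), ' ', $matches[0]->getName(), "\n";
```

Wrapping both the list and the tested name in | delimiters is what prevents a partial match, such as an element named "it" matching inside "item".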

The rest of this post shows how the complete processing can be done entirely in XSLT, easily and elegantly.

I. Gentle introduction

This kind of processing is quick and easy with XSLT:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="/">
        <myFeed>
          <xsl:apply-templates/>
        </myFeed>
      </xsl:template>

      <xsl:template match="channel|feed">
        <xsl:apply-templates select="*">
          <xsl:sort select="pubDate|published" order="descending"/>
        </xsl:apply-templates>
      </xsl:template>

      <xsl:template match="item|entry">
        <post>
          <xsl:apply-templates mode="identity"/>
        </post>
      </xsl:template>

      <xsl:template match="pubDate|published" mode="identity">
        <publicationDate>
          <xsl:apply-templates/>
        </publicationDate>
      </xsl:template>

      <xsl:template match="node()|@*" mode="identity">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*" mode="identity"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to this XML document (in format 1):

    <rss>
      <channel>
        <item>
          <pubDate>2011-06-05</pubDate>
          <title>Title1</title>
          <description>Description1</description>
          <link>Link1</link>
          <author>Author1</author>
        </item>
        <item>
          <pubDate>2011-06-06</pubDate>
          <title>Title2</title>
          <description>Description2</description>
          <link>Link2</link>
          <author>Author2</author>
        </item>
        <item>
          <pubDate>2011-06-07</pubDate>
          <title>Title3</title>
          <description>Description3</description>
          <link>Link3</link>
          <author>Author3</author>
        </item>
      </channel>
    </rss>

and when it is applied to this equivalent document (in format 2):

    <feed>
      <entry>
        <published>2011-06-05</published>
        <title>Title1</title>
        <description>Description1</description>
        <link>Link1</link>
        <author>Author1</author>
      </entry>
      <entry>
        <published>2011-06-06</published>
        <title>Title2</title>
        <description>Description2</description>
        <link>Link2</link>
        <author>Author2</author>
      </entry>
      <entry>
        <published>2011-06-07</published>
        <title>Title3</title>
        <description>Description3</description>
        <link>Link3</link>
        <author>Author3</author>
      </entry>
    </feed>

in both cases the same desired, correct result is produced:

    <myFeed>
      <post>
        <publicationDate>2011-06-07</publicationDate>
        <title>Title3</title>
        <description>Description3</description>
        <link>Link3</link>
        <author>Author3</author>
      </post>
      <post>
        <publicationDate>2011-06-06</publicationDate>
        <title>Title2</title>
        <description>Description2</description>
        <link>Link2</link>
        <author>Author2</author>
      </post>
      <post>
        <publicationDate>2011-06-05</publicationDate>
        <title>Title1</title>
        <description>Description1</description>
        <link>Link1</link>
        <author>Author1</author>
      </post>
    </myFeed>

II. Complete solution

This can be generalized to a parameterized solution:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:param name="postElements" select="'|entry|item|'"/>
      <xsl:param name="pub-dateElements" select="'|published|pubDate|'"/>

      <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*" mode="identity"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="/">
        <myFeed>
          <xsl:apply-templates select=
           "//*[contains($postElements, concat('|',name(),'|'))]">
            <xsl:sort order="descending" select=
             "*[contains($pub-dateElements, concat('|',name(),'|'))]"/>
          </xsl:apply-templates>
        </myFeed>
      </xsl:template>

      <xsl:template match="*">
        <xsl:choose>
          <xsl:when test=
           "contains($postElements, concat('|',name(),'|'))">
            <post>
              <xsl:apply-templates/>
            </post>
          </xsl:when>
          <xsl:when test=
           "contains($pub-dateElements, concat('|',name(),'|'))">
            <publicationDate>
              <xsl:apply-templates/>
            </publicationDate>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="identity"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>

This transformation can be used with any number of formats simply by listing all the alternative "entry" names in the external (global) parameter $postElements and all the alternative "published" names in the external (global) parameter $pub-dateElements .

Anyone can run this transformation to verify that, when applied to the two XML documents listed above, it again produces the same desired, correct result.
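From PHP, a parameterized stylesheet like this could be run with the XSLTProcessor class (which needs the xsl extension), overriding the two global parameters through setParameter. The sketch below embeds a trimmed variant of the stylesheet written for this example, not the stylesheet above verbatim, and the posts / article / date format is a hypothetical third format used only to show the parameters being extended.

```php
<?php
// Trimmed variant of the parameterized stylesheet, written for this sketch.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="postElements" select="'|entry|item|'"/>
  <xsl:param name="pub-dateElements" select="'|published|pubDate|'"/>
  <xsl:template match="/">
    <myFeed>
      <xsl:apply-templates select="//*[contains($postElements, concat('|',name(),'|'))]">
        <xsl:sort order="descending"
                  select="*[contains($pub-dateElements, concat('|',name(),'|'))]"/>
      </xsl:apply-templates>
    </myFeed>
  </xsl:template>
  <xsl:template match="*">
    <xsl:choose>
      <xsl:when test="contains($postElements, concat('|',name(),'|'))">
        <post><xsl:apply-templates/></post>
      </xsl:when>
      <xsl:when test="contains($pub-dateElements, concat('|',name(),'|'))">
        <publicationDate><xsl:apply-templates/></publicationDate>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy><xsl:apply-templates/></xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical third format, used only to show the parameters being extended.
$input = '<posts><article><date>2011-06-05</date><title>Old</title></article>'
       . '<article><date>2011-06-07</date><title>New</title></article></posts>';

$style = new DOMDocument();
$style->loadXML($xsl);
$doc = new DOMDocument();
$doc->loadXML($input);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);
// Override the global parameters (empty string = no namespace).
$proc->setParameter('', 'postElements', '|entry|item|article|');
$proc->setParameter('', 'pub-dateElements', '|published|pubDate|date|');

$result = $proc->transformToXml($doc);
echo $result;
```

Supporting another format then means changing only the two parameter strings; the stylesheet itself stays untouched.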
