Using YQL multi-query & XPath for HTML parsing, how to escape nested quotes?

The title is more complicated than it needs to be; here is the problem query.

 SELECT * FROM query.multi WHERE queries="
    SELECT * FROM html WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' AND xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span';
    SELECT * FROM xml WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * FROM json WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * FROM xml WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * FROM json WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"

In particular, this line,

 xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span' 

The problem is the quoting: the XPath needs a third level of nested quotes, and I have run out of quote characters. I tried the following variations without success:

 // no attribute quoting
 xpath='//li[@class=listLi]/div[@class=views]/a/span'

 // try to quote attribute w/ backslash & single quote
 xpath='//li[@class=\'listLi\']/div[@class=\'views\']/a/span'

 // try to quote attribute w/ backslash & double quote
 xpath='//li[@class=\"listLi\"]/div[@class=\"views\"]/a/span'

 // try to quote attribute with doubled single quotes, like SQL
 xpath='//li[@class=''listLi'']/div[@class=''views'']/a/span'

 // try to quote attribute with doubled double quotes, like SQL
 xpath='//li[@class=""listLi""]/div[@class=""views""]/a/span'

 // try to quote attribute with quote entities
 xpath='//li[@class=&quot;listLi&quot;]/div[@class=&quot;views&quot;]/a/span'

 // try to surround XPath with backslash & double quote
 xpath=\"//li[@class='listLi']/div[@class='views']/a/span\"

 // try to surround XPath with doubled double quotes
 xpath=""//li[@class='listLi']/div[@class='views']/a/span""

All without success.

I have read up on escaping XPath strings, but everything I found relies either on concat() (which will not help, because neither ' nor " is available) or on HTML entities. Leaving the attributes unquoted does not throw an error, but it fails because it is not the actual XPath expression I need.
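For reference, the concat() technique mentioned above usually looks like the following Python sketch. The helper name `xpath_literal` is made up for illustration. It shows why the trick does not help here: the literal it produces still needs ' or " as delimiters, and those are exactly the characters already used up by the two outer quoting levels.

```python
def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 string literal for value (hypothetical helper)."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote kinds occur: split on single quotes and glue the pieces
    # back together with concat(), writing each single quote as "'".
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


if __name__ == "__main__":
    # The result is still delimited by a quote character.
    print(xpath_literal("listLi"))
    print(xpath_literal('a"b\'c'))
```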

I don't see anything in the YQL docs about how to handle escaping. I know this is an edge case, but I hoped they would have some kind of escaping guide.

2 answers

You need to escape the character delimiting your XPath query with a double backslash. In other words:

 SELECT * FROM query.multi WHERE queries="
    SELECT * FROM html WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' AND xpath='//li[@class=\\'listLi\\']/div[@class=\\'views\\']/a/span';
    SELECT * FROM xml WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * FROM json WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * FROM xml WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * FROM json WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"

(Try this in the YQL console.)
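If the statement is assembled in code, the double-backslash rule can be applied mechanically. Here is a minimal Python sketch, assuming the inner statements use single quotes for their values and the outer `queries` value is double-quoted, as in the query above. `xpath_value` and `multi_query` are hypothetical helper names, not part of YQL or any library, and the sketch only builds the query text without contacting any service.

```python
def xpath_value(xpath: str) -> str:
    """Quote an XPath for a statement nested inside query.multi.

    Every single quote inside the XPath is written as two backslashes
    followed by the quote, then the whole value is wrapped in single quotes.
    """
    return "'" + xpath.replace("'", "\\\\'") + "'"


def multi_query(statements: list[str]) -> str:
    """Join inner statements into one query.multi statement (illustrative)."""
    joined = "; ".join(statements)
    if '"' in joined:
        # The outer level is delimited by double quotes, so this sketch
        # refuses inner statements that contain one.
        raise ValueError("inner statements must not contain double quotes")
    return 'SELECT * FROM query.multi WHERE queries="' + joined + '"'


if __name__ == "__main__":
    xpath = "//li[@class='listLi']/div[@class='views']/a/span"
    html_stmt = (
        "SELECT * FROM html WHERE "
        "url='http://www.stumbleupon.com/url/http://www.guildwars2.com' "
        "AND xpath=" + xpath_value(xpath)
    )
    reddit_stmt = (
        "SELECT * FROM json WHERE "
        "url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
    )
    print(multi_query([html_stmt, reddit_stmt]))
```

Keeping the XPath in its natural single-quoted form and escaping it in one place means the three quoting levels never have to be written out by hand.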


I came up with a solution that does not really answer my original question, but solves the problem.

The data.html.cssselect table accepts a CSS selector and translates it into XPath, which sidesteps the escaping problem entirely.

 SELECT * FROM query.multi WHERE queries="
    SELECT * FROM data.html.cssselect WHERE url='http://www.stumbleupon.com/url/http://www.guildwars2.com' AND css='li.listLi div.views a span';
    SELECT * FROM xml WHERE url='http://services.digg.com/1.0/endpoint?method=story.getAll&link=http://www.guildwars2.com';
    SELECT * FROM json WHERE url='http://api.tweetmeme.com/url_info.json?url=http://www.guildwars2.com';
    SELECT * FROM xml WHERE url='http://api.facebook.com/restserver.php?method=links.getStats&urls=http://www.guildwars2.com';
    SELECT * FROM json WHERE url='http://www.reddit.com/button_info.json?url=http://www.guildwars2.com'"
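To give a rough idea of what translating a CSS selector into XPath involves, here is a small Python sketch. It is not how data.html.cssselect is implemented (I have not seen its source). `css_to_xpath` is a hypothetical helper that only understands `tag`, `tag.class` and `.class` steps separated by spaces.

```python
def css_to_xpath(selector: str) -> str:
    """Translate a very small CSS subset to XPath (illustrative only).

    Supported: compound steps of the form tag, tag.class or .class,
    separated by whitespace (the CSS descendant combinator).
    """
    steps = []
    for compound in selector.split():
        tag, _, cls = compound.partition(".")
        step = tag or "*"
        if cls:
            # A CSS class selector matches one whitespace-separated token of
            # the class attribute, not the whole attribute value.
            step += (
                "[contains(concat(' ', normalize-space(@class), ' '), ' "
                + cls
                + " ')]"
            )
        steps.append(step)
    # A space in CSS means "any descendant", which is // in XPath.
    return "//" + "//".join(steps)


if __name__ == "__main__":
    print(css_to_xpath("li.listLi div.views a span"))
```

Note that the selector is slightly looser than the original XPath: spaces match any descendant rather than a direct child, and `.listLi` matches elements that carry other classes as well.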

Source: https://habr.com/ru/post/1311256/

