XPath ordered priority attribute search

Question

I want to write XPath that can return some link elements in an HTML DOM.

The syntax is incorrect, but here is the gist of what I want:

//web:link[@text='Login' THEN_TRY @href='login.php' THEN_TRY @index=0]

THEN_TRY is a made-up operator, because I cannot find which operator(s) to use. If the page contains several links for the given set of [attribute = value] pairs, the link that matches the most left-most attribute(s) should be returned instead of any others.

For example, consider the case where the XPath example above finds 3 links that match any of the specified attributes:

 link A: text='Sign In', href='Login.php', index=0
 link B: text='Login', href='Signin.php', index=15
 link C: text='Login', href='Login.php', index=22

Link C is the best match because it matches the first and second attributes.

Link B takes second place because it matches only the first attribute.

Link A takes last place because it does not match the first attribute; it matches only the second and third attributes.

The XPath should return the best match, link C.

If more than one link is tied for best match, the XPath should return the first such link found on the page.

+4

search xpath xslt attributes

user94000 Apr 21 '09 at 21:01

4 answers

There is a brute-force solution. I will demonstrate with two attributes instead of three.

 (
   //web:link[@text != 'Login' and @href != 'Login.php'
              and not(//web:link[@text = 'Login' or @href = 'Login.php'])]
 | //web:link[@text != 'Login' and @href = 'Login.php'
              and not(//web:link[@text = 'Login'])]
 | //web:link[@text = 'Login' and @href != 'Login.php'
              and not(//web:link[@text = 'Login' and @href = 'Login.php'])]
 | //web:link[@text = 'Login' and @href = 'Login.php']
 )[1]

That is, select all links where no attribute matches, but only if there is no link that matches better. Then select all links where only the lesser attribute matches, but only if no link matches the superior attribute. Then select the links where only the first attribute matches, but only if there are no links in which both attributes match. Then select the links where both attributes match. At most one of these four operands will be non-empty, so the | operator never actually combines anything. Finally, [1] selects the first link in document order in case the surviving node-set has more than one element.

I used only two attributes instead of three because I did not want to type out all eight cases. You can omit the first case if you are not interested in any link unless at least one of its attributes matches.

This is a situation where you might be better off simply selecting all the candidates with the much simpler query shown by Jeff, and then ranking the results in other code, where you can more easily use iteration and variables to pick the best candidate.
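A minimal sketch of that "select broadly, then rank in host code" approach, in Python with the standard library's ElementTree. The sample document, the un-namespaced link element, and the helper names match_key and best_link are assumptions made for illustration; they are not part of the question's real schema.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data based on the three links in the question.
XML = """
<links>
  <link name="A" text="Sign In" href="Login.php"  index="0"/>
  <link name="B" text="Login"   href="Signin.php" index="15"/>
  <link name="C" text="Login"   href="Login.php"  index="22"/>
</links>
"""

# Wanted attribute values, highest priority first.
WANTED = [("text", "Login"), ("href", "Login.php"), ("index", "0")]

def match_key(link):
    """Tuple of booleans, one per wanted attribute, in priority order."""
    return tuple(link.get(name) == value for name, value in WANTED)

def best_link(root):
    # Step 1: fetch every link and keep those matching at least one attribute.
    candidates = [link for link in root.iter("link") if any(match_key(link))]
    # Step 2: tuples compare left to right, so earlier attributes dominate.
    # max() returns the first maximal item, i.e. the earliest in document order.
    return max(candidates, key=match_key, default=None)

best = best_link(ET.fromstring(XML))
print(best.get("name"))  # C
```

Because the ranking lives in ordinary code, adding a fourth attribute only means adding one more pair to WANTED instead of doubling the number of XPath cases.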

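If you can use XPath 2, you can use the comma operator to concatenate node sequences (which replace node-sets). Unlike |, which returns nodes in document order, the comma operator keeps the operands in the order written, so [1] picks the first node of the first non-empty operand. Try this, for example: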

 (
   //web:link[@text = 'Login' and @href = 'Login.php' and @index = 0]
 , //web:link[@text = 'Login' and @href = 'Login.php' and @index != 0]
 , //web:link[@text = 'Login' and @href != 'Login.php' and @index = 0]
 , //web:link[@text = 'Login' and @href != 'Login.php' and @index != 0]
 , //web:link[@text != 'Login' and @href = 'Login.php' and @index = 0]
 , //web:link[@text != 'Login' and @href = 'Login.php' and @index != 0]
 , //web:link[@text != 'Login' and @href != 'Login.php' and @index = 0]
 , //web:link[@text != 'Login' and @href != 'Login.php' and @index != 0]
 )[1]

As an aside, here is a way to rank the links that makes comparing them fairly simple. Imagine a bit field with one bit for each attribute you want to test. If the first attribute matches, set the left-most bit; otherwise leave it clear. If the second attribute matches, set the next most significant bit, and so on. For your example, you get the following bit values:

 011 link A: text = 'Sign In', href = 'Login.php', index = 0
 100 link B: text = 'Login', href = 'Signin.php', index = 15
 110 link C: text = 'Login', href = 'Login.php', index = 22

To select the best match, treat the bit fields as binary numbers. Link A has a rating of 3, link B a rating of 4, and link C a rating of 6. (This is a bit like how CSS selectors are ranked.) It is a way to model the ordering criteria, but now that I have typed all this, I do not see that it leads to any shorter solution in XPath.
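The bit-field ranking can be sketched in a few lines of Python. The dictionaries below stand in for link elements, and the name bit_score is made up for this illustration.

```python
# Attribute tests in priority order; the first maps to the most significant bit.
WANTED = [("text", "Login"), ("href", "Login.php"), ("index", "0")]

def bit_score(attrs):
    """Pack one match bit per wanted attribute into an integer."""
    score = 0
    for name, value in WANTED:
        score = (score << 1) | int(attrs.get(name) == value)
    return score

# Plain dicts standing in for the three links from the question.
links = {
    "A": {"text": "Sign In", "href": "Login.php", "index": "0"},
    "B": {"text": "Login", "href": "Signin.php", "index": "15"},
    "C": {"text": "Login", "href": "Login.php", "index": "22"},
}

for name, attrs in links.items():
    # Prints: A 011 3, then B 100 4, then C 110 6
    print(name, format(bit_score(attrs), "03b"), bit_score(attrs))
```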

+2

Rob Kennedy Apr 21 '09 at 10:06

Try using or, as in:

 web:link[@text='Login' or @href='login.php' or @index=0]

However, this will probably give you all of the matching nodes, not just the one with the highest priority.

Update
So, I tried this and it works. It is long, but it should do what you need (with appropriate changes for your schema).

 //link[@text='Login']
 | //link[not(//link[@text='Login']) and @href='Login.php']
 | //link[not(//link[@text='Login']) and not(//link[@href='Login.php']) and @index='0']

I ran it on the following test XML, commenting out each line to test different parts, and it works as expected.

 <?xml version="1.0" encoding="utf-8"?>
 <Test>
   <link text='Sign In' href='Login2.php' index="0"></link>
   <link text='Login' href='Signin.php' index="15"></link>
   <link text='LoginBlah' href='Login.php' index="22"></link>
 </Test>

Update 2
I notice that I have not completely solved the problem, since you need the best match, not just the first match in priority order. This can be done, but it takes a fairly long XPath that spells out every combination in order. I do not know of any way to simplify it.

+1

Jeff Yates Apr 21 '09 at 21:08

Today I had a similar problem and came up with a solution that works in the context of XSLT. For a pure XPath solution, you will need one of the other approaches.

 <xsl:variable name="first" select="//web:link[@text='Login']"/>
 <xsl:variable name="second" select="//web:link[@href='login.php']"/>
 <xsl:variable name="third" select="//web:link[@index=0]"/>
 <xsl:variable name="theAnswer"
               select="$first | $second[not($first)] | $third[not($first or $second)]"/>

Of course, the trick here is that an empty node-set converts to false.
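For readers more at home outside XSLT, the same fallback chain can be sketched in Python, with lists standing in for node-sets (an empty list is falsy, much as an empty node-set converts to false). The helper name first_non_empty and the sample dictionaries are assumptions for illustration only.

```python
def first_non_empty(*node_lists):
    """Return the first non-empty list, mirroring
    $first | $second[not($first)] | $third[not($first or $second)]."""
    for nodes in node_lists:
        if nodes:  # an empty list is falsy, like an empty node-set
            return nodes
    return []

# Plain dicts standing in for the three links from the question.
links = [
    {"name": "A", "text": "Sign In", "href": "Login.php", "index": "0"},
    {"name": "B", "text": "Login", "href": "Signin.php", "index": "15"},
    {"name": "C", "text": "Login", "href": "Login.php", "index": "22"},
]

first = [l for l in links if l["text"] == "Login"]
second = [l for l in links if l["href"] == "Login.php"]
third = [l for l in links if l["index"] == "0"]

chosen = first_non_empty(first, second, third)
print([l["name"] for l in chosen])  # ['B', 'C']
```

Note that this yields every link matching the highest-priority attribute that matched anything (here both B and C), so it implements priority order rather than a single best match.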

0

Dominic Cronin Apr 22 '09 at 13:21

Dimitre Novatchev · Accepted Answer · 2009-04-22T03:14:48+0000

The previous two answers seem inaccurate.

Here is one possible solution:

You want to find the first node with the maximum value for the following function:

 100*number(@text='Login') + 10*number(@href='Login.php') + 1*number(@index=0)
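The weights 100, 10 and 1 are not special: what matters is that each weight exceeds the sum of all lower-priority weights, so a higher-priority match can never be outvoted. A small Python check of that property follows; the function names are made up for this sketch.

```python
from itertools import product

def weighted(flags, weights):
    """Sum the weights of the attributes that matched."""
    return sum(w for w, matched in zip(weights, flags) if matched)

def respects_priority(weights):
    """True if sorting by weighted score equals sorting by priority order."""
    combos = list(product([False, True], repeat=len(weights)))
    by_priority = sorted(combos)  # tuples compare left to right
    by_score = sorted(combos, key=lambda flags: weighted(flags, weights))
    return by_score == by_priority

print(respects_priority((100, 10, 1)))  # True
print(respects_priority((4, 2, 1)))     # True
print(respects_priority((1, 1, 1)))     # False
```

Equal weights fail because two low-priority matches would then outrank a single high-priority match.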

In XPath 2.0, this can be expressed as a single XPath expression as follows:

 /*/link[
     100*number(@text='Login') + 10*number(@href='Login.php') + 1*number(@index=0)
     eq
     max(/*/link/(100*number(@text='Login') + 10*number(@href='Login.php') + 1*number(@index='0')))
 ]

In XPath 1.0, constructing such a single expression would be extremely difficult, if possible at all, and even then the expression would be nearly impossible to understand, verify, or maintain.

However, choosing the most appropriate link element is possible in any language that hosts XPath 1.0.

The following is an example with XSLT 1.0 as the hosting language:

 <xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output omit-xml-declaration="yes" indent="yes"/>

   <xsl:template match="/">
     <xsl:for-each select="*/link">
       <xsl:sort data-type="number" order="descending"
         select="100*(@text='Login') + 10*(@href='Login.php') + 1*(@index=0)"/>
       <xsl:if test="position() = 1">
         <xsl:copy-of select="."/>
       </xsl:if>
     </xsl:for-each>
   </xsl:template>
 </xsl:stylesheet>

When the above transformation is applied to this XML document:

 <links>
   <link name="A" text="Sign in" href="Login.php" index="0"/>
   <link name="B" text="Login" href="SignIn.php" index="15"/>
   <link name="C" text="Login" href="Login.php" index="22"/>
 </links>

the correct result is produced:

 <link name="C" text="Login" href="Login.php" index="22" />

This reminds me of another "single XPath expression that finds the best match" problem that I solved about seven years ago :)
