XPath following-sibling in a crawler does not return the sibling

I'm new to import.io and am trying to build a crawler that extracts attribute data from vendor websites, so that I can check it against our internal attribute database. I watched a number of videos, and my syntax looks correct, but my XPath override does not return the attribute values. I have the following HTML example:

 <table>
  <tbody>
   <tr class="oddRow"> <td class="label">&nbsp;Adhesive Type&lrm;</td><td>&nbsp;Epoxy&lrm; </td> </tr>
   <tr> <td class="label">&nbsp;Applications&lrm;</td><td>&nbsp;Hard Disk Drive Component Assembly&lrm; </td> </tr>
   <tr class="oddRow"> <td class="label">&nbsp;Brand&lrm;</td><td>&nbsp;Scotch-Weld&lrm; </td> </tr>
   <tr> <td class="label">&nbsp;Capabilities&lrm;</td><td>&nbsp;Sustainability&lrm; </td> </tr>
   <tr class="oddRow"> <td class="label">&nbsp;Color&lrm;</td><td>&nbsp;Clear Amber&lrm; </td>

I am trying to write an XPath following-sibling expression that captures the value of "Color" through the import.io crawler. This is the XPath I get when I select "Color":

 //*[@id="attributeList"]/table/tbody/tr[5]/td[1] 

I tried to use:

 //*[@id="attributeList"]/table/tbody/tr/td[.="Color"]/following-sibling::td 

But it does not capture the value of the Color attribute from the table. I'm not sure whether this has anything to do with the odd and even row classes. Looking at the HTML, the expression seems logical: the label is "Color", and the attribute value is in the next td element.

xpath crawler4j
1 answer

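The text in the td node you are selecting contains more than just "Color". It is &nbsp;Color&lrm;, that is, a non-breaking space before the word and a left-to-right mark after it, so the exact comparison .="Color" never matches. Instead, select the td nodes whose text contains the string "Color":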

 '//*[@id="attributeList"]/table/tbody/tr/td[contains(text(), "Color")]/following-sibling::td/text()' 
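To see the difference outside import.io, here is a minimal Python sketch using only the standard library. It assumes a reduced copy of the table from the question, wrapped in a div with id="attributeList" (the wrapper is a guess based on the XPath, it is not shown in the question), with &nbsp; and &lrm; written as the characters they decode to. ElementTree's XPath subset has no contains() or following-sibling, so the helper value_for (a hypothetical name) re-implements that logic by hand; it is not what import.io runs internally.

```python
import xml.etree.ElementTree as ET

# &nbsp; and &lrm; decode to these two characters in the parsed page.
NBSP, LRM = "\u00a0", "\u200e"

# Reduced copy of the table from the question (assumed wrapper div).
ROWS = [("Adhesive Type", "Epoxy"), ("Brand", "Scotch-Weld"), ("Color", "Clear Amber")]
markup = (
    '<div id="attributeList"><table><tbody>'
    + "".join(
        f'<tr><td class="label">{NBSP}{label}{LRM}</td><td>{NBSP}{value}{LRM} </td></tr>'
        for label, value in ROWS
    )
    + "</tbody></table></div>"
)
root = ET.fromstring(markup)

# Exact match, as in the question: finds nothing, because the cell text
# is "\u00a0Color\u200e", not "Color".
exact_matches = root.findall(".//td[.='Color']")


def value_for(root, label):
    """Hand-written stand-in for
    td[contains(text(), label)]/following-sibling::td/text():
    return the text of the td right after the first td whose text contains label."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        for cell, sibling in zip(cells, cells[1:]):
            if label in (cell.text or ""):
                return sibling.text
    return None


def clean(text):
    """Strip spaces, non-breaking spaces and left-to-right marks from both ends."""
    return text.strip(" " + NBSP + LRM)


print(len(exact_matches))               # number of exact matches
print(repr(value_for(root, "Color")))   # raw value, invisible characters included
print(clean(value_for(root, "Color")))  # cleaned value
```

Note that the value cell carries the same invisible characters, so whatever the crawler returns will likely need a similar cleanup step. A substring test would also match any other label that happens to contain "Color".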