How to find all elements containing the word "download" using Selenium XPath?

I use Selenium for web scraping, and now I want to find all the elements that the user can click on and that contain the word “download” (in any capitalization) in the link text, the button text, or the element's id, class or href attribute. This may include links, buttons, or any other element.

In this answer, I found an XPath for someone who wanted to search for buttons by specific text (including case-insensitive and partial matches):

 text = 'download'
 driver.find_elements_by_xpath("//*[contains(text(), '" + text + "')]")

But on this page it does not return any results, even though the page contains the following link:

 <a id="downloadTop" class="navlink" href="javascript:__doPostBack('downloadTop','')">Download</a> 

Does anyone know how I can find all elements that somehow contain the word “download” on a website?

[EDIT] This question has been marked as a duplicate of a question whose answer suggests changing the expression to "//*[text()[contains(.,'download')]]". So I tried the following:

 >>> from selenium import webdriver
 >>> d = webdriver.Firefox()
 >>> link = 'https://www.yourticketprovider.nl/LiveContent/tickets.aspx?x=492449&y=8687&px=92AD8EAA22C9223FBCA3102EE0AE2899510C03E398A8A08A222AFDACEBFF8BA95D656F01FB04A1437669EC46E93AB5776A33951830BBA97DD94DB1729BF42D76&rand=a17cafc7-26fe-42d9-a61a-894b43a28046&utm_source=PurchaseSuccess&utm_medium=Email&utm_campaign=SystemMails'
 >>> d.get(link)
 >>> d.find_elements_by_xpath("//*[text()[contains(.,'download')]]")
 []
 # As you can see, it still doesn't return any results.
 >>>

Does anyone know how I can get all the elements that the user can click on and that contain the word “download” in the link text, the button text, or the id, class or href attribute? All tips are welcome!

+6
8 answers

Try:

 //*[(@id|@class|@href|text()) [contains(translate(.,'DOWNLOAD','download'), 'download')]] 

This XPath 1.0 expression selects all elements that have an id, class or href attribute, or a text-node child, whose string value contains the string "download" in any capitalization.
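To see why translate(.,'DOWNLOAD','download') is enough, here is a small pure-Python model of the XPath 1.0 translate() function. It is only a sketch for illustration: xpath_translate and contains_download are hypothetical helper names, not part of Selenium or any XPath library. Only the characters listed in the second argument are mapped, and that is sufficient here because contains() only needs the letters of "download" to be folded to lowercase.

```python
# Pure-Python model of XPath 1.0 translate(string, from, to), for illustration only.
def xpath_translate(value: str, src: str, dst: str) -> str:
    """Each character of `value` found in `src` is replaced by the character at
    the same position in `dst` (the first occurrence in `src` wins); if `dst`
    is shorter than that position, the character is removed. All other
    characters are copied unchanged."""
    out = []
    for ch in value:
        idx = src.find(ch)
        if idx == -1:
            out.append(ch)          # not listed in `src`: keep as-is
        elif idx < len(dst):
            out.append(dst[idx])    # mapped to its replacement
        # else: no replacement character, so `ch` is dropped
    return "".join(out)


def contains_download(value: str) -> bool:
    """Same test as contains(translate(., 'DOWNLOAD', 'download'), 'download')."""
    return "download" in xpath_translate(value, "DOWNLOAD", "download")


print(xpath_translate("www.DownLoad.com", "DOWNLOAD", "download"))  # www.download.com
print(xpath_translate("XYZ Download", "DOWNLOAD", "download"))      # XYZ download
print(contains_download("x_downLoad"), contains_download("do_wnLoad"))  # True False
```

Letters outside "DOWNLOAD" (X, Y, Z above) are left untouched, which does not matter for this particular contains() check.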

Here is the verification. The XSLT transformation below evaluates the XPath expression and copies all selected nodes to the output:

 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <xsl:copy-of select=
    "//*[(@id|@class|@href|text())
           [contains(translate(.,'DOWNLOAD','download'), 'download')]]
    "/>
  </xsl:template>
 </xsl:stylesheet>

When we apply the transformation to the following test document:

 <html>
   <a id="downloadTop" class="navlink" href="javascript:__doPostBack('downloadTop','')">Download</a>
   <b id="y" class="x_downLoad"/>
   <p>Nothing to do_wnLoad</p>
   <a class="m" href="www.DownLoad.com">Get it!</a>
   <b>dOwnlOad</b>
 </html>

the wanted elements are selected and copied to the output:

 <a id="downloadTop" class="navlink" href="javascript:__doPostBack('downloadTop','')">Download</a>
 <b id="y" class="x_downLoad"/>
 <a class="m" href="www.DownLoad.com">Get it!</a>
 <b>dOwnlOad</b>
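If no XSLT processor is at hand, the same selection can be cross-checked in plain Python with the standard-library ElementTree. This sketch is not an XPath engine: it reimplements the rule by hand, testing @id, @class, @href and the element's direct text nodes (its .text plus each child's .tail) with str.lower(), which for the ASCII word "download" behaves like the translate() call. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The test document from the answer above.
TEST_DOC = """<html>
  <a id="downloadTop" class="navlink" href="javascript:__doPostBack('downloadTop','')">Download</a>
  <b id="y" class="x_downLoad"/>
  <p>Nothing to do_wnLoad</p>
  <a class="m" href="www.DownLoad.com">Get it!</a>
  <b>dOwnlOad</b>
</html>"""


def direct_text_nodes(elem):
    """Text-node children of `elem`: its .text plus the .tail of each child."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]


def matches(elem, word="download"):
    """True if @id, @class, @href or a direct text node contains `word` in any case."""
    candidates = [elem.get(name) for name in ("id", "class", "href")]
    candidates += direct_text_nodes(elem)
    return any(c is not None and word in c.lower() for c in candidates)


root = ET.fromstring(TEST_DOC)
selected = [elem for elem in root.iter() if matches(elem)]  # document order
for elem in selected:
    print(elem.tag, elem.attrib, elem.text)
```

The four elements it keeps are the same four the XSLT run copied to the output; the `<p>` with "do_wnLoad" and the `<html>` root are skipped.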
+3

Since you need a case-insensitive match, and XPath 1.0 does not support one directly, you need to use the translate() function. Also, since you need a partial match, you need to use contains(). And since you also want to check the id, class and href attributes as well as the text:

 from selenium import webdriver

 driver = webdriver.Firefox()
 driver.get("https://www.yourticketprovider.nl/LiveContent/tickets.aspx?x=492449&y=8687&px=92AD8EAA22C9223FBCA3102EE0AE2899510C03E398A8A08A222AFDACEBFF8BA95D656F01FB04A1437669EC46E93AB5776A33951830BBA97DD94DB1729BF42D76&rand=a17cafc7-26fe-42d9-a61a-894b43a28046&utm_source=PurchaseSuccess&utm_medium=Email&utm_campaign=SystemMails")

 condition = "contains(translate(%s, 'DOWNLOAD', 'download'), 'download')"
 things_to_check = ["text()", "@class", "@id", "@href"]
 conditions = " or ".join(condition % thing for thing in things_to_check)

 for elm in driver.find_elements_by_xpath("//*[%s]" % conditions):
     print(elm.text)

Here we build the expression by formatting and concatenating strings: one case-insensitive check each for text(), class, id and href, with the conditions joined by or.
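The same string-building idea can be wrapped in a reusable function so that the word is not hard-coded. This is a sketch under two assumptions: the word contains no quote characters, and it is plain ASCII (word.upper() and word.lower() must have the same length for translate() to line up). any_case_contains and build_download_xpath are hypothetical names.

```python
def any_case_contains(expr: str, word: str) -> str:
    """XPath 1.0 condition: does `expr` contain `word`, ignoring the case of its letters?
    Assumes `word` is ASCII and contains no quote characters."""
    upper, lower = word.upper(), word.lower()
    return "contains(translate(%s, '%s', '%s'), '%s')" % (expr, upper, lower, lower)


def build_download_xpath(word="download", things=("text()", "@class", "@id", "@href")) -> str:
    """One case-insensitive check per text node/attribute, joined with `or`."""
    conditions = " or ".join(any_case_contains(thing, word) for thing in things)
    return "//*[%s]" % conditions


xpath = build_download_xpath()
print(xpath)
```

With Selenium 4 the query would be run as `driver.find_elements(By.XPATH, xpath)` (with `from selenium.webdriver.common.by import By`), since the `find_elements_by_*` helpers were removed in Selenium 4.3. One difference from the first answer: passing text() to translate() converts the node-set to the string value of its first node only, so just the first text node of each element is checked, whereas the filter form (@id|@class|@href|text())[...] tests every node.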

+3

Well, the answer you found already tells you how to do what you want. The problem I see is that text = 'download' starts with a lowercase letter, while the text in <a id="downloadTop" class="navlink" href="javascript:__doPostBack('downloadTop','')">Download</a> begins with an uppercase letter.

Start by changing it to text = 'Download' and see whether it finds your element now. If that was the problem, you can use a little trick to ignore the first character:

 text = 'ownload'
 driver.find_elements_by_xpath("(//*[contains(text(), '" + text + "')] | //*[@value='" + text + "'])")
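Building the XPath by pasting `text` between single quotes breaks as soon as the text itself contains an apostrophe (for example "Don't download"). XPath 1.0 string literals have no escape sequences, so the usual workaround is to switch to double quotes or, if both quote types occur, to assemble the string with concat(). A minimal sketch; xpath_literal and text_contains_xpath are hypothetical helper names.

```python
def xpath_literal(s: str) -> str:
    """Return an XPath 1.0 string literal for `s`.

    XPath 1.0 has no escape sequences inside string literals, so a string that
    contains both ' and " has to be assembled with concat()."""
    if "'" not in s:
        return "'%s'" % s
    if '"' not in s:
        return '"%s"' % s
    # Split on single quotes and glue the pieces back together with "'" literals.
    pieces = ["'%s'" % part for part in s.split("'")]
    return "concat(%s)" % ", \"'\", ".join(pieces)


def text_contains_xpath(text: str) -> str:
    """Same shape as the query above, but safe for any `text`."""
    lit = xpath_literal(text)
    return "(//*[contains(text(), %s)] | //*[@value=%s])" % (lit, lit)


print(text_contains_xpath("Don't download"))
# (//*[contains(text(), "Don't download")] | //*[@value="Don't download"])
```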

EDIT: Yes, you can also make it case-insensitive:

 driver.find_elements_by_xpath("(//*[contains(translate(text(), 'DOWNLOAD', 'download'), 'download')])") 
+1

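You can use the translate() function as shown below to make the comparison case-insensitive for any word. Note that it compares with =, so it only matches elements whose text is exactly "download" (in any capitalization); use contains() for partial matches: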

 driver.find_elements_by_xpath("//*[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'download']")

 >>> driver.find_elements_by_xpath("//*[translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'download']")
 [<selenium.webdriver.remote.webelement.WebElement (session="0b07fcba-86ee-3945-a0ae-85619e97ca31", element="{4278753b-8b59-bf45-ae3b-f60f40aed071}")>,
  <selenium.webdriver.remote.webelement.WebElement (session="0b07fcba-86ee-3945-a0ae-85619e97ca31", element="{8aed425c-063e-7846-915d-d8948219cc12}")>]
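Because the query compares with =, a text node such as " Download " (with surrounding whitespace) or "Download now" will not match. A sketch of two variants using the full-alphabet translate() from this answer: normalize-space() trims and collapses whitespace for an exact match, and contains() gives a partial match. The helper names are hypothetical, and the word is assumed to contain no quote characters. As with any text() argument, only the first text node of each element is looked at.

```python
import string

UPPER = string.ascii_uppercase  # 'ABC...Z'
LOWER = string.ascii_lowercase  # 'abc...z'


def lowercase(expr: str) -> str:
    """XPath 1.0 expression that lower-cases the ASCII letters of `expr`."""
    return "translate(%s, '%s', '%s')" % (expr, UPPER, LOWER)


def exact_text_xpath(word: str) -> str:
    """Elements whose first text node, whitespace-normalized, equals `word` in any case."""
    return "//*[%s = '%s']" % (lowercase("normalize-space(text())"), word.lower())


def partial_text_xpath(word: str) -> str:
    """Elements whose first text node contains `word` in any case."""
    return "//*[contains(%s, '%s')]" % (lowercase("text()"), word.lower())


print(exact_text_xpath("Download"))
print(partial_text_xpath("Download"))
```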
0

If you want a more general XPath and don't want to use the translate() function, you can use itertools.product to generate every case variant of the search string ("Download") and check each variant against the node's text and its class, id and href attributes, as shown below.

 from itertools import product

 from selenium import webdriver

 driver = webdriver.Firefox()
 driver.get("https://www.yourticketprovider.nl/LiveContent/tickets.aspx?x=492449&y=8687&px=92AD8EAA22C9223FBCA3102EE0AE2899510C03E398A8A08A222AFDACEBFF8BA95D656F01FB04A1437669EC46E93AB5776A33951830BBA97DD94DB1729BF42D76&rand=a17cafc7-26fe-42d9-a61a-894b43a28046&utm_source=PurchaseSuccess&utm_medium=Email&utm_campaign=SystemMails")

 txt = 'Download'  # text to be searched

 # Generate the variants of txt: a tuple of (upper, lower) for each letter
 # of the string; digits stay a single-element tuple
 l = [(c, c.lower()) if not c.isdigit() else (c,) for c in txt.upper()]
 variants = ["".join(item) for item in product(*l)]  # every case variant of "Download"

 anchors = ["text()", "@class", "@id", "@href"]  # node text/attributes to be searched

 # Generate the XPath
 xpaths_or = " or ".join(["contains(%s,'%s')" % (i, j) for i in anchors for j in variants])
 xpaths = "//*[%s]" % xpaths_or

 for download_tag in driver.find_elements_by_xpath(xpaths):
     print(download_tag.text)

 driver.quit()

Output:

 Download
 Download

NB: the isdigit() check keeps each digit as a single option, so digits in the search string don't produce duplicate variants.
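The variant approach grows quickly: every cased letter doubles the number of spellings, so "Download" alone yields 2^8 = 256 variants, and checking four anchors turns that into 1024 contains() calls in one XPath. The sketch below isolates the variant generation so the count can be checked without a browser. case_variants is a hypothetical name, and it swaps the isdigit() test for a "does this character have distinct cases" test, which also covers punctuation.

```python
from itertools import product


def case_variants(txt: str) -> list:
    """All upper/lower-case spellings of `txt`, using the itertools.product trick."""
    # Characters without distinct cases (digits, punctuation) get a single option.
    options = [(c, c.lower()) if c.upper() != c.lower() else (c,) for c in txt.upper()]
    return ["".join(p) for p in product(*options)]


variants = case_variants("Download")
anchors = ["text()", "@class", "@id", "@href"]
print(len(variants), len(variants) * len(anchors))  # 256 1024
```

This size is the main reason the translate() answers are usually the lighter option.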

0

You wrote: "but on this page, which does not return any results, even though there is the following link:"

That is because the text is different. Take a look:

 Download
 download

One letter is uppercase, so you need a case-insensitive XPath. Browsers only support XPath 1.0, which has no lower-case() function (it was added in XPath 2.0), so use translate() instead:

 driver.find_elements_by_xpath("//*[contains(translate(text(), 'DOWNLOAD', 'download'), 'download')]")

It should work well enough for you.

0

When using Selenium to search for web elements, it is better to search by ID or class name first, since that is more reliable and simpler than XPath. XPath is usually used when you cannot find your element with those two locators.

In this case, the download element on this website has a very clear ID: downloadTop.

Try using this instead:

 downloadButton = driver.find_element_by_id('downloadTop') 

And then you can use this to click on it:

 downloadButton.click() 
-3

Well, I don't know Selenium very well, but I can offer a solution that will work. You can use regular expressions to parse the entire page source. For example, if you only need elements with attributes that contain the substring "download" (in any case), use this regular expression:

 <(\w+)[^>]*?\s([\w-]+)="([^"]*download[^"]*)"

Then find all the matches with the re.finditer function (passing re.IGNORECASE). Each match object contains the tag name (group(1)), the attribute name (group(2)) and the attribute value (group(3)):

 import re

 # wd == webdriver
 pattern = r'<(\w+)[^>]*?\s([\w-]+)="([^"]*download[^"]*)"'
 for m in re.finditer(pattern, wd.page_source, re.IGNORECASE):
     tag, attr, val = m.group(1), m.group(2), m.group(3)

Then, inside the loop, you can use wd.find_elements_by_css_selector (or another locator) to find the matching elements in Selenium's element tree. Quote the value, because an unquoted CSS attribute value must be a valid identifier:

     wd.find_elements_by_css_selector('{0}[{1}="{2}"]'.format(tag, attr, val))
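The regex approach can be tried without a browser by running it on a saved snippet of HTML. This sketch assumes attribute values are double-quoted and reports only the first matching attribute of each tag; download_selectors is a hypothetical helper that returns the CSS selectors you would then pass to Selenium (with Selenium 4, `driver.find_elements(By.CSS_SELECTOR, selector)`). Regular expressions are not an HTML parser, so treat this as a quick heuristic.

```python
import re

# Tag name, attribute name, and a double-quoted attribute value containing "download".
PATTERN = re.compile(r'<(\w+)[^>]*?\s([\w-]+)="([^"]*download[^"]*)"', re.IGNORECASE)


def download_selectors(html: str) -> list:
    """Return one CSS selector per tag whose first matching attribute mentions 'download'."""
    selectors = []
    for m in PATTERN.finditer(html):
        tag, attr, val = m.group(1), m.group(2), m.group(3)
        # Quote the value: unquoted CSS attribute values must be identifiers.
        selectors.append('{0}[{1}="{2}"]'.format(tag, attr, val))
    return selectors


SAMPLE = """<a id="downloadTop" class="navlink" href="javascript:__doPostBack('downloadTop','')">Download</a>
<b id="y" class="x_downLoad"/>"""

print(download_selectors(SAMPLE))  # ['a[id="downloadTop"]', 'b[class="x_downLoad"]']
```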
-3
