Parsing HTML5 data-* attribute values with Selenium in Python

I am parsing a JS-generated webpage as follows:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()
    driver.get('https://www.consumerbarometer.com/en/graph-builder/?question=M1&filter=country:singapore,canada,mexico,brazil,argentina,united_states,bulgaria,austria,belgium,croatia,czech_republic,denmark,estonia,finland,france,germany,greece,hungary,italy,ireland,latvia,lithuania,norway,netherlands,poland,portugal,russia,romania,serbia,slovakia,spain,slovenia,sweden,switzerland,ukraine,united_kingdom,australia,china,israel,hong_kong_sar,japan,korea,new_zealand,malaysia,taiwan,turkey,vietnam')

    # wait for svg to appear
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, 'svg')))

    for text in driver.find_elements_by_class_name('bar-text-label'):
        print(text.text)

    driver.close()

Besides getting the text from elements with the bar-text-label class, I would also like to get values from the HTML5 data attributes. For example, given <rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar"></rect>, I would like to extract 76.

Can this be done in Selenium?

I tried both of the below without success:

    for text in driver.find_elements_by_class_name('bar'):
        print(data_value.text)

    for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'):
        print(data.text)
2 answers

If you have elements such as:

    <rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="75" class="bar">bar1</rect>
    <rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar">bar2</rect>

You can get the text value and attribute value as follows:

    elements = driver.find_elements_by_class_name('bar')
    for element in elements:
        # element.text is the visible text; get_attribute reads the data-* value
        print(element.text)
        print(element.get_attribute('data-value'))

This produces:

    bar1
    75
    bar2
    76
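
The same loop can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: bar_labels_and_values is a hypothetical name, and it uses the generic find_elements(by, value) call, since the find_elements_by_* shortcuts are no longer available in recent Selenium 4 releases. The locator strategy is passed as the plain string "class name", which is the value behind By.CLASS_NAME.

```python
def bar_labels_and_values(driver, class_name="bar", attribute="data-value"):
    """Return (text, attribute value) pairs for every element with the class.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing
    the same find_elements(by, value) method).
    """
    # "class name" is the string constant behind selenium's By.CLASS_NAME.
    elements = driver.find_elements("class name", class_name)
    # get_attribute returns the attribute as a string, or None if it is absent.
    return [(el.text, el.get_attribute(attribute)) for el in elements]
```

With a live driver this would be called as bar_labels_and_values(driver). The attribute values come back as strings, so convert them with int() or float() if numbers are needed.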

You mentioned that you tried:

    for text in driver.find_elements_by_class_name('bar'):
        print(data_value.text)

Since data_value is not defined anywhere, this cannot work. If you did print(text.text), you would get the text of each element with the class bar. (This is essentially what you do in your first snippet.)

You also tried the following:

    for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'):
        print(data.text)

This cannot work, because Selenium's find_element(s)... functions can only return elements or lists of elements. You are trying to make it return an attribute, which it will not do. XPath in general allows selecting attributes, but when you use XPath through Selenium you cannot get anything other than elements.
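
One way to keep an XPath-based approach is to select the elements that carry the attribute and then read the attribute from each element. The sketch below assumes a live Selenium driver; data_values is a hypothetical helper name, and "xpath" is the plain string behind By.XPATH. Note that XPath's contains() takes two arguments, so the predicate [@data-value] is what tests for the attribute's presence.

```python
def data_values(driver, attribute="data-value"):
    """Return the attribute value of every element that has the attribute.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing
    the same find_elements(by, value) method).
    """
    # The XPath selects elements, not attributes, so Selenium can return them.
    xpath = "//*[@{}]".format(attribute)
    # "xpath" is the string constant behind selenium's By.XPATH.
    elements = driver.find_elements("xpath", xpath)
    # The attribute itself is read from each element afterwards.
    return [el.get_attribute(attribute) for el in elements]
```

With a live driver, data_values(driver) would return the data-value strings of all matching elements in document order.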

You could do what Jessamine Smith suggested, or:

    results = driver.execute_script("""
    var els = document.getElementsByClassName("bar");
    var ret = [];
    for (var i = 0, el; (el = els[i]); ++i) {
        ret.push([el.textContent, el.attributes["data-value"].value]);
    }
    return ret;
    """)

    for r in results:
        print(r[0], r[1])

This takes a single round trip between your script and the browser, whereas a loop that uses .text and .get_attribute() takes two round trips per iteration. The JavaScript builds a list of pairs: each pair holds the element's text in the first position and its data-value in the second.
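
Since the goal is to work with the numbers, the single-round-trip approach could be wrapped in a function that also converts the values. This is only a sketch under assumptions: fetch_bar_values and BAR_SCRIPT are hypothetical names, every element with the class bar is assumed to carry an integer data-value, and getAttribute is used in place of the attributes lookup.

```python
# JavaScript run in the page; it returns a list of [text, data-value] pairs.
BAR_SCRIPT = """
var els = document.getElementsByClassName("bar");
var ret = [];
for (var i = 0; i < els.length; i++) {
    ret.push([els[i].textContent, els[i].getAttribute("data-value")]);
}
return ret;
"""


def fetch_bar_values(driver):
    """Fetch (text, value) pairs in one round trip and convert values to int.

    `driver` is assumed to be a Selenium WebDriver (or any object exposing
    the same execute_script(script) method).
    """
    pairs = driver.execute_script(BAR_SCRIPT)
    # Attribute values arrive as strings; int() raises if one is missing
    # or is not an integer, which is the assumption stated above.
    return [(text, int(value)) for text, value in pairs]
```

With a live driver, fetch_bar_values(driver) would yield pairs such as ("bar1", 75) for the sample markup used earlier in this thread.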


Source: https://habr.com/ru/post/1212632/

