Selenium WebDriver (Python): page does not load fully / sometimes freezes on refresh

I am scraping a site with a lot of javascript that is generated when the page is requested. As a result, traditional web-scraping methods (beautifulsoup, etc.) do not work for my purposes (at least I couldn't get them to work; all the important data is in the javascript-generated parts), so I started using selenium webdriver. I need to scrape several hundred pages, each of which contains 10 to 80 data points (each with about 12 fields), so it is important that this script (is that the correct terminology?) can run for quite a while without my needing to babysit it.

I have code that works on a single page, and a control section that tells the scraping section which page to scrape. The problem is that sometimes the javascript parts of the page load and sometimes they don't (roughly 1 in 7 loads). A refresh usually fixes things, but sometimes the refresh hangs the webdriver, and therefore the python runtime as well. Annoyingly, when it freezes like this, the code never times out. What is going on?
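For what it's worth, one generic way to guard against a call that can hang forever is to impose a hard timeout from the outside by running the call in a worker thread. This is only a sketch of the pattern, not part of my actual code; `call_with_timeout` is a made-up name, and on python 2.7 it would need the `futures` backport:

```python
import concurrent.futures
import time

def call_with_timeout(fn, timeout):
    """Run fn() in a worker thread and give up after `timeout` seconds.

    Caveat: the worker thread is abandoned, not killed, so for a truly
    hung browser you would still need to kill the firefox process too.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout)  # raises TimeoutError on hang
    finally:
        pool.shutdown(wait=False)  # don't block waiting for a stuck worker

# a call that finishes in time returns normally
print(call_with_timeout(lambda: 42, timeout=1.0))  # prints 42

# a call that takes too long raises concurrent.futures.TimeoutError
try:
    call_with_timeout(lambda: time.sleep(0.3), timeout=0.05)
except concurrent.futures.TimeoutError:
    print("hung call detected")
```

The hung call here is simulated with `time.sleep`; in my case it would be the `driver.refresh()` that never returns.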

Here is a stripped down version of my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException, TimeoutException
import time, re, random, csv
from collections import namedtuple

def main(url_full):
    driver = webdriver.Firefox()
    driver.implicitly_wait(15)
    driver.set_page_load_timeout(30)

    # create HealthPlan namedtuple
    HealthPlan = namedtuple("HealthPlan", ("State, County, FamType, Provider, PlanType, Tier,") +
                            (" Premium, Deductible, OoPM, PrimaryCareVisitCoPay, ER, HospitalStay,") +
                            (" GenericRx, PreferredPrescription, RxOoPM, MedicalDeduct, BrandDrugDeduct"))

    # check whether the page has loaded and handle page load and timeout errors
    pageNotLoaded = True
    while pageNotLoaded:
        try:
            driver.get(url_full)
            time.sleep(6 + abs(random.normalvariate(1.8, 3)))
        except TimeoutException:
            driver.quit()
            time.sleep(3 + abs(random.normalvariate(1.8, 3)))
            driver.get(url_full)
            time.sleep(6 + abs(random.normalvariate(1.8, 3)))

        # Handle page load error by testing presence of showAll,
        # an important feature of the page, which only appears if everything else loads
        try:
            driver.find_element_by_xpath('//*[@id="showAll"]').text
        # catch NoSuchElementException => refresh page
        except NoSuchElementException:
            try:
                driver.refresh()
            # catch TimeoutException => quit and load the page
            # in a new instance of firefox;
            # I don't think the code ever gets here, because it freezes in the refresh
            # and will not throw the timeout exception like I would like
            except TimeoutException:
                driver.quit()
                time.sleep(3 + abs(random.normalvariate(1.8, 3)))
                driver.get(url_full)
                time.sleep(6 + abs(random.normalvariate(1.8, 3)))

        pageNotLoaded = False

        scrapePage()  # this is a dummy function; everything from here down works fine
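As an aside, the nested try/except blocks above could be factored into a bounded-retry helper, so a page that repeatedly fails eventually gives up instead of hanging the whole run. This is only an illustrative sketch (the names `retry` and `flaky_load` are made up, and nothing here is selenium-specific):

```python
def retry(action, attempts=3, cleanup=None):
    """Call action(); if it raises, run cleanup() and try again,
    up to `attempts` times, then re-raise the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:
            last_error = error
            if cleanup is not None:
                cleanup()
    raise last_error

# toy stand-in for a flaky page load: fails twice, then succeeds
calls = {"n": 0}

def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("page did not load")
    return "loaded"

print(retry(flaky_load, attempts=5))  # prints "loaded" on the third call
```

In the real script, `action` would be the get-and-check step and `cleanup` would quit and restart the browser.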

I have searched extensively for similar problems, and I don't think anyone else has posted about this here or anywhere else I've looked. I am using python 2.7 and selenium 2.39.0, and I am trying to scrape Healthcare.gov premium estimate pages.


EDIT2: I am running Windows 7 64-bit and firefox 17.


First of all, get rid of the time.sleep calls!

Why do you do things like this?

time.sleep(3+ abs(random.normalvariate(1.8,3)))

Instead, set an implicit wait, for example in a unittest-based setup:

import unittest

class TestPy(unittest.TestCase):

    def waits(self):
        self.implicit_wait = 30

or directly on the driver:

(self.)driver.implicitly_wait(10)

Or, better still, use an explicit wait:

from selenium.webdriver.support.ui import WebDriverWait

WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath('some_xpath'))
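WebDriverWait simply polls the supplied condition (every 0.5 s by default) until it returns something truthy or the timeout expires, raising TimeoutException otherwise. The mechanism can be sketched in plain python (`wait_until` and `WaitTimeout` are made-up names, not a selenium API):

```python
import time

class WaitTimeout(Exception):
    """Raised when the condition never became truthy
    (selenium's equivalent is TimeoutException)."""

def wait_until(condition, timeout=10, poll=0.5):
    """Poll condition() until it returns a truthy value, or give up."""
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %ss" % timeout)
        time.sleep(poll)

# toy condition that becomes truthy on the third poll
state = {"polls": 0}

def ready():
    state["polls"] += 1
    return state["polls"] >= 3

print(wait_until(ready, timeout=2, poll=0.01))  # prints True
```

In selenium, `condition` is the lambda you pass to `until`, which gets called with the driver as its argument.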

Instead of driver.refresh(), re-issue the GET:

driver.get(your_url)

And clear cookies between loads:

driver.delete_all_cookies()


And for the scraping itself (your scrapePage()), take a look at:

http://scrapy.org

