Wait for the window to reload when scrolling a webpage in VBA

I wrote a VBA macro to count the (approximate) number of images returned for a specific search on Google. By approximate, I mean that the program should count the returned images, scrolling down to load more (if applicable), up to a maximum of 400. Here's the (simplified) code:

    Sub GoogleCount()
        '''
        '[Code to construct the URL ('fullUrl')]
        '''
        Set objIE = New InternetExplorer
        objIE.navigate fullUrl
        Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
        Set currPage = objIE.document

        'Count images returned
        newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length

        'Scroll down until count = 400 (max) or no change in value
        Do While newNum >= 100 And newNum < 400 And newNum <> oldNum
            oldNum = newNum
            currPage.parentWindow.scrollBy 0, 100000
            Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
            newNum = currPage.getElementById("rg_s").getElementsByTagName("IMG").Length
        Loop
        '''
        '[Code to paste the value of newNum into my workbook, and do some other progress reporting]
        '''
    End Sub

I am not happy with the scrolling; it feels very "manual", especially scrolling by a fixed value. (Is there any way to make it dynamic, that is, to find the end of the page and scroll there?)

But the main problem is that it doesn't work: when I run the code, it counts the first 100 (or fewer) images fine, but when it has to scroll and count some more, I still get a value of 100. Stepping slowly through the code with F8, I get the correct numbers (max 400), which leads me to conclude that the code is running too fast (maybe I'm wrong).

To slow the code down, I tried adding the objIE.readyState check, but since I am only scrolling, I don't think it counts as a page reload, so the loop is ineffective at waiting for the new images to load.

I thought about adding a time delay. I already use

 Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) 

elsewhere in the workbook, so I could add a delay as short as a millisecond.

But I would really like to avoid this: the code runs for c. 50 different searches and already takes long enough to complete, so adding fixed delays long enough to cope with a slow connection would not be ideal. In addition, internet speed varies so much that a fixed delay is very unreliable. I could do some kind of connection test to get a ballpark figure, but the best option, obviously, is to wait only as long as necessary.

Or better yet, find another way to count the images, preferably one that doesn't require reloading the page 4 times! Any ideas?

NB. If you want to debug this yourself, a good image search to set fullUrl to could be https://www.google.com/search?q=stack overflow|exchange&tbm=isch&source=lnt&tbs=isz:ex,iszw:312,iszh:390 , since it returns more than 100 images but fewer than 400, so you can check all aspects of the code.

html css vba excel-vba excel
2 answers

After further research, I came up with this approach:

    Dim myDiv As HTMLDivElement: Set myDiv = currPage.getElementById("fbar")
    Dim elemRect As IHTMLRect: Set elemRect = myDiv.getBoundingClientRect
    Do Until elemRect.bottom > 0
        currPage.parentWindow.scrollBy 0, 10000
        Set elemRect = myDiv.getBoundingClientRect
    Loop
    myDiv.ScrollIntoView

Here currPage is the HTML web page (Dim currPage As HTMLDocument) and myDiv is a particular element. Its type is not important, but note that myDiv is always at the bottom of the document and is loaded only after everything else has been. For Google Images, that appears to be the help bar ("fbar"), which you can only reach after scrolling past all the images.

How it works

The code works as follows: myDiv.getBoundingClientRect is a way of checking whether an element is visible in the browser. That is why we look at an element at the bottom of the page: if we scroll until it becomes visible, everything else should have loaded too.

This, of course, is where the Do Until...Loop comes in: we keep scrolling until elemRect.bottom is no longer zero (while the element is not in view the value is zero, and it becomes a non-zero number once the element comes into view).

Finally, use myDiv.ScrollIntoView to take the browser right to the bottom; this is necessary because the bounding rectangle becomes non-zero a little before the element is actually on screen, so we need to scroll the last bit to load the final images.

Why not just use ScrollIntoView from the start? That does not work because the element is not yet loaded.
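One caveat with the loop above is that it never exits if the marker element never reports a non-zero rectangle. A minimal sketch of the same idea with a timeout guard follows; ScrollToBottomMarker, markerId and timeoutSeconds are hypothetical names introduced here for illustration, and the code assumes the Microsoft HTML Object Library reference that HTMLDocument already requires.

```vba
' Hypothetical wrapper (not from the original answer): scrolls until the
' bottom marker element reports a non-zero rectangle, or gives up after
' timeoutSeconds. Returns True if the marker came into view.
Function ScrollToBottomMarker(ByVal currPage As HTMLDocument, _
                              ByVal markerId As String, _
                              ByVal timeoutSeconds As Single) As Boolean
    Dim marker As Object
    Dim startTime As Single
    Dim elapsed As Single

    startTime = Timer
    Do
        DoEvents   ' let the browser process the previous scroll
        Set marker = currPage.getElementById(markerId)
        If Not marker Is Nothing Then
            If marker.getBoundingClientRect.bottom > 0 Then
                marker.scrollIntoView   ' scroll the last bit, as in the answer
                ScrollToBottomMarker = True
                Exit Function
            End If
        End If
        currPage.parentWindow.scrollBy 0, 10000
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop Until elapsed >= timeoutSeconds
    ' Falls through with the default return value False on timeout
End Function
```

It could be called as ScrollToBottomMarker(currPage, "fbar", 10), counting the images afterwards whether or not it returns True.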


Just do this instead. I'm sure you could find a neater way to do it (if you think it's worth the time), but this should be fine:

    newNum = -1
    Set objIE = New InternetExplorer
    objIE.navigate fullUrl
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
    Set currPage = objIE.document
    Do Until oldNum = newNum
        oldNum = newNum
        newNum = currPage.getElementById("rg_s").getElementsByClassName("rg_di rg_bx rg_el ivg-i").Length
        Application.Wait Now + TimeSerial(0, 0, 2)
        currPage.parentWindow.scrollBy 0, 100000
        Application.Wait Now + TimeSerial(0, 0, 2)
        If newNum > 400 Then newNum = 400
    Loop

Then you just need to adjust the delay in TimeSerial depending on how fast your machine loads the page (here I set it to 2 seconds).
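If a fixed delay is too slow or too unreliable, one possible variation is to poll the image count and stop as soon as it grows, with the delay acting only as an upper bound. The sketch below is an assumption-laden illustration rather than part of the answer above: WaitForMoreImages and timeoutSeconds are hypothetical names, and it reuses the "rg_s" container and IMG count from the question.

```vba
' Hypothetical helper (not from the original post): polls the image count
' until it rises above oldNum or timeoutSeconds elapse, whichever comes first.
' Assumes currPage is the loaded document and "rg_s" is the results container.
Function WaitForMoreImages(ByVal currPage As HTMLDocument, _
                           ByVal oldNum As Long, _
                           ByVal timeoutSeconds As Single) As Long
    Dim container As Object
    Dim imgCount As Long
    Dim startTime As Single
    Dim elapsed As Single

    startTime = Timer
    Do
        DoEvents   ' let the browser process the scroll and load images
        Set container = currPage.getElementById("rg_s")
        If container Is Nothing Then
            imgCount = 0
        Else
            imgCount = container.getElementsByTagName("IMG").Length
        End If
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer resets at midnight
    Loop Until imgCount > oldNum Or elapsed >= timeoutSeconds

    WaitForMoreImages = imgCount
End Function
```

Inside the scroll loop, the call after scrollBy would then be something like newNum = WaitForMoreImages(currPage, oldNum, 5). On a fast connection it returns almost immediately, and when no more images exist it returns the unchanged count after the timeout, which ends the outer loop.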
