Unexpected behavior when loading multiple pages in PhantomJS

I have a script (below) that walks through a website in a three-step process. It works great when limited to 1 page at a time. However, when I raise the limit to 2 at a time, things start to go wrong: onLoadFinished fires sooner than I would expect, before the page is fully loaded, and because of this the rest of my script breaks. Any idea why this might happen? I should add that I am using the latest version (1.5).

    MAX_PAGES = 1

    ###
    changing MAX_PAGES to >1 causes some pages onFinished event to fire before
    the page is fully rendered. this is evident by the fact that there are >1
    images for some pages. i havent been able to reproduce using microsoft.com,
    but on some pages i was working on the first onLoadFinished seemed to be
    called before the page was actually fully loaded based on the look of the
    rendered images
    ###

    newPage = (id) ->
      context = {}
      context.id = id
      context.step = 0
      context.page = require('webpage').create()
      context.page.onLoadStarted = ->
        context.step++
      context.page.onLoadFinished = (status) ->
        console.log status
        if status is 'success'
          context.page.render("#{context.id}_#{context.step}.png")
        else
          context.page.release()
      context.page.open('http://www.microsoft.com')
      console.log 'started loading'

    newPage id for id in [1..MAX_PAGES]
2 answers

I think the problem is that every web page in PhantomJS shares the same QNetworkAccessManager, so the finished() signal is triggered whenever any web page object finishes loading. You may need to change the PhantomJS source code to fix this. I noticed the same thing earlier when I tried to load multiple pages in parallel in PhantomJS. The application I'm working on uses QtWebKit and loads multiple pages at the same time, so I had to make sure that each web page gets its own QNetworkAccessManager, so that the finished() signals do not interfere with each other.


To crawl multiple pages, see the follow.js example that ships with PhantomJS: https://github.com/ariya/phantomjs/blob/master/examples/follow.js

You need to use recursion: start loading the next page only from the callback that fires once the current page has finished loading.
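
A minimal sketch of that recursive pattern in plain JavaScript, not the follow.js code itself. The names loadSequentially, onDone and createPage are hypothetical and introduced here only for illustration; createPage exists so a stand-in page object can be swapped in, and it defaults to require('webpage').create(). The page methods used (open, render, release, onLoadFinished) are the ones already used in the question's script.

```javascript
// Sketch: open URLs strictly one at a time. The next page is requested only
// from inside the onLoadFinished callback of the current one.
// loadSequentially, onDone and createPage are hypothetical names.
function loadSequentially(urls, onDone, createPage) {
    var results = [];

    // Assumption: inside PhantomJS the default factory is the webpage module.
    createPage = createPage || function () {
        return require('webpage').create();
    };

    function next(index) {
        if (index >= urls.length) {
            onDone(results);               // every URL has been processed
            return;
        }
        var page = createPage();
        var handled = false;
        page.onLoadFinished = function (status) {
            if (handled) { return; }       // ignore any repeated notification
            handled = true;
            results.push({ url: urls[index], status: status });
            if (status === 'success') {
                page.render('page_' + (index + 1) + '.png');
            }
            page.release();
            next(index + 1);               // recurse only after this page is done
        };
        page.open(urls[index]);
    }

    next(0);
}

// Usage inside PhantomJS only; the guard skips this in other environments.
if (typeof phantom !== 'undefined') {
    loadSequentially(['http://www.microsoft.com'], function () {
        phantom.exit();
    });
}
```

Because only one page is ever loading at a time, the load-finished notifications of different pages cannot be confused with one another. The cost is that pages are no longer fetched in parallel.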

