I am planning a web service for my own internal use that takes a single argument, a URL, and returns HTML representing the resolved DOM from that URL. By resolved, I mean that the web service will first get the page at this URL, then use PhantomJS to "render" the page, and then return the resulting source after all DHTML, AJAX calls, etc. have completed. However, launching PhantomJS for each request (which I am doing now) is too slow. I would rather have a pool of PhantomJS instances, with one always available to serve the latest call to my web service.
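To make concrete what I mean by a pool, here is a minimal Python sketch of the idea, not a working PhantomJS integration. The names `InstancePool`, `FakeRenderer`, and `resolve_dom` are hypothetical, and `FakeRenderer` only stands in for a long-lived PhantomJS process; a real version would need to talk to the actual process and replace instances that crash.

```python
import queue
from contextlib import contextmanager


class InstancePool:
    """Fixed-size pool of pre-started, reusable worker instances."""

    def __init__(self, factory, size):
        # Start every instance up front so no request pays the startup cost.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def acquire(self, timeout=None):
        # Blocks until an instance is idle; raises queue.Empty on timeout.
        instance = self._idle.get(timeout=timeout)
        try:
            yield instance
        finally:
            # Always hand the instance back, even if the caller raised.
            self._idle.put(instance)


class FakeRenderer:
    """Hypothetical stand-in for a long-lived PhantomJS process."""

    def render(self, url):
        # A real renderer would load the URL and return the resolved DOM.
        return "<html><!-- rendered " + url + " --></html>"


# Assumption: two instances; a real service would tune this number.
pool = InstancePool(FakeRenderer, size=2)


def resolve_dom(url):
    """What the web service handler would call for each request."""
    with pool.acquire(timeout=30) as renderer:
        return renderer.render(url)
```

Because the idle instances sit in a thread-safe queue, concurrent requests simply wait for the next free instance instead of launching a new process each time.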
Has any work been done on this kind of thing before? I would rather base this web service on the work of others than write a pool manager / HTTP proxy myself from scratch.
More context: I have listed below the 2 similar projects that I have seen so far, and why I avoided each of them, which is what led to this question about managing a pool of PhantomJS instances.
jsdom - from what I have seen, it has excellent functionality for executing scripts on a page, but it does not try to replicate browser behavior, so if I used it as a general-purpose "DOM resolver", I would end up doing a lot of extra coding to handle all kinds of edge cases, event triggering, etc. In the first example I saw, I had to manually call the onload() function of the body tag for a test app I set up using node. It seemed like the beginning of a deep rabbit hole.
Selenium - it just has a lot more moving parts, so setting up a pool to manage long-lived browser instances would be harder than with PhantomJS. I do not need any of its macro recording / scripting benefits. I just want a web service that is as efficient at retrieving a web page and resolving its DOM as if I were viewing that URL in a browser (or even faster, if I can make it ignore images, etc.).
Trindaz Apr 01 2018-12-12T00: 00Z