For a company project, I need to build a web scraping application with PHP and JavaScript (including jQuery) that extracts specific data from each page of our customers' websites. The scraper needs two pieces of data for each page: 1) whether certain HTML elements with specific IDs are present, and 2) the value of a specific JavaScript variable. The variable name is the same on every page, but the value usually differs.
I think I know how to meet the first requirement: use PHP's file_get_contents() to fetch each HTML page, then use JavaScript/jQuery to parse the HTML and find the elements with the specific IDs. However, I'm not sure how to get the second piece of data, the value of the JavaScript variable. The variable isn't even in each page's HTML; it is defined in an external JavaScript file that the page links to. And even if the JavaScript were embedded in the HTML pages, I know that file_get_contents() would only retrieve the JavaScript source code (along with the rest of the HTML), not any variable values.
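To make the first requirement concrete, here is a minimal sketch of the ID check on a fetched HTML string. It is plain JavaScript with a regular expression rather than jQuery, so it is only a rough approximation of real HTML parsing: it assumes quoted attribute values and no `>` inside attributes. The names `hasElementWithId`, `escapeRegExp`, and `sampleHtml` are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: escape characters that have special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: rough test for an element carrying id="<id>" in raw HTML.
// Assumption: attribute values are quoted; this is not a full HTML parser.
function hasElementWithId(html, id) {
  const pattern = new RegExp(
    "<[^>]*\\s[iI][dD]\\s*=\\s*([\"'])" + escapeRegExp(id) + "\\1"
  );
  return pattern.test(html);
}

// Made-up sample markup, standing in for what file_get_contents() would return.
const sampleHtml = '<div id="main"><span data-id="promo"></span></div>';
console.log(hasElementWithId(sampleHtml, "main"));  // true
console.log(hasElementWithId(sampleHtml, "promo")); // false (data-id is not id)
```

If the HTML is loaded into a real DOM instead, a selector lookup by ID would be more robust than this pattern match.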
Can someone suggest a good approach to get this variable value for each page of this website?
EDIT: Just to clarify, I need the value of the JavaScript variable after the JavaScript code has run. Is such a thing possible?
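To illustrate what "the value after running the code" means, here is a minimal sketch that executes a script's source in an isolated context with Node.js's built-in `vm` module and then reads a global variable from it. This assumes Node.js is available and that the external script has already been downloaded as a string; `readGlobalAfterRun`, `sampleScript`, and `trackingId` are hypothetical names. It only works for scripts that do not depend on the browser's `document` or `window`, and it only sees top-level `var` or function declarations (top-level `let` and `const` do not become properties of the context object).

```javascript
const vm = require("vm");

// Hypothetical helper: run script source in a fresh context and
// return the value of one global variable afterwards.
function readGlobalAfterRun(scriptSource, variableName) {
  const sandbox = {};          // becomes the global object for the script
  vm.createContext(sandbox);
  // The timeout (in milliseconds) stops scripts that loop forever.
  vm.runInContext(scriptSource, sandbox, { timeout: 1000 });
  return sandbox[variableName];
}

// Made-up script standing in for the customer's external JavaScript file.
const sampleScript = "var trackingId = 'site-' + (40 + 2);";
console.log(readGlobalAfterRun(sampleScript, "trackingId")); // site-42
```

Note that the `vm` module is not a security boundary, so this is not a safe way to run untrusted code. Scripts that touch the DOM would need a real browser environment, such as a headless browser, instead of this sketch.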
jake