Scraping a web page with dynamic content loaded via AJAX

Say that I want to scrape the products on this page (http://shop.coles.com.au/online/national/bread-bakery/fresh/bread#pageNumber=2&currentPageSize=20).

But the products are loaded via a POST request. Many posts here suggest simulating the request for the dynamic content, but in my case the form data (i.e. catalogId, categoryId) is unknown to me.

Is it possible to get the response once the AJAX call has completed?

1 answer

You can get catalogId and the other parameter values needed for the POST request from the form with id="search":

    <form id="search" name="search" action="http://shop.coles.com.au/online/SearchDisplay?pageView=image&amp;catalogId=10576&amp;beginIndex=0&amp;langId=-1&amp;storeId=10601" method="get" role="search">
        <input type="hidden" name="storeId" value="10601" id="WC_CachedHeaderDisplay_FormInput_storeId_In_CatalogSearchForm_1">
        <input type="hidden" name="catalogId" value="10576" id="WC_CachedHeaderDisplay_FormInput_catalogId_In_CatalogSearchForm_1">
        <input type="hidden" name="langId" value="-1" id="WC_CachedHeaderDisplay_FormInput_langId_In_CatalogSearchForm_1">
        <input type="hidden" name="beginIndex" value="0" id="WC_CachedHeaderDisplay_FormInput_beginIndex_In_CatalogSearchForm_1">
        <input type="hidden" name="browseView" value="false" id="WC_CachedHeaderDisplay_FormInput_browseView_In_CatalogSearchForm_1">
        <input type="hidden" name="searchSource" value="Q" id="WC_CachedHeaderDisplay_FormInput_searchSource_In_CatalogSearchForm_1">
        ...
    </form>
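As a sketch of reading those values without a browser, the following uses only the standard library's `html.parser` to collect the hidden `<input>` fields of the form with a given `id`. It assumes the markup looks like the form above (hidden inputs nested inside the `<form>` element); the names `HiddenFieldParser` and `hidden_fields` are hypothetical helpers, not part of Scrapy or the site. Inside a Scrapy callback you would more likely use a selector such as `response.css('form#search input[type=hidden]')` for the same job.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Hypothetical helper: collect name/value pairs of hidden inputs in one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False  # True while we are inside the target <form>
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form:
            # Only hidden inputs carry the catalogId/storeId-style parameters.
            if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def hidden_fields(html, form_id="search"):
    """Return {name: value} for the hidden inputs of the form with this id."""
    parser = HiddenFieldParser(form_id)
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    sample = (
        '<form id="search" name="search" method="get" role="search">'
        '<input type="hidden" name="storeId" value="10601">'
        '<input type="hidden" name="catalogId" value="10576">'
        '<input type="hidden" name="langId" value="-1">'
        "</form>"
    )
    print(hidden_fields(sample))
    # {'storeId': '10601', 'catalogId': '10576', 'langId': '-1'}
```

Parameters that are not in this form (categoryId in the question) still have to be found elsewhere on the page or in the AJAX request itself.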

Use FormRequest to submit this form.
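In Scrapy, `FormRequest.from_response(response, formid="search")` pre-fills a request from a form on the page, and `FormRequest(url, formdata={...})` sends arbitrary form data as a POST. The sketch below shows what such a request boils down to, using only the standard library: merge the values read from the form with any extra parameters and encode them as a form-encoded POST body. The endpoint URL is a hypothetical placeholder (the real one is whatever URL the page's AJAX call targets, visible in the browser's network tab), and `build_post_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, form_fields, extra_fields=None):
    """Build (but do not send) a form-encoded POST request.

    form_fields: values read from the page's form (e.g. storeId, catalogId).
    extra_fields: parameters not present in the form (e.g. categoryId),
    which the caller must discover separately.
    """
    data = dict(form_fields)
    data.update(extra_fields or {})
    body = urlencode(data).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the URL the AJAX call really uses.
    url = "https://example.com/ajax-endpoint"
    fields = {"storeId": "10601", "catalogId": "10576", "langId": "-1"}
    req = build_post_request(url, fields)
    print(req.get_method())  # POST
    print(req.data)          # b'storeId=10601&catalogId=10576&langId=-1'
    # urllib.request.urlopen(req) would actually send it.
```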


As for whether it is possible to get the response after the AJAX call has completed:

Scrapy is not a browser: it does not make the additional AJAX requests a page triggers, and it has no built-in JavaScript engine to run them. You can automate a real browser and solve the problem at a higher level; take a look at the selenium package. There is also the scrapy-splash project.

