Web scraper - how to access content presented in JavaScript via Angular.js?

Question


I am trying to scrape data from the public website asx.com.au.

The page http://www.asx.com.au/asx/research/company.do#!/ACB/details contains a div with the class 'view-content', which holds the information I need.

But when I fetch this page in Python with urllib2.urlopen, that div is empty:

    import urllib2
    from bs4 import BeautifulSoup

    url = 'http://www.asx.com.au/asx/research/company.do#!/ACB/details'
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page, "html.parser")
    contentDiv = soup.find("div", {"class": "view-content"})
    print(contentDiv)

    # the result is an empty div:
    # <div class="view-content" ui-view=""></div>

Is it possible to access the contents of this div programmatically?

Edit: according to a comment, it seems that the content is rendered by Angular.js. Is it possible to trigger the rendering of this content from Python?


python angularjs web-scraping urllib2 beautifulsoup

Stephen lead Jan 28 '16 at 0:20


1 answer

furas · Accepted Answer · 2016-01-28T00:38:09+0000

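This page uses JavaScript to read the data from the server and fill in the page, so the HTML returned by urllib2 does not contain it.

I see that you are using the developer tools in Chrome. Open the "Network" tab and look at the "XHR" or "JS" requests to see where the data comes from.

I found this URL: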

http://data.asx.com.au/data/1/company/ACB?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices&callback=angular.callbacks._0

This URL returns all the data in almost-JSON format: the JSON is wrapped in a call to the function named in the callback parameter.

But if you use this link without &callback=angular.callbacks._0, you get the data as pure JSON, and you can use the json module to convert it to a Python dictionary.
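Both options can be sketched with the standard library alone. This is a minimal Python 3 illustration, not part of the original answer: the helper names `drop_callback` and `unwrap_jsonp` are made up for this example, and the exact shape of the wrapped response (`name({...});`) is an assumption about what the server sends.

```python
import json
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_callback(url):
    """Return url with any 'callback' query parameter removed."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "callback"]
    # safe="," keeps the comma-separated field list unescaped
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


def unwrap_jsonp(text):
    """Parse a wrapped payload such as 'name({...});' into Python data.

    Assumes the wrapper is a single function call around valid JSON.
    """
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("text does not look like a wrapped JSON payload")
    return json.loads(match.group(1))
```

Dropping the parameter is the simpler route when the server honours it; unwrapping is only a fallback for endpoints that always return the wrapped form.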

EDIT: working code

    import urllib2
    import json

    # new url (the callback parameter is removed; BeautifulSoup is no longer needed)
    url = 'http://data.asx.com.au/data/1/company/ACB?fields=primary_share,latest_annual_reports,last_dividend,primary_share.indices'

    # read all data
    page = urllib2.urlopen(url).read()

    # convert json text to python dictionary
    data = json.loads(page)

    print(data['principal_activities'])

Output:

 Mineral exploration in Botswana, China and Australia.
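The code above is Python 2. In Python 3, urllib2 was split into urllib.request and urllib.error, so a rough port might look like the sketch below. The function name `fetch_company` and its `opener` parameter are illustrative choices made here, and whether the endpoint still responds the same way is not something this example can confirm.

```python
import json
from urllib.request import urlopen

# Same data URL as in the answer, with the callback parameter removed
URL = ('http://data.asx.com.au/data/1/company/ACB'
       '?fields=primary_share,latest_annual_reports,'
       'last_dividend,primary_share.indices')


def fetch_company(url=URL, opener=urlopen):
    """Download the JSON document at url and return it as Python data.

    opener is any callable that takes a URL and returns a readable,
    closable object; it defaults to urllib.request.urlopen.
    """
    with opener(url) as response:
        # Assumes the body is UTF-8 encoded JSON
        return json.loads(response.read().decode('utf-8'))
```

Calling `fetch_company()` and printing `data['principal_activities']` from the result mirrors the original script. Passing the opener in makes it possible to try the parsing step against a canned response without touching the network.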
