Retrieving financial data from Google Finance beyond the API

The Google Finance API is incomplete: many of the figures shown on pages such as

http://www.google.com/finance?fstype=ii&q=NYSE:GE

are unavailable through the API.

I need this data to rank companies on Canadian stock exchanges according to the Greenblatt formula, available through a Google search for “greenblatt index scanners”.

My question is: what is the most intelligent / clean / efficient way to access and process the data on these web pages? Is a tedious approach really necessary in this case, and if so, what is the best way to go about it? I am currently learning Python for projects related to this.

+5
3 answers

You could ask Google to provide the missing APIs. Otherwise, you are stuck with screen scraping, which is never fun, prone to breaking without warning, and probably in violation of Google's terms of service.

But if you still want to write a screen scraper, it is hard to beat the combination of mechanize and BeautifulSoup. BeautifulSoup is an HTML parser, and mechanize is a Python-based programmatic web browser that lets you log in, store cookies, and generally navigate like any other web browser.
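To give a feel for the parsing half of that job, here is a minimal sketch that uses only the standard library's html.parser as a stand-in for BeautifulSoup (which would make this shorter). The sample markup, the `fs-table` id, and the helper names `TableRows`, `to_number`, and `parse_statement` are all hypothetical; the real page layout will differ, and fetching the page (the part mechanize would handle) is left out.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real Google Finance layout is an assumption.
SAMPLE_HTML = """
<table id="fs-table">
  <tr><th>Item</th><th>2010</th><th>2009</th></tr>
  <tr><td>Revenue</td><td>1,500.00</td><td>1,250.50</td></tr>
  <tr><td>Operating Income</td><td>-</td><td>200.25</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_number(text):
    """Convert '1,500.00' to 1500.0; return None for blanks or dashes."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def parse_statement(html):
    """Return {line item: {period: value}} from the first row as header."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    periods = header[1:]
    return {row[0]: dict(zip(periods, map(to_number, row[1:]))) for row in body}


data = parse_statement(SAMPLE_HTML)
```

Keeping the number cleanup in its own function means the messy part (thousands separators, dashes for missing values) can be adjusted without touching the parser when the page format changes.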

+4

BeautifulSoup would be the preferred method of parsing HTML with Python.

Have you looked beyond Google (for example, at the Yahoo Finance API)?

+3

Scraping web pages always sucks, but I would recommend converting them to XML (via tidy or some other HTML-to-XML program) and then using XPath to go after the nodes you are interested in.
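As a rough illustration of the XPath step, the sketch below assumes the page has already been converted to well-formed XML and uses the limited XPath subset built into Python's xml.etree.ElementTree. The sample document, the `fs-table` id, and the `value_for` helper are hypothetical, and real tidied XHTML usually carries a namespace that the paths would have to account for.

```python
import xml.etree.ElementTree as ET

# Hypothetical tidied output: well-formed XML with no namespace.
SAMPLE_XML = """<html><body>
<table id="fs-table">
<tr><td>Revenue</td><td>1500.00</td></tr>
<tr><td>Net Income</td><td>120.75</td></tr>
</table>
</body></html>"""

root = ET.fromstring(SAMPLE_XML)


def value_for(root, label):
    """Return the number in the second cell of the row whose cell text is label.

    Assumes label contains no single quotes, since it is pasted into the path.
    """
    path = f".//table[@id='fs-table']/tr[td='{label}']/td[2]"
    cell = root.find(path)
    return float(cell.text) if cell is not None else None


revenue = value_for(root, "Revenue")
net_income = value_for(root, "Net Income")
```

ElementTree only understands a subset of XPath; a full implementation such as the one in lxml would be needed for more elaborate expressions.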

0
