Retrieving financial data from Google Finance beyond the API

Question


The Google Finance API is incomplete: many numbers on pages such as

http://www.google.com/finance?fstype=ii&q=NYSE:GE

are unavailable through the API.

I need this data to rank companies on Canadian stock exchanges according to the Greenblatt formula (details are easy to find with a Google search for “greenblatt index scanners”).

My question is: what is the most intelligent, clean and efficient way to access and process the data on these pages? Is a tedious approach really necessary in this case, and if so, what is the best way to go about it? I am currently learning Python for projects like this.
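Assuming “the Greenblatt formula” refers to Joel Greenblatt's “magic formula”, the processing step is simple once the numbers are in hand: rank every company by earnings yield (EBIT / enterprise value) and by return on capital (EBIT / (net working capital + net fixed assets)), add the two ranks, and sort ascending. The sketch below shows only that ranking; the `Company` fields and the sample tickers are hypothetical, and the sector and market-cap filters Greenblatt describes are left out.

```python
from dataclasses import dataclass


@dataclass
class Company:
    ticker: str
    ebit: float                 # earnings before interest and taxes
    enterprise_value: float     # market cap + debt - cash
    net_working_capital: float
    net_fixed_assets: float


def magic_formula_rank(companies):
    """Return tickers ordered best-first by combined earnings-yield and ROC rank."""
    earnings_yield = {c.ticker: c.ebit / c.enterprise_value for c in companies}
    return_on_capital = {
        c.ticker: c.ebit / (c.net_working_capital + c.net_fixed_assets)
        for c in companies
    }

    def ranks(metric):
        # Rank 1 goes to the highest value of the metric.
        ordered = sorted(metric, key=metric.get, reverse=True)
        return {ticker: i for i, ticker in enumerate(ordered, start=1)}

    ey_rank = ranks(earnings_yield)
    roc_rank = ranks(return_on_capital)
    combined = {t: ey_rank[t] + roc_rank[t] for t in earnings_yield}
    # Lowest combined rank is best; ties are broken by ticker so output is stable.
    return sorted(combined, key=lambda t: (combined[t], t))


if __name__ == "__main__":
    # Hypothetical figures, not real company data.
    sample = [
        Company("AAA", 100, 1000, 200, 300),
        Company("BBB", 50, 250, 100, 400),
        Company("CCC", 90, 600, 100, 200),
    ]
    print(magic_formula_rank(sample))
```

With the sample data, CCC has the best return on capital and the second-best earnings yield, so it comes first.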


python api data-mining google-finance

Marco Jun 17 '09 at 21:07


3 answers

BeautifulSoup would be the preferred way to parse HTML with Python.

Have you looked at sources other than Google (e.g. the Yahoo Finance API)?
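A minimal BeautifulSoup sketch of pulling a financial-statement table into a dictionary. The markup, the `fs-table` id and the values are hypothetical placeholders (Google's real page layout is different and has changed over time), and `parse_statement_table` / `to_number` are illustrative helpers, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder numbers, standing in for a downloaded page.
SAMPLE_HTML = """
<table id="fs-table">
  <tr><th>In Millions</th><th>Year 1</th><th>Year 2</th></tr>
  <tr><td>Total Revenue</td><td>1,234.50</td><td>1,100.00</td></tr>
  <tr><td>Operating Income</td><td>210.25</td><td>-</td></tr>
</table>
"""


def to_number(text):
    """Convert '1,234.50' to 1234.5; return None for blanks or dashes."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def parse_statement_table(html, table_id):
    """Map each row label to its list of numeric values."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id={table_id!r}")
    rows = {}
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) < 2:
            continue  # the header row uses <th>, so it has no <td> cells
        label, *values = cells
        rows[label] = [to_number(v) for v in values]
    return rows


if __name__ == "__main__":
    print(parse_statement_table(SAMPLE_HTML, "fs-table"))
```

Keeping the parsing in one function makes it easy to fix in a single place when the site's markup changes.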


Eli Jun 17 '09 at 21:42

Scraping web pages always sucks, but I would recommend converting them to XML (via Tidy or some other HTML-to-XML program) and then using XPath to get at the parts of the page you are interested in.
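A sketch of the XPath step using only the standard library, assuming the page has already been turned into well-formed XHTML (for example by HTML Tidy). The markup, the `fs-table` id and the `label`/`value` classes are hypothetical. `xml.etree.ElementTree` supports only a subset of XPath, including `[@attr='value']` predicates; if Tidy's output declares the XHTML namespace, element names must be qualified through a namespace mapping, and lxml offers full XPath 1.0 if the subset is not enough.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-tidied XHTML with placeholder values.
XHTML = """<html><body>
<table id="fs-table">
  <tr><td class="label">Total Revenue</td><td class="value">1,234.50</td></tr>
  <tr><td class="label">Net Income</td><td class="value">98.75</td></tr>
</table>
</body></html>"""


def extract_values(xml_text, table_id):
    """Return {label: value} for every row of the table with the given id."""
    root = ET.fromstring(xml_text)
    result = {}
    # ElementTree's XPath subset handles descendant search plus attribute predicates.
    for row in root.iterfind(f".//table[@id='{table_id}']/tr"):
        label = row.find("td[@class='label']")
        value = row.find("td[@class='value']")
        if label is not None and value is not None:
            result[label.text.strip()] = float(value.text.replace(",", ""))
    return result


if __name__ == "__main__":
    print(extract_values(XHTML, "fs-table"))
```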


Paul Tarjan Jun 17 '09 at 21:20


Ryan bright · Accepted Answer · 2009-06-17T23:55:59+0000

You can ask Google to provide the missing APIs. Otherwise, you are stuck with screen scraping, which is never fun, prone to breaking without warning, and probably in violation of Google’s terms of service.

But if you still want to write a screen scraper, it is hard to beat a combination of mechanize and BeautifulSoup. BeautifulSoup is an HTML parser, and mechanize is a Python-based web browser that lets you log in, store cookies and generally navigate like any other web browser.
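mechanize is a third-party package, so the sketch below swaps in the standard library (`urllib.request` with an `http.cookiejar.CookieJar`) to show the same idea of a browser-like client that keeps cookies between requests. `statement_url`, `make_browser`, `fetch_html` and the User-Agent string are hypothetical names chosen for illustration; the URL pattern comes from the question, and that page may no longer exist.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# URL pattern from the question; Google may have changed or removed it.
BASE_URL = "http://www.google.com/finance"


def statement_url(symbol, fstype="ii"):
    """Build a URL like ?fstype=ii&q=NYSE:GE (the colon is percent-encoded)."""
    return BASE_URL + "?" + urllib.parse.urlencode({"fstype": fstype, "q": symbol})


def make_browser(user_agent="Mozilla/5.0 (compatible; example-scraper)"):
    """Return an opener that stores cookies across requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_html(opener, url, timeout=10):
    """Download a page and decode it using the charset the server reports."""
    with opener.open(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be `opener, jar = make_browser()` followed by `html = fetch_html(opener, statement_url("NYSE:GE"))`, with the resulting `html` handed to BeautifulSoup for parsing.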

